My private opinion about the Ruby parser

This article is an English translation of an article written on November 12, 2021. If you find any unnatural expressions, I would appreciate it if you could point them out in the comments.

Summary

This is a summary of "A maintainable, flexible, and usable Ruby parser"*1, one of the topics discussed in the "Ruby Committers vs the World" session at RubyKaigi Takeout 2021.

I have hit upon a truly marvelous Ruby parser, which this margin is too narrow to contain.

A maintainable, flexible, and usable Ruby parser

Objective
- Make the parser easy to maintain
- Make the parser useful for LSP (VS Code/IDE)
Problems
- The current parse.y is hell*2
- LSP often requires a more detailed syntax tree (a concrete syntax tree?)
  - Need the location of comments, punctuation, etc.
  - The current AST node keeps only a range (the beginning and the end)
- LSP requires error-tolerant parsing
  - IDE needs to handle incomplete source code
Ideas
- Rewrite it with PEG (like Python)?
- Use the parser gem? Use Ripper?
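To make the wish for a more detailed syntax tree that records comments and punctuation more tangible, here is a minimal sketch (in Python, not Ruby) of a lossless tokenizer that keeps every comment, comma, and space together with its offsets, instead of keeping only a start/end range. The token rules, the `Token` class, and `tokenize` are hypothetical simplifications for illustration; they are not taken from parse.y, Ripper, or the parser gem.

```python
import re
from dataclasses import dataclass

# Hypothetical, heavily simplified token rules (not real Ruby lexing).
TOKEN_RE = re.compile(
    r"(?P<comment>#[^\n]*)"
    r"|(?P<newline>\n)"
    r"|(?P<space>[ \t]+)"
    r"|(?P<ident>[A-Za-z_]\w*[?!]?)"
    r"|(?P<int>\d+)"
    r"|(?P<punct>[(),:=.])"
    r"|(?P<unknown>.)"
)


@dataclass
class Token:
    kind: str   # e.g. "comment", "punct", "space"
    text: str
    start: int  # offset of the first character
    end: int    # offset just past the last character


def tokenize(source):
    """Split source into tokens without dropping comments or whitespace."""
    return [
        Token(m.lastgroup, m.group(), m.start(), m.end())
        for m in TOKEN_RE.finditer(source)
    ]


src = "foo(1, 2) # call foo\n"
tokens = tokenize(src)

# Lossless: concatenating the token texts reproduces the original source.
assert "".join(t.text for t in tokens) == src

# An editor can now find the comment and every comma by offset.
comment = next(t for t in tokens if t.kind == "comment")
commas = [t.start for t in tokens if t.text == ","]
print(comment.text, comment.start, commas)
```

Because nothing is discarded, a tree built on top of such tokens can answer "where is this comment?" or "where is this comma?", which a range-only AST cannot.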

Incidentally, as mame mentioned in the video, these items are not well organized. He simply wrote up "things mame is (or we are) frustrated with" on the slides; it's not that he wants to do what's written on them. In other words, nothing on the slide is something we could actually implement as-is.

So what are these "things mame is (or we are) frustrated with"?

What we would like to do
- We would like to make the parser maintainable by anyone.
- We would like to make the parser flexible and convenient to use for LSP (VS Code/IDE).
What we would like to do about it
- Only nobu can maintain the parser.
  - Nobody else knows what is where or what does what.
- We would like a parser that preserves comment text, because LSP uses the documentation written in comments.
  - For MRI, comment text can simply be discarded during parsing.
- For LSP, we would like to parse half-written, broken source code and still do something useful with it.
  - For MRI, it is enough for a syntax error to simply be reported as a syntax error.
  - We want a parser that has error recovery.
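The wish for error recovery above can be illustrated with the classic textbook technique called panic-mode recovery: when a statement fails to parse, the parser records the error, skips tokens up to a synchronizing token, and keeps going, so the rest of a half-written file is still available to an editor. The sketch below assumes a toy grammar (`name = integer`, separated by `;`) and pre-tokenized input; all names are hypothetical, and this is not how MRI or the parser gem works.

```python
from dataclasses import dataclass


@dataclass
class Assign:
    name: str
    value: int


class ParseError(Exception):
    def __init__(self, pos, message):
        super().__init__(message)
        self.pos = pos  # index of the offending token


def expect(tokens, pos, predicate, what):
    """Consume one token matching predicate, or raise ParseError."""
    if pos < len(tokens) and predicate(tokens[pos]):
        return tokens[pos], pos + 1
    found = repr(tokens[pos]) if pos < len(tokens) else "end of input"
    raise ParseError(pos, f"expected {what}, found {found}")


def parse_assign(tokens, pos):
    """assign := NAME '=' INTEGER"""
    name, pos = expect(tokens, pos, str.isidentifier, "a name")
    _, pos = expect(tokens, pos, lambda t: t == "=", "'='")
    value, pos = expect(tokens, pos, str.isdigit, "an integer")
    if pos < len(tokens) and tokens[pos] != ";":
        raise ParseError(pos, f"unexpected token {tokens[pos]!r}")
    return Assign(name, int(value)), pos


def parse_program(tokens):
    """Parse assignments separated by ';', with panic-mode recovery."""
    statements, errors = [], []
    pos = 0
    while pos < len(tokens):
        if tokens[pos] == ";":
            pos += 1
            continue
        try:
            stmt, pos = parse_assign(tokens, pos)
            statements.append(stmt)
        except ParseError as err:
            errors.append(err)
            # Panic mode: skip ahead to the synchronizing token ';'.
            pos = err.pos
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
    return statements, errors


# Pre-tokenized input; the second statement is half-written.
tokens = "a = 1 ; b = ; c = 3".split()
statements, errors = parse_program(tokens)
print(statements)  # a and c are still parsed
print(errors)      # the broken statement is reported, not fatal
```

A parser for MRI can stop at the first error, but an editor-oriented parser in this style still yields `a` and `c` even though `b = ` is incomplete.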

About parsing speed

soutaro: Does parsing at Ruby runtime have to be fast? Or would it be acceptable for parsing to be 10x slower than in the current version*3?
- mame: 10x slower would cause problems in some applications, but I don't think it's a bottleneck.
  - ko1: Parsing is time-consuming. Isn't that why so many people use bootsnap?
    - mame: That's right.
naruse: Aren't you talking about parsing and ISEQ compilation together?
- naruse: I think running the parser in parallel is a good idea.
- naruse: I don't think parsing speed will be as much of a bottleneck as we expect. On the other hand, we won't know if it is acceptable to be 10x slower until we actually measure it.
  - mame: I think 10x might have an impact.

About the parser gem

mame: Steep uses the parser gem.
- mame: The parser gem does not seem to record where commas are, but it is still much better than MRI's parser.
- soutaro: The parser gem is sufficient for Steep's requirements, because Steep doesn't need very fast parsing.
???: What about adopting the parser gem as MRI's parser?
- mame: I hadn't thought of that idea. I was thinking of bundling it*4.
  - soutaro: Is it realistic to bundle two parsers that do essentially the same thing side by side?
    - mame: Frankly, it's very hard.
- mame: How realistic is the idea of using the parser gem as a parser for MRI?
  - nobu: It's not realistic.
  - nobu: MRI can't bundle the parser gem.

Bootstrap problem

nobu: I would argue that we don't need to support older versions of Ruby's syntax. But which Ruby would run the parser itself?
- nobu: Suppose we used the parser gem for parsing. Would it be only about 5x slower than the current parsing speed?
  - mame: If we use the parser gem, it will be much slower because it parses more detailed information.
  - nobu: 10x slower than now is unacceptable.
    - nobu: All bundles will be slowed down (I didn't catch this part clearly).
nobu: If we want to use a parser written in Ruby, do we need to compile it AOT?*5
- mame: If we were to do that, I think it would be in the form of an ISEQ embedded in the MRI.
  - ko1 & mame: Theoretically, it is possible. However, whether it makes sense to make it happen is another topic.
- nobu: Doesn't the ISEQ have to be generated by the same revision of Ruby?
  - ko1: No, you can just use the parser gem and an older revision.
    - nobu: But if we embed the ISEQ as it is now, the ISEQ itself is not that portable.
mrkn: Couldn't the new parser itself be parsed with parse.y?
- mrkn: We could keep parse.y and use it for bootstrapping.
  - nobu: Wouldn't we then have to maintain both the new parser and parse.y?
    - mrkn: I don't think parse.y would need to be maintained. We can write the new parser using only old Ruby syntax.
- shyouhei: Isn't the first assumption that "We would like to remove parse.y"?
  - nobu: We don't have to go as far as deleting it. But we need to organize it so that we can maintain it.

About LSP

naruse: Like the parser gem, a parser for LSP must work with older Ruby versions as well.
- naruse: What we might be looking for is not a parser to run Ruby itself. We might want something like the parser gem, which will parse properly in multiple old Ruby versions.
nobu: At this point*6, it may not be necessary to create a strict parser for LSP.
- nobu: I think detailed error checking is unnecessary.
  - soutaro: No, it is necessary.
  - naruse: I would like to see some kind of error output for anything that doesn't go through XXX (I didn't catch that...).

Reinventing the wheel

ko1: The requirements have gradually grown from "just parse it to run Ruby" to much more. So it is time to rethink the parser itself.
ko1: Matz, would you like to rebuild the parser as a hobby?
- matz: No, I don't.
nobu: Has anyone rewritten the parser for mruby?
- matz: hasumikin rebuilt the mruby parser using Lemon.*7
  - naruse: Lemon is not that expressive.
  - naruse: As Matz put it: "If we're going to rebuild the Ruby parser, let's introduce a more expressive parser. And let's make Ruby's syntax more complex."
  - naruse: Matz also said: "I used to think that I would simply keep the syntax as it is. But after thinking about it for several years, I realized that the complexity of the grammar is Ruby's identity."*8

What are the Ruby parser's problems?

akr: What makes parse.y hard to maintain?
- matz: Probably because parse.y is hard to read.
- akr: I personally suspect the following:
  - akr: parse.y is written for Yacc, so we have to write all of the BNF (i.e., Ruby's whole syntax) by hand.
  - akr: In Ruby's case, we often have to write out the BNF for every combination of elements (e.g., method arguments). In other words, we have to YYY (I didn't catch that...).
    - akr: If that's the problem, we might want a feature*9 that can generate such rules.
- matz: Unless we adopt a top-down parser*10, we can't "pass parameters to rules". I often wish I could do that.*11
  - akr: I think we need to sort out what the problem is, including such things.
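akr's point about writing BNF "like combinatorics" can be illustrated with a small generator. The sketch below assumes a simplified list of parameter sections in the order Ruby expects them (post-required parameters omitted) and mechanically emits one Yacc-style alternative for every order-preserving combination. The nonterminal names are illustrative, only loosely modeled on parse.y, and the generator itself is hypothetical, not an existing tool.

```python
from itertools import combinations

# Simplified parameter sections in the order Ruby expects them.
# The names are illustrative, loosely modeled on parse.y's nonterminals.
SECTIONS = ["f_arg", "f_optarg", "f_rest_arg", "f_kwarg", "f_block_arg"]


def generate_alternatives(sections):
    """Yield one BNF alternative per non-empty, order-preserving subset."""
    for size in range(1, len(sections) + 1):
        # combinations() keeps the input order, so every alternative
        # lists its sections in the required order.
        for subset in combinations(sections, size):
            yield " ',' ".join(subset)


def generate_rule(name, sections):
    """Render a Yacc-style rule, including an empty alternative."""
    alternatives = list(generate_alternatives(sections))
    body = "\n    | ".join(alternatives)
    return f"{name}\n    : {body}\n    | /* none */\n    ;"


alternatives = list(generate_alternatives(SECTIONS))
print(len(alternatives))  # 31 non-empty combinations from just 5 sections
print(generate_rule("f_args", SECTIONS))
```

Even five sections already produce 31 alternatives, which hints at why writing them by hand is painful. In a top-down (recursive-descent) parser, the same effect could instead come from a single function that parses each optional section in turn, or from rules that take parameters, which is roughly what matz wishes Yacc allowed.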

Error Handling and Error Recovery

???: Isn't error handling what you were talking about earlier?
- mame: It's easier for everyone to understand if you say, "A new requirement has come up."
- akr: Yacc also has an error handling feature.
- mame: I'm not necessarily saying "Stop using Bison"; I'm saying it's okay to use Ripper.
- nobu: The earlier topic was something like this: "We want a parser that parses the source code to the end even if an error occurs".
- naruse: yui-knk had previously tried to implement error handling.*12
- nobu: Doesn't the current IRB already do something like this: "Even if an error occurs, the parser goes through the source code to the end and generates tokens"?
- mame: Isn't that just the lexer doing it? Aren't we talking about the lexer being able to handle errors?
- soutaro: Does mame-san want a parser that says "there is a syntax error here, and the rest of the code looks like this"?*13
  - mame: I would like to see such a feature if possible.
  - mame: Taking my keynote as an example, typing a keyword argument up to the ':' results in a parse error, and parsing stops there. If possible, I would like the source code to be parsed to the end anyway.
    - mame: To complete the argument information, we need a syntax tree that contains the 'keyword:'.
    - akr: I think the grammar can be transformed concretely: from the complete grammar, we could derive a grammar that also accepts input that stops midway with an error.
    - mame: These items are not well organized. I just wrote up "things I am frustrated with" on the slides; it's not that I want to do everything written there.
mame: Was yui-knk trying to implement error recovery, or error handling?
- yui-knk: Does error recovery mean handling cases where the token sequence breaks partway through?
- yui-knk: Error recovery is often covered in a corner of textbooks*14, right?
  - mame: I know.
  - yui-knk: I haven't read the detailed explanations properly, but my impression is that in most cases you just insert a plausible token for the time being and recover.
    - akr: We don't actually generate a token. I think we should just "pretend that the non-terminal in question has been generated" and proceed.
    - yui-knk: That's right.
    - akr: You can report an error such as "this non-terminal is required" regardless of what the actual token is.
- yui-knk: For a conflict-free grammar like Ruby's, is there a uniquely determined set of things that should be filled in?
  - matz: It is not uniquely determined.
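akr's suggestion, pretending that the missing non-terminal has been generated, can be sketched in a tiny recursive-descent (top-down) parser, which is a different technique from Yacc's own `error` token. When a required non-terminal is missing, the parser reports "this non-terminal is required", returns a placeholder node as if it had been parsed, and continues without consuming any token. The grammar covers only calls like `foo(key: value)`, loosely inspired by mame's half-typed keyword argument example; the `Missing` and `Call` nodes and the `Parser` class are hypothetical, not Ruby's grammar or yui-knk's implementation.

```python
import re
from dataclasses import dataclass

# Simplified tokens: names (a trailing ':' makes a keyword label), integers, ( ) ,
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*:?|\d+|[(),])")


def tokenize(src):
    return TOKEN_RE.findall(src)


@dataclass
class Missing:
    expected: str  # placeholder for a non-terminal that was not found


@dataclass
class Call:
    name: str
    kwargs: dict


class Parser:
    def __init__(self, src):
        self.tokens = tokenize(src)
        self.pos = 0
        self.errors = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def missing(self, what):
        # Report "this non-terminal is required", then pretend it was
        # parsed by returning a placeholder. No token is consumed.
        self.errors.append(f"{what} is required at token {self.pos}")
        return Missing(what)

    def parse_call(self):
        """call := NAME '(' (LABEL value (',' LABEL value)*)? ')'"""
        name = self.advance()
        if self.peek() == "(":
            self.advance()
        else:
            self.missing("'('")
        kwargs = {}
        while self.peek() not in (")", None):
            tok = self.advance()
            if not tok.endswith(":"):
                self.errors.append(f"unexpected token {tok!r}")
                continue
            kwargs[tok[:-1]] = self.parse_value()
            if self.peek() == ",":
                self.advance()
        if self.peek() == ")":
            self.advance()
        else:
            self.missing("')'")
        return Call(name, kwargs)

    def parse_value(self):
        """value := NAME | INTEGER"""
        tok = self.peek()
        if tok is not None and (tok.isdigit() or tok.isidentifier()):
            return self.advance()
        return self.missing("an expression")


# Half-written keyword argument, as when typing `foo(key:` in an editor.
parser = Parser("foo(key: )")
call = parser.parse_call()
print(call)           # the tree still contains `key`, with a Missing value
print(parser.errors)
```

Because the tree still contains `key` with a `Missing` value, a completion engine could, in principle, suggest values for that keyword while also showing the error. As matz notes, which placeholder to insert is not uniquely determined in general; this sketch simply hard-codes one choice per rule.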

Conclusions

I have hit upon a truly marvelous Ruby parser, which this margin is too narrow to contain.*15

In the end

Thank you for reading my poor English article to the end.

*1:This corresponds to 11:54 - 28:06 in the video.

*2:Or 魔境 (makyō, roughly "a realm of demons")

*3:Ruby 3.0.2, as of November 11, 2021

*4:The author does not understand what is meant by "bundle".

*5:Ahead-of-time compilation

*6:As of September 10, 2021

*7:https://shimane.monstar-lab.com/hasumin/mmruby-on-RubyKaigi-Takeout-2020

*8:Mr. X said: "That's a troublesome identity, lol."

*9:Parser generator generator

*10:Yacc/Bison is a bottom-up parser.

*11:It is, however, possible to "pass values from rules".

*12:At this time, yui-knk was not present.

*13:Because we want input completion to work while we are still writing the source code.

*14:What's that!? I've never heard of it...

*15:I have not come up with a good alternative.
