This article is an English translation of an article written on November 12, 2021. If you find any unnatural expressions, I would appreciate it if you could leave a comment on this article.
Summary
This is a summary of "A maintainable, flexible, and usable Ruby parser"*1, one of the topics discussed in the "Ruby Committers vs the World" session at RubyKaigi Takeout 2021.
I have hit upon a truly marvelous Ruby parser, which this margin is too narrow to contain.
A maintainable, flexible, and usable Ruby parser
- Objective
- Problems
- Ideas
- Rewrite it with PEG (like Python)?
- Use the parser gem? Use Ripper?
Incidentally, as mame mentioned in the video, these items are not an organized list. He simply wrote up "something mame is (or we are) frustrated with" on the slides; it does not mean he wants to do what is written there. In other words, he didn't write anything on the slides that we could actually implement as is.
What is that "something mame is (or we are) frustrated with"?
- What we would like to do
- What we would like to do about it
- Only nobu can maintain the parser.
- We don't know what's where and what's what.
- We would like a parser that keeps comment text, because LSP uses the documentation written in comments.
- For MRI, the comment text can be discarded at parse time.
- For use with LSP, we would like to parse half-written, broken source code and still do something with it.
- With MRI, syntax errors simply end up as syntax errors, and that is enough to meet the requirements.
- We want a parser that has error recovery.
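As a rough illustration of the comment requirement above, here is a minimal sketch using Ripper from Ruby's standard library. Ripper's lexer reports comments as `on_comment` tokens, so a tool can still read them. The sample source and the variable names are made up for this example, and this is not something that was proposed in the discussion.

```ruby
require "ripper"

# Hypothetical sample source with a documentation comment.
source = <<~RUBY
  # Returns the sum of a and b.
  def add(a, b)
    a + b
  end
RUBY

# Ripper.lex returns entries shaped like [[line, column], type, text, state].
comments = Ripper.lex(source)
                 .select { |token| token[1] == :on_comment }
                 .map { |token| token[2] }

puts comments
```

A parser aimed at LSP would need to attach such comments to the definitions they describe, which this sketch does not attempt.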
About parsing speed
- soutaro: Does parsing at Ruby runtime have to be fast, or would it be acceptable if parsing became 10x slower than in the current version*3?
- naruse: Are you lumping parsing and compiling to ISEQ together?
- naruse: I think running the parser in parallel is a good idea.
- naruse: I don't think parsing speed will be as much of a bottleneck as we expect. On the other hand, we won't know if it is acceptable to be 10x slower until we actually measure it.
- mame: I think 10x might have an impact.
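naruse's point that we have to measure can be sketched roughly as follows. This assumes MRI 2.6 or later, because `RubyVM::AbstractSyntaxTree` and `RubyVM::InstructionSequence` are MRI-specific. The sample source and the iteration count are arbitrary choices for illustration, and the timings depend on the machine, so no numbers are shown here.

```ruby
require "benchmark"
require "ripper"

# Hypothetical workload: a small method definition repeated many times.
source = "def add(a, b)\n  a + b\nend\n" * 200
iterations = 20

# Parsing only (builds MRI's internal AST).
parse_only = Benchmark.realtime do
  iterations.times { RubyVM::AbstractSyntaxTree.parse(source) }
end

# Parsing plus compilation to ISEQ (the code is not executed).
parse_and_compile = Benchmark.realtime do
  iterations.times { RubyVM::InstructionSequence.compile(source) }
end

# Ripper, which reports more detailed information than plain parsing.
ripper_time = Benchmark.realtime do
  iterations.times { Ripper.sexp(source) }
end

puts format("parse only:        %.4f s", parse_only)
puts format("parse and compile: %.4f s", parse_and_compile)
puts format("Ripper.sexp:       %.4f s", ripper_time)
```

Separating "parse only" from "parse and compile" is one way to see how much of the total time the parser itself accounts for.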
About parser gem
- mame: Steep uses the parser gem.
- ???: Will the idea of using the parser gem as the parser for MRI be adopted?
- mame: I hadn't thought of that idea. I was thinking of bundling it*4.
- soutaro: Is it realistic to bundle two equivalent functions almost simultaneously?
- mame: Frankly, it's very hard.
- mame: How realistic is the idea of using the parser gem as a parser for MRI?
- nobu: It's not realistic.
- nobu: MRI can't bundle the parser gem.
Bootstrap problem
- nobu: I argue that we don't need to support even older versions of Ruby's syntax. But what about Ruby to run the parser?
- nobu: Suppose I use the parser gem to parse it. Would it only be about 5x slower than the current parsing speed?
- mame: If we use the parser gem, it will be much slower because it parses more detailed information.
- nobu: 10x slower than now is unacceptable.
- nobu: All bundles will be slowed down (I didn't hear it correctly).
- nobu: If we would like to use a parser written in Ruby, do we need to compile it ahead of time (AOT)?*5
- mame: If we were to do that, I think it would take the form of an ISEQ embedded in MRI.
- ko1 & mame: Theoretically, it is possible. However, whether it makes sense to make it happen is another topic.
- nobu: Doesn't ISEQ have to be created in the same revision?
- ko1: No, you can just use the parser gem and an older revision.
- nobu: As for embedding the current ISEQ, the ISEQ itself is not that portable.
- mrkn: Shouldn't the new parser be parsed with parse.y?
- mrkn: We could keep parse.y and use it for bootstrapping.
- nobu: Wouldn't we have to maintain both the new parser and parse.y?
- mrkn: I don't think parse.y needs to be maintained. We can write the parser in old Ruby.
- shyouhei: Isn't the starting premise "We would like to remove parse.y"?
- nobu: We don't have to go as far as deleting it. But we need to organize it so that we can maintain it.
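To make the "ISEQ embedded in MRI" idea a little more concrete, here is a minimal sketch of serializing compiled code with `RubyVM::InstructionSequence#to_binary` and loading it back with `load_from_binary`. It assumes MRI, and the one-line source is only a stand-in for a parser written in Ruby. Ruby's documentation notes that this binary format cannot be used across Ruby versions or architectures, which is the kind of constraint nobu raises above.

```ruby
# Hypothetical stand-in for "a parser written in Ruby".
source = "1 + 2"

# Compile to an instruction sequence (ISEQ) without running it.
iseq = RubyVM::InstructionSequence.compile(source)

# Serialize the ISEQ; the result is a binary String tied to this Ruby build.
binary = iseq.to_binary

# Later, in the same Ruby build, restore and evaluate it.
restored = RubyVM::InstructionSequence.load_from_binary(binary)
result = restored.eval

puts result
```

Whether MRI would actually embed a parser this way was left open in the discussion.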
About LSP
- naruse: LSP must work with older versions too, just as the parser gem does.
- nobu: At this point*6, it may not be necessary to create a strict parser for LSP. I think detailed error checking is unnecessary.
- soutaro: No, it's not.
- naruse: I would like to see some kind of error output for anything that doesn't go through XXX (I didn't catch that...).
Reinventing the wheel
- ko1: It started as "Just parse it to run Ruby", and other requirements have gradually piled up. So it is time to rethink the parser itself.
- ko1: Matz, would you like to rebuild the parser as a hobby?
- matz: No, I don't.
- nobu: Has anyone created a parser in mruby?
- matz: hasumikin rebuilt the mruby parser using Lemon.*7
- naruse: Lemon is not that expressive.
- naruse: Yukihiro 'Matz' Matsumoto said: If we're going to rebuild the Ruby parser, let's introduce a more expressive parser. And let's make Ruby's syntax more complex.
- naruse: Yukihiro 'Matz' Matsumoto said: I used to think that I would simply keep the syntax the same. But after thinking about it for several years, I realized that "The complexity of the grammar is the identity of Ruby".*8
What are the Ruby parser's problems?
- akr: What makes parse.y hard to maintain?
- matz: Probably because parse.y is hard to read.
- akr: I personally expect the following to work:
- matz: Unless we adopt a top-down parser*10, we can't "pass parameters to rules". I often wish I could do that.*11
- akr: I think we need to sort out what the problem is, including such things.
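To make "pass parameters to rules" concrete, here is a minimal sketch of a hand-written recursive descent parser in which a rule is an ordinary method that takes an argument. The toy grammar (integers with `+` and `*`), the `ToyParser` class, and the `PRECEDENCE` table are all hypothetical and have nothing to do with Ruby's real grammar.

```ruby
# Hypothetical toy grammar: integers combined with + and *.
PRECEDENCE = { "+" => 1, "*" => 2 }.freeze

class ToyParser
  def initialize(source)
    @tokens = source.scan(/\d+|[+*]/)
    @pos = 0
  end

  # The rule receives a parameter: the lowest operator precedence
  # it is allowed to consume at this point.
  def expression(min_precedence = 1)
    left = Integer(@tokens[@pos])
    @pos += 1
    while (op = @tokens[@pos]) && PRECEDENCE.fetch(op, 0) >= min_precedence
      @pos += 1
      # Pass a stricter parameter down to the rule for the right-hand side.
      right = expression(PRECEDENCE[op] + 1)
      left = [op.to_sym, left, right]
    end
    left
  end
end

tree = ToyParser.new("1 + 2 * 3 + 4").expression
p tree # => [:+, [:+, 1, [:*, 2, 3]], 4]
```

Here a single `expression` rule handles every precedence level because the caller tells it how far it may go, instead of the grammar needing one rule per level.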
Error Handling and Error Recovery
- ???: Isn't error handling what you were talking about earlier?
- mame: It's easier for everyone to understand if you say, "A new requirement has come up."
- akr: Yacc also has an error handling feature.
- mame: I'm not necessarily saying "Stop using Bison", I'm saying it's okay to use Ripper.
- nobu: The topic earlier was something like this: "We want a parser that parses the source code to the end even if an error occurs".
- naruse: yui-knk had previously tried to implement error handling.*12
- nobu: Isn't the current IRB like this: "Even if an error occurs, the parser parses the source code to the end and generates tokens"?
- mame: Isn't that something the lexer can do? Aren't we talking about the lexer being able to handle errors?
- soutaro: Does mame-san want a parser that says "there is a syntax error here, and the rest of the syntax looks like this"?*13
- mame: I would like to see such a feature if possible.
- mame: If I use my keynote as an example, writing the keyword argument ':' would result in a parsing error and the parsing would be terminated. However, if possible, I want the source code to be parsed to the end without error.
- mame: In order to complete the argument information, we need the core syntax tree with 'keyword:'.
- akr: I think the grammar can be transformed mechanically. From the complete grammar, I think we can derive a grammar for input that stops partway with an error.
- mame: These items are not an organized list. I just wrote up "something I am frustrated with" on the slides; it's not that I want to do what's written on the slides.
- mame: yui-knk was trying to implement error recovery? Or was it trying to implement error handling?
- yui-knk: Does error recovery mean that the token is broken in the process?
- yui-knk: Error recovery is often found in the corner of textbooks*14, right?
- yui-knk: For a non-conflicting language like Ruby, is there a uniquely defined set of things that should be filled in?
- matz: It is not uniquely determined.
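The difference between "parsing stops at a syntax error" and "the lexer can still produce tokens" can be observed with Ripper from the standard library. This is only a minimal sketch: the broken input is made up, and how far the lexer gets past an error may differ between Ruby versions, so the output is printed rather than stated here.

```ruby
require "ripper"

# Hypothetical half-written input: the first line has a syntax error.
broken = "puts(1 +)\nputs 2\n"

# Ripper.sexp gives up and returns nil when the source has a syntax error.
tree = Ripper.sexp(broken)

# Ripper.lex still returns the tokens it managed to recognize.
tokens = Ripper.lex(broken)
token_texts = tokens.map { |(_pos, _type, text)| text }

puts "tree: #{tree.inspect}"
puts "tokens: #{token_texts.inspect}"
```

Tokens alone do not give the syntax tree mame asks for, which is why error recovery in the parser itself comes up in the discussion.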
Conclusions
I have hit upon a truly marvelous Ruby parser, which this margin is too narrow to contain.*15
In the end
Thank you for reading my poor English article to the end.
*1:This corresponds to 11:54 - 28:06 in the video.
*2:or 魔境 (makyō, roughly "a demon-haunted wilderness")
*3:ver. 3.0.2 as of November 11, 2021
*4:The author does not understand what is meant by "bundle".
*5:Ahead-of-time compilation
*6:As of September 10, 2021
*7:https://shimane.monstar-lab.com/hasumin/mmruby-on-RubyKaigi-Takeout-2020
*8:Mr. X said: That's a troubling identity lol.
*9:Parser generator generator
*10:Yacc/Bison is a bottom-up parser.
*11:It is possible to "Pass values from rules".
*12:At this time, yui-knk was not present.
*13:Because we want input completion while we are writing the source code.
*14:What's that!? I don't know that...
*15:I have not come up with a good alternative.