25
AlgoRythm
274d

Very excited to announce I've started another project I will never finish

Comments
  • 3
    @retoor I'm going to write an HTTP/HTML ecosystem using Zig and, later, Lua.

    I'm going to compete with PHP, .Net, and Node.JS.

    It will start with an HTTP server and HTML parsers in Zig, then eventually move into a whole Razor-like preprocessor language where Lua is the "guest" language (I don't really want to mess around with compiling something like Zig on the fly for use in preprocessor statements).

    Eventually it would be nice to create an API where you're writing controllers and everything in Lua instead of Zig.

    However, you and I both know it will barely get past the "creating a socket and parsing the incoming HTTP request" phase.
  • 1
    @retoor We've had the parser discussion before. I've written an HTML parser in C++.

    The hardest part will be finding time and motivation for it. I've already got my real job and a side hustle that takes up like 70 hours of my week but I want to do more challenging work with more challenging languages.

    I tried writing a Markdown parser in Zig and got decently far with a very efficient design, but stopped short of a working product because Markdown is actually a pain in the FUCKING ass to parse (due to its lax rules around syntax and the importance of newlines).

    It's just a ton of work that I'll be excited about for maybe two weeks
  • 0
    @retoor Markdown really is a pain. My advice to you is to treat all whitespace as significant tokens while lexing and then discard the insignificant ones while parsing. I went in with the same mentality as I had from my HTML parser ("Whitespace is almost never significant") and it really fucked me from the get-go.
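
A minimal sketch of that advice, in Python rather than Zig and not taken from the commenter's actual parser: the lexer keeps every whitespace run as a token, and a toy parser decides which ones matter. The names `Token`, `lex`, and `paragraphs` are hypothetical, and "a blank line ends a paragraph" is the only rule modelled.

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str   # "NEWLINE", "SPACE", or "TEXT"
    text: str

# Order matters: a newline is tried before any other whitespace.
TOKEN_RE = re.compile(r"(?P<NEWLINE>\r?\n)|(?P<SPACE>[ \t\r]+)|(?P<TEXT>[^ \t\r\n]+)")

def lex(source: str) -> Iterator[Token]:
    """Emit every run of characters, whitespace included, as a token."""
    for m in TOKEN_RE.finditer(source):
        yield Token(m.lastgroup, m.group())

def paragraphs(tokens) -> list[str]:
    """Toy parser: a blank line ends a paragraph; other whitespace collapses."""
    result, words, newlines = [], [], 0
    for tok in tokens:
        if tok.kind == "NEWLINE":
            newlines += 1
            if newlines == 2 and words:
                result.append(" ".join(words))
                words = []
        elif tok.kind == "TEXT":
            newlines = 0
            words.append(tok.text)
        # SPACE tokens are dropped here, at the parsing stage, not in the lexer
    if words:
        result.append(" ".join(words))
    return result
```

Because the lexer throws nothing away, joining the token texts reproduces the input exactly, so a later rule that does care about spaces (indentation, trailing double spaces) can be added in the parser without touching the lexer.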

    The parsing states also need to be recursive in some respects due to the fact that the Markdown syntax is indeed recursive (you can have a code block inside a multi-line quote) so you'll need to have a *stack* of states which can all talk to each other - which really blurs the lines between lexing and parsing.
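
One way to picture the stack of states, as a rough Python sketch rather than the commenter's design: each open container (document, quote, code block) sits on a stack, and every line is first matched against the containers that are already open. The function name `parse_blocks` and the tuple-based tree are assumptions made for illustration, and many real Markdown rules (lazy continuation, indentation, lists) are left out.

```python
FENCE = "`" * 3  # a fenced code block delimiter

def parse_blocks(lines):
    """Build a tree of ("doc" | "quote" | "code", children) nodes."""
    root = ("doc", [])
    stack = [root]                      # stack[-1] is the innermost open container
    for line in lines:
        depth, rest = 1, line
        # Consume one ">" per quote level, unless an open code block owns the text.
        while rest.startswith(">") and not (
            depth < len(stack) and stack[depth][0] == "code"
        ):
            rest = rest[2:] if rest.startswith("> ") else rest[1:]
            if depth == len(stack):     # deeper than anything open: start a quote
                node = ("quote", [])
                stack[-1][1].append(node)
                stack.append(node)
            depth += 1
        # A missing ">" closes that quote and everything nested inside it.
        if depth < len(stack) and stack[depth][0] == "quote":
            del stack[depth:]
        top = stack[-1]
        if top[0] == "code":
            if rest.strip() == FENCE:
                stack.pop()             # closing fence
            else:
                top[1].append(rest)     # raw line, no further interpretation
        elif rest.startswith(FENCE):
            node = ("code", [])
            top[1].append(node)
            stack.append(node)
        elif rest.strip():
            top[1].append(("text", rest))
    return root
```

Feeding it a quote that contains a fenced block yields a `code` node nested inside the `quote` node, and a following unquoted line pops both states off the stack at once.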

    A *good* MD parser is actually incredibly difficult and I will end up giving it another go sometime in the future, but for now, I'm still feeling burned by it and I'm just gonna let it be.
  • 0
    @retoor One problem I really struggled with at the time and indeed still haven't quite cracked with regards to MD is what needed to be done at lexing level and what needed to be done at parsing level. Honestly looking back, maybe I just needed to go very basic at the lexing level and do most of the work while parsing. Unfortunately, tokens are extremely context-based and the context is extremely non-linear. For example something as simple as a "#" character has an extreme number of rules based on what the context around it is.
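
To make the "#" example concrete, here is a small Python sketch of just the heading rule as CommonMark describes it: one to six "#" characters, at most three spaces of indentation, followed by a space, a tab, or the end of the line, and never inside a code block. The function name `heading_level` and the `in_code_block` flag are hypothetical; a real parser would get that flag from its block state.

```python
import re

# Up to 3 leading spaces, 1-6 "#", then whitespace plus content or end of line.
ATX_HEADING = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+(.*?))?[ \t]*$")

def heading_level(line: str, in_code_block: bool = False) -> int:
    """Return 1-6 if the line opens a heading, otherwise 0."""
    if in_code_block:
        return 0                      # "#" is literal text inside code
    m = ATX_HEADING.match(line)
    return len(m.group(1)) if m else 0
```

Even this single rule needs to know the column, the following character, the run length, and the enclosing block, which is the kind of context a plain character-level lexer does not have.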
  • 0
    @retoor You know what, maybe I will try again with MD instead of starting this huge time sink. I really don't have the hours to put into a .Net alternative, but the markdown parser might just scratch my itch AND I can use it in my side hustle.
  • 0
    @retoor My original plan was to make an executable that you could start up as a child process, feed characters into via stdin, and read the result back from stdout. That way you can use it from any language that supports basic file operations. But of course it will be in Zig, and it should be binary compatible with C if compiled as a static/dynamic library.
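
From the caller's side, that child-process plan could look roughly like the Python sketch below. The real converter does not exist in this thread, so `CHILD_CMD` is a stand-in: a child Python process that simply echoes stdin back upper-cased. Only the plumbing (write to stdin, read from stdout) is the point; `convert` is a hypothetical name.

```python
import subprocess
import sys

# Stand-in for the real converter binary (an assumption for this sketch):
# a child process that reads all of stdin and writes it back upper-cased.
CHILD_CMD = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

def convert(text: str, cmd=CHILD_CMD) -> str:
    """Send text to the child's stdin and return whatever it prints."""
    done = subprocess.run(
        cmd, input=text, capture_output=True, text=True, check=True
    )
    return done.stdout
```

Swapping `CHILD_CMD` for the path of an actual executable would be the only change needed, which is what makes the approach usable from any language that can spawn a process.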
  • 0
    @retoor I just went out and had an idea by the time I got back. To solve my lexer/parser problem, why don't I just establish communication between the two?

    My regular design is to have the lexer produce tokens from text, which are then passed to the parser one at a time - but they never, ever communicate. They are two separate entities. But what if the lexer and the parser worked together to establish the context that the lexer needs to lex with? That way, the lexer can still follow primitive lexing rules, but with context established by the complicated syntax that the parser can understand.
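
A rough Python sketch of that idea, with every name (`Lexer`, `mode`, `parse`) invented for illustration: the parser pulls tokens one at a time and sets a mode on the lexer, and the lexer applies simple rules that depend only on that mode. Here the only context shared is "inside a fenced code block or not".

```python
FENCE = "`" * 3  # a fenced code block delimiter

class Lexer:
    def __init__(self, source: str):
        self.lines = source.splitlines()
        self.pos = 0
        self.mode = "normal"            # written by the parser, read by the lexer

    def next_token(self):
        if self.pos >= len(self.lines):
            return ("EOF", "")
        line = self.lines[self.pos]
        self.pos += 1
        if line.strip() == FENCE:
            return ("FENCE", line)
        if self.mode == "raw":
            return ("RAW", line)        # no heading rules inside code
        if line.startswith("# "):
            return ("HEADING", line[2:])
        return ("TEXT", line)

def parse(source: str):
    lexer = Lexer(source)
    out = []
    while True:
        kind, text = lexer.next_token()
        if kind == "EOF":
            return out
        if kind == "FENCE":
            # The parser knows whether this fence opens or closes a block,
            # so it tells the lexer how to read the lines that follow.
            lexer.mode = "raw" if lexer.mode == "normal" else "normal"
        else:
            out.append((kind, text))
```

The design only works because tokens are requested lazily: if the lexer ran ahead and tokenized the whole input first, the mode change would arrive too late.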

    It seems above my pay-grade but I'm going to try and prototype one over the course of this week. Maybe it could be really good and efficient and solve a lot of the "whose job is it anyways?" problems.
  • 0
    Depends on the format and how masochistic you are, but unless you're parsing something with zero flexibility like regex or a deliberately highly principled data format like JSON or XML there should always be an intermediate datastructure which has less structure and more information than your final output, such as a token tree with tokens representing all the whitespace information that has any chance to be relevant depending on context.
  • 0
    In principle a single-pass parser can just be a normal N-stage parser with lazy datastructures. Compilers are perfectly capable of nesting the stages inside each other, so you just have to write the lazy-evaluated store once.

    The main design constraint then is that your datastructures have to be partitioned nicely so that you don't spend too much time following pointers but also don't calculate unused results.
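
In Python, generators are one readily available stand-in for such lazy stages. The sketch below is an assumption-heavy illustration, not the commenter's design: four separate stages (`read_lines`, `lex`, `parse`, `render`, all hypothetical names) are written independently, yet each element flows through all of them before the next line is even read. The renderer does no HTML escaping.

```python
from typing import Iterable, Iterator

def read_lines(text: str) -> Iterator[str]:
    for line in text.splitlines():
        yield line

def lex(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    for line in lines:
        if line.strip():
            yield ("LINE", line.strip())
        else:
            yield ("BLANK", "")

def parse(tokens) -> Iterator[list[str]]:
    """Group consecutive LINE tokens into paragraphs, one at a time."""
    current = []
    for kind, text in tokens:
        if kind == "LINE":
            current.append(text)
        elif current:
            yield current
            current = []
    if current:
        yield current

def render(paragraphs) -> Iterator[str]:
    for words in paragraphs:
        yield "<p>" + " ".join(words) + "</p>"

def to_html(text: str) -> Iterator[str]:
    # Nothing runs until the caller asks for the first rendered paragraph.
    return render(parse(lex(read_lines(text))))
```

Calling `next(to_html(...))` pulls only as many lines as the first paragraph needs, so the pipeline behaves like a single pass even though it is written as N stages.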
  • 0
    Either way I think optimizing parsers for code specifically is rarely a good choice because they have other conflicting objectives and much of the work is just inherently slow if you want to do it perfectly right.

    My strategy now is to focus on good errors when writing the parser and then make sure that I never have to run it again by caching ASTs all over the place.
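
A minimal sketch of the caching half of that strategy, assuming an in-memory cache keyed by a hash of the source text. `AstCache` and its `misses` counter are hypothetical names, and the parser is passed in as a plain function.

```python
import hashlib

class AstCache:
    """Remember parse results so identical source is never parsed twice."""

    def __init__(self, parse):
        self._parse = parse
        self._store = {}
        self.misses = 0                 # how many times the parser actually ran

    def get(self, source: str):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._parse(source)
        return self._store[key]
```

The cached AST is shared between callers, so this only works if consumers treat it as read-only. Persisting the store to disk would need a serializable AST, which is outside this sketch.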
  • 0
    In particular, it shouldn't be possible for a code parser to hard crash or overflow the stack on any input no matter how silly. If you add a virtual stack and every possible bounds check to normal code, all of a sudden simplicity seems a lot more valuable.
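
As a small illustration of a virtual stack with bounds checks, the Python sketch below parses nested square brackets into nested lists without recursion. The grammar, the names `parse_nested` and `ParseError`, and the default depth limit are all assumptions chosen for the example.

```python
class ParseError(Exception):
    pass

def parse_nested(source: str, max_depth: int = 1000) -> list:
    """Parse nested [brackets] into nested lists using an explicit stack."""
    root = []
    stack = [root]                      # replaces the call stack of a recursive parser
    for pos, ch in enumerate(source):
        if ch == "[":
            if len(stack) > max_depth:
                raise ParseError(f"nesting deeper than {max_depth} at offset {pos}")
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif ch == "]":
            if len(stack) == 1:
                raise ParseError(f"unmatched ']' at offset {pos}")
            stack.pop()
        else:
            stack[-1].append(ch)
    if len(stack) > 1:
        raise ParseError(f"{len(stack) - 1} unclosed '[' at end of input")
    return root
```

A hundred thousand opening brackets produce a `ParseError` with an offset instead of a stack overflow, at the cost of bookkeeping that a recursive version would get for free.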
  • 0
    @retoor Now you're confusing me because you always need to lex before you parse. You're typically parsing tokens which are produced by your lexer

    @lorentz my goal isn't to bring a new markdown parser to market, it's just to create one and learn/understand lexer/parsers more. Plus, as I've been saying, Markdown is actually very complicated in terms of syntax. I'm trying to do a single-pass where the lexer produces tokens that the parser is able to convert directly into an AST which can then be rendered directly into HTML by a renderer
  • 0
    @AlgoRythm I'm not sure I understand. When you say that the parser converts tokens directly into an AST, what sort of indirection are you avoiding? Are you saying that the parser can consume tokens one by one without organizing them into a structure first?
  • 0
    @lorentz you were talking about "[...] an intermediate datastructure which has less structure and more information than your final output, such as a token tree with tokens representing all the whitespace information that has any chance to be relevant depending on context."
  • 0
    Is software ever really finished though?