Spend several months designing an alternative approach to string matching/parsing not based on parser combination or Reg

Ranter

pk76

1141

Comments

2

pk76

1141

6y

@Demolishun github.com/entomy/stringier
YouTube links are weird but a search for "introducing Stringier" should have it as the first result.
0

endor

5475

6y

.
0

RememberMe

13709

6y

@pk76 I only went over the link cursorily, not enough time, but could you explain what makes your approach better than regexes and parser combinators (or techniques like fast and efficient table-driven LR parsers or dynamic programming algorithms like CYK)?
1

pk76

1141

6y

@RememberMe sure.

Unlike Regex, patterns are reusable, so if you're defined something like an "identifier" for a programming language, and want to put that inside of a larger pattern, you can easily do so because these are assigned to variables and are combined. With most Regex implementations you have to copypasta. There's also error reporting, right down to the specific part of the pattern which failed to match.

Unlike parser combinators you're working with patterns, descriptions of what to parse, rather than combining smaller parsers. This makes the approach a bit more declarative. It also allows some optimizations made during initialization, based on rewriting parts of the pattern in ways that work faster, similar to optimizing compilers. As long as the patterns would otherwise be identical, a rewrite can occur. Parser combinators require you to know/reference what approach is optimal.

As compared to more advanced approaches like what you listed, it's complicated...
1

pk76

1141

6y

@pk76 @RememberMe

In a very broad sense I could claim ease of use. But as I haven't gotten around to benchmarking these, nor doing any serious comparisons for advantages/disadvantages/pitfalls, I don't feel comfortable comparing against these yet as it's too likely to be unfair or inaccurate.

Which then brings me to performance. There's charts in the presentation, but the tl;dw is, highly competitive performance despite infancy, with substantially lower memory usage.
1

powerfulparadox

1046

6y

📌
0

RememberMe

13709

6y

@pk76
Firstly, FParsec. That library can match context sensitive grammars and is intended for ease of use compared to handrolling or LR, because correctly parsing advanced grammars is hard. Ref: https://microsoft.com/en-us/...

Even theoretically for full CFGs the best algorithms like CYK work in O(n^3) time, so comparing FParsec to regular grammar parsers like regex/yours is absurd, it's an apples to oranges comparison. Your benchmark is all regular stuff like identifiers and addresses, with no recursive structure.

Also it's an established design flaw of combinators that they use a ton of memory, have horrible cache friendliness, tons of indirection, and are usually opaque to optimisation because of their construction (it's just a bunch of higher order functions, so they can't observe their own structure to optimise/better error reporting). Error reporting is also relative to the expressive power of the parser.
0

RememberMe

13709

6y

@pk76 (note: "regular" here means the type of grammar that regex can accept, it's a technical word. Could also say left-linear/right-linear).

I also don't think that lifting regex building blocks into objects and operators is that big a deal, there is no effective difference that I can see between your patterns and regex patterns in terms of functionality, just that regex uses strings while you use pattern objects. One could easily write a wrapper class above regexes that internally called the msregex implementation, and lifting regex string concatenation into that class would also cover your library's other USP, the reusable patterns (which add no theoretical expressive power anyway, unless they can be called recursively).

It is neater though, I'll give it that, personally I too prefer regexes declared like this instead of messy strings, but yeah.

Don't want to sound dismissive or anything, and I might have missed something, those are just my opinions about the library.
1

pk76

1141

6y

@RememberMe So because I don't have an example of LR/LR/LALR/x in the benchmarks, it's only capable of handling those? Isn't it more likely I haven't added those benchmarks yet, as I'm trying to provide the fairest comparisons and don't want to use a bad FParsec implementation? You clearly watched the video as you extracted a screenshot from it, on the very slide you extracted I had stated exactly what I've had to here: The benchmarks are not complete and I would like to add more, but I'm not going to publish misleading benchmarks and need to figure out, or find, good implementations for each.
1

pk76

1141

6y

@RememberMe I've been using this for parsing a language with at least LL grammar, and haven't had a reason to push farther given my requirements. So no, it's not just limited to regular grammars and you're definately missing something; I'd suggest not basing capabilities of a system off its benchmarks.

I should see how complex of grammars this will handle, you are not incorrect to wonder that. However, handling grammars more complex than I needed was never a design goal, so I've only ever validated up to that requirement, and then focused on other matters.
0

pk76

1141

6y

@RememberMe "regular here means type of grammar"

Thank you for just assuming I'm an idiot. Cool.
0

RememberMe

13709

6y

@pk76 oh you do plan to support more advanced grammars? in the video you primarily kept bringing up comparisons to regex, leading me to believe it was intended to replace regexes only. My bad.

Having gone through the documentation, I keep noticing that you try to differentiate your library from parsec, but don't mention anything that parsec doesn't already provide and the design of things is basically the same as what parsec does. How is it different then? The parsec style is constrained by being designed with purity in mind, but a general parser combinator is just that - parsers and combinators which act on parsers, which is basically what your library is.
1

RememberMe

13709

6y

@pk76 the "regular" bit was a clarification in case you thought I meant "ye olde grammar", because language ambiguity. In the same paragraph I also used terms like left-linear and right-linear, and I've been freely throwing terms like LR around which I wouldn't expect an idiot to know, so you can rest assured I don't think you're one.
0

pk76

1141

6y

@RememberMe pattern descriptions and parsers are technically separated with this approach, although I will say I've only implemented on parser currently. I will admit it's a subtle difference: combining pattern descriptions instead of combining parser functions. I would say it's derivative but highly inspired by the approach. The OO approach to the same goal otherwise.
1

pk76

1141

6y

@RememberMe my apologies then. I'll admit I'm a bit prone to hyperdefensive attitudes for various reasons, and it is especially agitated by highly academic stuff.
1

pk76

1141

6y

@RememberMe I would like to. Obviously I'm primarily focused on my own requirements first, but I would like to support at least recursive definitions in time. Since recursion is just a tail call to itself, I believe this should easily be possible, and would probably just require a .recursive() method or something to work around the host language not always supporting recursive variable definitions. I'll work out the specifics when I'm ready for it, as right now I'm primarily working on what I made this for, as well as some additional "checker" patterns which aren't covered in the presentation but are in the documentation under "advanced concepts".
1

pk76

1141

6y

@RememberMe It's a presentation meant for browsers of YouTube. I would give a very different presentation at a professional conference. Know your audience. ;)

Add Comment

rant