!rant

Just read a really neat breakdown of approaches to auto-suggestion, covering n-grams, tries, and more, by a guy working at Etsy.
This is what I do with my days off, apparently.

If you want to read it you can find it here:

https://medium.com/related-works-in...

Comments
  • 2
    "There will be a large disparity between worst case and average case complexity because we’re not indexing random strings — we’re indexing natural language which has a non-random distribution of characters and thus a non-random distribution of n-grams. In fact, those patterns are so far from random that examining character/n-gram distributions is a good language detection strategy²"

    I was thinking: you know what would be interesting?

    The n-gram distribution likely shifts for non-native writers and speakers. It would be interesting to use n-grams to detect non-native speakers and guess what their native language (likely) is. That could be used to auto-suggest language settings for new users of an application, or potentially to flag an article that's been written as, say, propaganda for a foreign audience.

    The same could even be applied to detect accents from n-grams of phonemes, and then adjust parameters to improve recognition.
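
For what it's worth, here's a minimal sketch of the n-gram language-detection idea from the quote, in Python. Everything named here (`SAMPLES`, `PROFILES`, `char_ngrams`, `cosine_similarity`, `guess_language`) is made up for illustration, and the two sample sentences are toy data; a real detector would need far more text per language.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "training" text; far too small for real use.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog "
               "and then the dog chases the fox through the woods",
    "german": "der schnelle braune fuchs springt über den faulen hund "
              "und dann jagt der hund den fuchs durch den wald",
}

# One trigram profile per language, built once up front.
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def guess_language(text: str) -> str:
    """Return the language whose trigram profile is closest to the input's."""
    profile = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine_similarity(profile, PROFILES[lang]))
```

With these toy samples, `guess_language("the dog and the fox")` picks the English profile. Guessing a non-native writer's first language would presumably need profiles built from English text written by speakers of each native language; that part is my assumption, not something from the article.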