Question for anyone who uses MongoDB Atlas Search:

If I'm only interested in autocomplete from the start of the text, which is more performant?

1) standard analyzer + edgeGram tokenizer

2) keyword analyzer + edgeGram tokenizer

I don't see why I should index separate words if I don't care about matches at arbitrary positions :/

Thank you

Comments
  • 1
    Let's say you're trying to autocomplete countries, and a user types in "Samoa". Do you want autocomplete to show "American Samoa" as an option or not?

    If yes, use the standard analyzer. If no, use the keyword analyzer.
  • 0
    I'm a bit confused about @hitko's answer.

    When an edgeGram tokenizer is used, the tokenizer creates tokens from the input string, bounded by a minimum and maximum number of characters.

    The analyzer searches the **resulting** tokens for matches.

    So... "American Samoa" is broken down into tokens, between min and max chars as far as I know.

    The resulting tokens are analyzed - meaning the analyzer isn't looking at "Samoa" but rather at "Sam", "oa", etc. (tokens).

    At least this is what I would expect.

    The question is now what you'll expect regarding search terms - e.g. a single search term vs many search terms.
  • 0
    @IntrusionCM You've got the order wrong.

    When using keyword + edgeGram, the whole string is treated as a single text, and then edgeGram creates tokens like "Am", "Ame", ... , "American Sa", and obviously none of those tokens will match "Samoa".

    When using standard + edgeGram, each word of the string is treated as a single text, so the final tokens would be "Am", "Sa", "Ame", "Sam", "Amer", "Samo", ... , and those will match "Samoa".
  • 0
    @hitko If the order were switched, then the edgeGrams would be "Am", "Ame", ... , "American Sam", "American Samoa" in both cases, but the standard analyzer would then split them into words, giving "Am", "Ame", "Amer", ... , "S", "Sa", "Sam", ... - notice this wouldn't respect edgeGram minLength=2, and edgeGram would need a large maxLength for any of this to work.
  • 0
    @hitko Thank you! That was what I was trying to do (the "no" way).
  • 0
    @hitko Interesting.

    Didn't have a MongoDB database to test, sadly.

    But it makes more sense your way xD

    And yes, exactly the second comment / conclusion is what made me doubt my sanity...

    Thanks for the longer explanation.
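
To make the token lists from the comments above concrete, here is a minimal Python sketch that imitates the three orderings discussed. It is not Atlas Search or Lucene code. The function names are made up for illustration, the bounds of 2 and 15 characters are assumed values, word splitting is approximated with a regex, and both pipelines lowercase their input as a simplification.

```python
import re


def edge_grams(text, min_gram=2, max_gram=15):
    """Return the prefixes of `text` whose lengths run from min_gram to max_gram."""
    upper = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, upper + 1)]


def keyword_then_edge_gram(text, min_gram=2, max_gram=15):
    """Keyword-style: the whole string is one unit, then edge grams are taken."""
    return edge_grams(text.lower(), min_gram, max_gram)


def standard_then_edge_gram(text, min_gram=2, max_gram=15):
    """Standard-style: split into words first, then take edge grams of each word."""
    tokens = []
    for word in re.findall(r"\w+", text.lower()):
        tokens.extend(edge_grams(word, min_gram, max_gram))
    return tokens


def edge_gram_then_standard(text, min_gram=2, max_gram=15):
    """The switched order: edge grams of the whole string, then split into words."""
    tokens = []
    for gram in edge_grams(text.lower(), min_gram, max_gram):
        for word in re.findall(r"\w+", gram):
            if word not in tokens:  # keep first occurrence only
                tokens.append(word)
    return tokens


def matches(query, tokens):
    """Assumed matching rule: the typed text must equal one indexed token."""
    return query.lower() in tokens


if __name__ == "__main__":
    name = "American Samoa"
    for build in (keyword_then_edge_gram, standard_then_edge_gram, edge_gram_then_standard):
        tokens = build(name)
        print(build.__name__, tokens, matches("Samoa", tokens))
```

In this sketch, typing "Samoa" only finds "American Samoa" when the words are split before the edge grams are taken. The switched order also produces the one-character token "s", which is the minimum-length problem pointed out above.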