
Typos happen often and can degrade the user experience. Fortunately, Elasticsearch can handle them easily with Fuzzy Query.
Handling typos is a must if you’re building an advanced autocomplete system with Elasticsearch.
What is fuzzy logic
Fuzzy logic is a form of mathematical logic in which the truth value of a variable can be any number between 0 and 1. It differs from Boolean logic, where the truth value is either 0 or 1.
In Elasticsearch, a fuzzy query means that the terms in the query don’t have to exactly match the terms in the Inverted Index.
To measure the distance between a query term and an indexed term, Elasticsearch uses the Levenshtein Distance Algorithm (by default it also counts swapping two adjacent characters, such as “ab” to “ba”, as a single edit).
How to calculate distance using Levenshtein Distance Algorithm
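The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one word into another.
When the two words have the same length and differ only by substitutions, calculating it is easy: compare the first and second word character by character,
and add one to the distance for every position where the characters differ.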
Let’s see an example: the distance between the common typo “Gppgle” and the correct word “Google”.
Only the second and third characters differ (“pp” versus “oo”), so the Levenshtein distance between “Gppgle” and “Google” is 2.
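To make the definition concrete, here is a minimal Python sketch of the classic dynamic-programming Levenshtein algorithm. It is an illustration only, not how Elasticsearch computes distances internally. Unlike the character-by-character shortcut, it also handles words of different lengths; it is plain Levenshtein (no transpositions) and, like Elasticsearch’s comparison, case-sensitive.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (case-sensitive)."""
    # prev[j] = distance between the first i-1 chars of `a` and the first j chars of `b`
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("Gppgle", "Google"))  # 2: "pp" -> "oo"
print(levenshtein("Hpng", "hong"))      # 2: case-sensitive, "Hp" -> "ho"
print(levenshtein("Gogle", "Google"))   # 1: a single insertion
```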
Fuzzy Query in Elasticsearch
Handling typos in Elasticsearch with Fuzzy Query is also simple.
Let’s start with an example using the typo “Gppgle”.
With a normal Match Query, Elasticsearch first analyzes the query “gppgle” and then searches the Inverted Index for the resulting term.
The only term in the Inverted Index is “google”, which doesn’t match the term “gppgle”. Therefore, Elasticsearch won’t return any result.
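This behaviour can be sketched with a toy inverted index in Python. The `analyze` function is a rough stand-in for Elasticsearch’s standard analyzer (split into word tokens, then lowercase), and the field name “name” in the request body is hypothetical, since the original request isn’t shown.

```python
import json
import re


def analyze(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer: split into word tokens, lowercase them.
    return [token.lower() for token in re.findall(r"\w+", text)]


# Toy inverted index: term -> ids of the documents that contain it.
inverted_index: dict[str, set[int]] = {}


def index_document(doc_id: int, text: str) -> None:
    for term in analyze(text):
        inverted_index.setdefault(term, set()).add(doc_id)


def match_search(query: str) -> set[int]:
    # Plain match query: analyze the query, then look every term up exactly.
    hits: set[int] = set()
    for term in analyze(query):
        hits |= inverted_index.get(term, set())
    return hits


index_document(1, "Google")
print(match_search("Gppgle"))  # set(): "gppgle" is not in the index
print(match_search("GOOGLE"))  # {1}: analysis turns it into "google"

# Equivalent request body for GET /<index>/_search ("name" is a hypothetical field).
match_query = {"query": {"match": {"name": {"query": "gppgle"}}}}
print(json.dumps(match_query))
```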
Now, let’s try the fuzziness parameter of the Match Query.
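A Match Query with fuzziness might look like the request body below. This assumes the same hypothetical “name” field and the value “AUTO”; the value used in the original example isn’t shown. With “AUTO”, a 6-character term such as “gppgle” is allowed up to 2 edits, so it can reach “google”.

```python
import json

# Match Query with fuzziness: the query text is analyzed first, then each
# resulting term may match indexed terms within the allowed edit distance.
# "name" is a hypothetical field name.
fuzzy_match_query = {
    "query": {
        "match": {
            "name": {
                "query": "gppgle",
                # AUTO scales with term length: "gppgle" has 6 characters,
                # so up to 2 edits are allowed and "google" (distance 2) matches.
                "fuzziness": "AUTO",
            }
        }
    }
}

print(json.dumps(fuzzy_match_query, indent=2))
```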
Two types of a fuzzy query in Elasticsearch
In the previous example, we used fuzziness as a parameter inside the Match Query.
But there is another way to use the fuzzy feature: the Fuzzy Query.
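A Fuzzy Query for the same typo could be written as follows, again assuming a hypothetical “name” field. Because a Fuzzy Query’s value is not analyzed, this sketch passes it already lowercased so that it can line up with the indexed term “google”.

```python
import json

# Fuzzy Query: "value" is NOT analyzed; it is compared as-is against the
# terms in the inverted index. "name" is a hypothetical field name.
fuzzy_query = {
    "query": {
        "fuzzy": {
            "name": {
                "value": "gppgle",  # lowercase, like the indexed term "google"
                "fuzziness": "AUTO",
            }
        }
    }
}

print(json.dumps(fuzzy_query, indent=2))
```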
The results seem to be the same! So, what’s the difference between them?
Fuzzy Query
Fuzzy Query works just like a Term Query: the query is not analyzed, and the raw value is used to search the Inverted Index.
For example, let’s index one more document, “Hong Kong”.
As you can see, the standard analyzer produces two terms, “hong” and “kong”.
If you read my other article “Elasticsearch: Text vs. Keyword”, you’d know that if we use a Term Query to search for “Hong Kong”, we won’t get any result, because the Inverted Index only contains “hong” and “kong”.
A Fuzzy Query for “Hong Kong” fails for the same reason: no term in the Inverted Index is within 2 edits of “Hong Kong”.
Now, let’s try Fuzzy Query with “Hpng”.
The query term “Hpng” and the indexed term “hong” have a distance of 2.
Remember that the comparison is case-sensitive: the distance of 2 comes from the differences between “Hp” and “ho”.
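The “Hong Kong” walkthrough could be reproduced with request bodies like these, sketched in Python under the same assumptions (a hypothetical text field “name” using the standard analyzer). The first body is for the `_analyze` API; the other two are for `_search`. The fuzziness is set to 2 explicitly here, because with “AUTO” a 4-character term such as “Hpng” only gets 1 edit, which is not enough to reach “hong”.

```python
import json

# 1. Check how the standard analyzer tokenizes the new document (POST /_analyze).
#    Expected tokens: "hong" and "kong".
analyze_request = {"analyzer": "standard", "text": "Hong Kong"}

# 2. A Term Query is not analyzed, so "Hong Kong" is looked up verbatim and
#    finds nothing: the index holds "hong" and "kong", not "Hong Kong".
term_query = {"query": {"term": {"name": {"value": "Hong Kong"}}}}

# 3. A Fuzzy Query for "Hpng" is also used raw (capital "H" included).
#    Its distance to "hong" is 2, so fuzziness must be at least 2.
fuzzy_query = {"query": {"fuzzy": {"name": {"value": "Hpng", "fuzziness": 2}}}}

for body in (analyze_request, term_query, fuzzy_query):
    print(json.dumps(body))
```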
Match Query with Fuzziness parameter
Match Query with the fuzziness parameter is generally preferable to Fuzzy Query, because the query is analyzed with the field’s analyzer before it is searched against the Inverted Index.
Let’s try the same query as we did in the Fuzzy Query’s section.
As expected, both queries returned a result!
The first query, “Hpng Kong”, is analyzed into “hpng” and “kong”, while the Inverted Index contains “hong” and “kong”.
“hpng” matches “hong” with a distance of 1,
while “kong” matches “kong” exactly.
One thing to note if you plan to use Match Query is that every term in the query is matched with fuzziness, not just the misspelled one.
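The per-term behaviour can be simulated locally. The Python sketch below is a toy model, not Elasticsearch code: it lowercases and splits the query (a rough stand-in for the standard analyzer), resolves “AUTO” with the default thresholds (exact match up to 2 characters, 1 edit for 3 to 5, 2 edits beyond that), and lets each query term match any indexed term within that many edits. Notice that “kong” also reaches “hong”, since the two differ by a single edit.

```python
import json
import re


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def auto_edits(term: str) -> int:
    # "AUTO" with default thresholds (AUTO:3,6).
    return 0 if len(term) < 3 else 1 if len(term) < 6 else 2


def fuzzy_match(query: str, index_terms: set[str]) -> dict[str, list[str]]:
    # Analyze the query, then let EVERY query term match fuzzily on its own.
    result = {}
    for term in (token.lower() for token in re.findall(r"\w+", query)):
        result[term] = sorted(
            t for t in index_terms if levenshtein(term, t) <= auto_edits(term)
        )
    return result


print(fuzzy_match("Hpng Kong", {"hong", "kong"}))
# {'hpng': ['hong'], 'kong': ['hong', 'kong']}

# Matching request body ("name" is a hypothetical field name).
print(json.dumps({"query": {"match": {"name": {"query": "Hpng Kong", "fuzziness": "AUTO"}}}}))
```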
Tuning the Fuzzy Query in Elasticsearch
You can tune the Fuzzy Query to match your use case.
In this section, I’ll go through the parameters you can change in the query.
Fuzziness
Fuzziness is the heart of Fuzzy Query.
The value that we pass to this parameter is the maximum distance allowed.
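Elasticsearch accepts 0, 1, 2, or “AUTO” for this parameter; “AUTO” can optionally take thresholds as “AUTO:[low],[high]” and defaults to AUTO:3,6. The hypothetical helper below is a sketch, not part of any client library: it resolves such a value to the maximum number of edits for a given term, following those documented rules.

```python
from typing import Union


def max_edits(fuzziness: Union[int, str], term: str) -> int:
    """Resolve a fuzziness setting to the maximum edit distance for `term`.
    Hypothetical helper mirroring the documented rules."""
    if isinstance(fuzziness, int):
        if fuzziness not in (0, 1, 2):
            raise ValueError("fuzziness must be 0, 1, 2 or AUTO")
        return fuzziness
    if fuzziness == "AUTO":
        low, high = 3, 6  # default thresholds
    elif fuzziness.startswith("AUTO:"):
        low, high = (int(x) for x in fuzziness[len("AUTO:"):].split(","))
    else:
        return max_edits(int(fuzziness), term)  # numeric string such as "2"
    n = len(term)
    # Shorter than `low`: exact match; shorter than `high`: 1 edit; otherwise 2.
    return 0 if n < low else 1 if n < high else 2


print(max_edits("AUTO", "gppgle"))  # 2
print(max_edits("AUTO", "Hpng"))    # 1
print(max_edits(2, "Hpng"))         # 2
```

This is why “Hpng” only reaches “hong” (distance 2) when the fuzziness is set to 2 rather than left at “AUTO”.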