## Spelling Recommender

We showed how you can build an autocorrect based on Jaccard distance by returning also the probability of each word. We will create three different spelling recommenders, that each takes a list of misspelled words and recommends a correctly spelled word for every word in the list. For every misspelled word, the recommender should find the word in `correct_spellings`

that has the shortest distance and **starts with the same letter** as the misspelled word, and return that word as a recommendation.

As a dictionary of correct words, we will consider the `words` from NTLK

Note: Each of the two different recommenders will use a different distance measure. For our example, we will consider the following misspelled words: [**spleling**, **mispelling**, **reccomender**]

## Jaccard distance on the 2 Q-Grams of the two words

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union and can be described by the following formula:

Also, we will work with Q-grams which are equivalent to N-grams but they referred to characters instead of tokens. We will provide an example by taking the Jaccard distance of 2 Q-Gams. First, we will take the dictionary of the correct words from NLTK.

import nltk from nltk.corpus import words correct_spellings = words.words() from nltk.metrics.distance import jaccard_distance from nltk.util import ngrams from nltk.metrics.distance import edit_distance

Since we loaded the libraries, let’s work on the function. We will work with list comprehensions.

entries=['spleling', 'mispelling', 'reccomender'] for entry in entries: temp = [(jaccard_distance(set(ngrams(entry, 2)), set(ngrams(w, 2))),w) for w in correct_spellings if w[0]==entry[0]] print(sorted(temp, key = lambda val:val[0])[0][1])

And we get:

```
spelling
misspelling
recommender
```

Good job no? We got the right words

## Edit Distance

A popular approach of distance measure is the Edit Distance which is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. The main transformations are the following:

**Insertion**of a new character. Example:**boat**vs**boats**(insertion of an`s`

)**Deletion**of an existing character. Example:**fair**vs**far**(deletion of an`i`

)**Substitution**of an existing character. Example:**lamp**vs**lamb**(substitution of`p`

with`b`

)**Transposition**of two existing consecutive characters. Example:**sing**vs**sign**(transposition of`ng`

to`gn`

)

for entry in entries: temp = [(edit_distance(entry, w),w) for w in correct_spellings if w[0]==entry[0]] print(sorted(temp, key = lambda val:val[0])[0][1])

and we get:

```
selling
misspelling
recommender
```

## Conclusion

We showed how we can apply a Spelling Recommender using Python and NLTK. We have to mention that the advanced spelling recommenders take into consideration more factors than just the distance, such as the context and the relative frequency of the suggested word.

## 1 thought on “Spelling Recommender with NLTK”