Predictive Hacks

Spelling Recommender with NLTK


Spelling Recommender

We previously showed how to build an autocorrect based on Jaccard distance that also returns the probability of each word. Here we will create spelling recommenders that each take a list of misspelled words and recommend a correctly spelled word for every word in the list. For every misspelled word, the recommender finds the word in correct_spellings that has the shortest distance and starts with the same letter as the misspelled word, and returns that word as the recommendation.

Note: Each recommender uses a different distance measure. This post covers two of them: Jaccard distance on character 2-grams, and edit distance.

For our example, we will consider the following misspelled words: ['spleling', 'mispelling', 'reccomender']


Jaccard Distance on the 2-Grams (Bigrams) of the Two Words


import nltk
from nltk.corpus import words
from nltk.metrics.distance import jaccard_distance, edit_distance
from nltk.util import ngrams

# If the corpus is missing, download it once with: nltk.download('words')
correct_spellings = words.words()

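Now that the libraries are loaded, let's build the recommender. For each entry, a list comprehension computes the distance to every correctly spelled word that starts with the same letter, and we print the closest one.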

entries=['spleling', 'mispelling', 'reccomender']

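for entry in entries:
    # (distance, word) pairs for every corpus word sharing the entry's first letter
    temp = [(jaccard_distance(set(ngrams(entry, 2)), set(ngrams(w, 2))), w)
            for w in correct_spellings if w[0] == entry[0]]
    # Print the word with the smallest distance (the first one wins on ties)
    print(min(temp, key=lambda val: val[0])[1])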

And we get:

spelling
misspelling
recommender
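
To make the distance concrete, here is a minimal sketch of the same idea in plain Python, without NLTK. The helper names `bigrams`, `jaccard` and `recommend`, and the tiny `vocabulary` list, are assumptions made for illustration only; they stand in for `ngrams`, `jaccard_distance` and the full `correct_spellings` corpus used above.

```python
def bigrams(word):
    """Return the set of adjacent character pairs in a word."""
    return set(zip(word, word[1:]))


def jaccard(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:  # both sets empty (e.g. two one-letter words)
        return 0.0
    return (len(union) - len(a & b)) / len(union)


def recommend(entries, vocabulary):
    """For each entry, pick the same-first-letter word with the smallest distance."""
    results = []
    for entry in entries:
        candidates = [w for w in vocabulary if w and w[0] == entry[0]]
        if not candidates:
            results.append(None)  # nothing to recommend for this entry
            continue
        results.append(min(candidates,
                           key=lambda w: jaccard(bigrams(entry), bigrams(w))))
    return results


# Hypothetical toy vocabulary, far smaller than the NLTK words corpus
vocabulary = ['selling', 'spelling', 'spilling', 'misspelling',
              'mislabeling', 'recommender', 'reminder']

# 'spleling' and 'spelling' share 5 of their 9 distinct bigrams: distance 4/9
print(jaccard(bigrams('spleling'), bigrams('spelling')))
print(recommend(['spleling', 'mispelling', 'reccomender'], vocabulary))
```

Because the bigrams are compared as sets, their order inside the word does not matter, which is why a swapped pair of letters such as "le" and "el" costs relatively little under this measure.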

Edit Distance

Now we will use the edit distance (Levenshtein distance) as the measure instead.


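for entry in entries:
    # (distance, word) pairs for every corpus word sharing the entry's first letter
    temp = [(edit_distance(entry, w), w)
            for w in correct_spellings if w[0] == entry[0]]
    # Print the word with the smallest distance (the first one wins on ties)
    print(min(temp, key=lambda val: val[0])[1])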

And we get:

selling
misspelling
recommender
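
The first recommendation differs from the Jaccard result. One way to see why is a plain Python sketch of the edit distance. The function `edit_distance_sketch` below is a hypothetical stand-in written for illustration, not NLTK's implementation. Under the plain distance, both "selling" and "spelling" are two edits away from "spleling", so the tie goes to whichever candidate comes first in the word list (presumably "selling", which is consistent with the output above). The optional `transpositions` flag in the sketch counts a swap of two adjacent letters as a single edit.

```python
def edit_distance_sketch(a, b, transpositions=False):
    """Dynamic-programming edit distance between strings a and b.

    With transpositions=True, swapping two adjacent characters counts
    as one edit (the optimal string alignment variant).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


for candidate in ['selling', 'spelling']:
    print(candidate,
          edit_distance_sketch('spleling', candidate),
          edit_distance_sketch('spleling', candidate, transpositions=True))
```

NLTK's `edit_distance` also accepts a `transpositions` argument. Passing `transpositions=True` in the loop above would be one way to break this tie in favor of "spelling", although that variant was not run for this post.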
