nltk - Read For Learn

str.translate gives TypeError – Translate takes one argument (2 given), worked in Python 2

I have the following code. I’ve tested lines 2-4 in a separate testing.py file, and the command prompt returns ['the','the','the'], which is what I wanted (removing punctuation). However, when I put the exact same code in a different file, Python raises a TypeError. json_list is a list of all the …
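For context, a minimal sketch of the Python 3 way to strip punctuation with str.translate. In Python 3, str.translate takes a single table argument, and that table can be built with str.maketrans. The helper name strip_punctuation and the sample words are illustrative assumptions, not the asker's code.

```python
import string

# Python 3: str.translate accepts one table argument. The third argument
# to str.maketrans lists the characters that should be deleted.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(words):
    """Return a new list with ASCII punctuation removed from each word.

    Hypothetical helper for illustration only.
    """
    return [word.translate(PUNCT_TABLE) for word in words]


print(strip_punctuation(["the,", "the.", "the!"]))
```

The two-argument form translate(None, string.punctuation) only exists on Python 2 byte strings, which would explain why code that worked there fails after a move to Python 3.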

How to use Stanford Parser in NLTK using Python

Note that this answer applies to NLTK v3.0, and not to more recent versions. Sure, try the following in Python: Output: [Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), …

Spell Checker for Python

I’m fairly new to Python and NLTK. I am busy with an application that can perform spell checks (replaces an incorrectly spelled word with the correct one). I’m currently using the Enchant library on Python 2.7, PyEnchant and the NLTK library. The code below is a class that handles the correction/replacement. I have written a …

how to check which version of nltk, scikit learn installed?

import nltk is Python syntax, and as such won’t work in a shell script. To test the versions of nltk and scikit_learn, you can write a Python script that prints each package’s __version__ attribute and run it. Note that not all Python packages are guaranteed to have a __version__ attribute, so for some others it …
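As a sketch of one way to do this without relying on __version__, the standard library's importlib.metadata (Python 3.8+) can look up the installed distribution's version. The helper name installed_version is an assumption for illustration; note that the distribution name for scikit_learn on PyPI is scikit-learn.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed.

    Hypothetical helper; dist_name is the distribution name as used by pip.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for name in ("nltk", "scikit-learn"):
    print(name, installed_version(name) or "not installed")
```

Saved as a file, this runs with the interpreter you want to inspect, which also confirms which Python environment the packages live in.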

What is the difference between lemmatization vs stemming?

Short and dense: http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form. However, the two words differ in their flavor. Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of …
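To make the "crude heuristic" idea concrete, here is a deliberately naive suffix-chopping sketch. It is not the Porter stemmer or any NLTK algorithm; the suffix list and the minimum stem length of three characters are assumptions chosen only for illustration.

```python
# Hypothetical suffix list, checked in order; real stemmers use far more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Chop off the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for word in ("cats", "jumped", "running", "is"):
    print(word, "->", crude_stem(word))
```

The result for "running" is "runn", which is not a dictionary word. That is typical of a rule that only chops endings without consulting a vocabulary.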

Python can’t find module NLTK

On OS X you may have multiple installations of Python, so investigate that first. Everything within /usr/bin is built in, and everything in /usr/local/bin is externally installed by Homebrew or some other package manager. If you’re using pip or pip3 from /usr/local, then you have to use the Python instance from the same location; otherwise they’re different instances. Just install it via pip, or via pip3 for Python 3, then run …
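A small sketch for checking which interpreter is actually running and whether it can see the module. The function name describe_environment is an assumption for illustration; sys.executable and importlib.util.find_spec are standard library.

```python
import importlib.util
import sys


def describe_environment(module_name="nltk"):
    """Report the running interpreter and whether it can locate a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


print(describe_environment())
```

If module_found is False for the interpreter you care about, running that same interpreter with -m pip install nltk ties the install to it, rather than to whichever pip happens to be first on the PATH.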

What are all possible pos tags of NLTK?

The book has a note on how to find help on tag sets, e.g.: Others are probably similar. (Note: you may first have to download tagsets from the Models section of the download helper.)

n-grams in python, four, five, six grams?

Great native Python-based answers have been given by other users. But here’s the nltk approach (just in case the OP gets penalized for reinventing what already exists in the nltk library). There is an ngram module in nltk that people seldom use. It’s not because it’s hard to read ngrams, but training a model based on ngrams where n > 3 will result …
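For readers without NLTK installed, the sliding-window idea behind n-grams can be sketched in plain Python. This is not NLTK's implementation; the name plain_ngrams and the sample sentence are assumptions for illustration.

```python
def plain_ngrams(tokens, n):
    """Return all contiguous n-token windows as a list of tuples."""
    tokens = list(tokens)
    # zip stops at the shortest shifted slice, so every window has length n.
    return list(zip(*(tokens[i:] for i in range(n))))


words = "this is a foo bar sentence".split()
for n in (4, 5, 6):
    print(n, plain_ngrams(words, n))
```

With six tokens this yields three 4-grams, two 5-grams, and one 6-gram, which shows how quickly the number of distinct windows shrinks as n grows.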

re.sub erroring with “Expected string or bytes-like object”

As you stated in the comments, some of the values appear to be floats, not strings. You will need to convert them to strings before passing them to re.sub. The simplest way is to change location to str(location) when calling re.sub. It doesn’t hurt to do this even if the value is already a str.
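A minimal sketch of that fix. The whitespace-collapsing pattern and the function name clean_location are assumptions, since the original pattern is not shown; only the str(location) conversion comes from the answer.

```python
import re


def clean_location(location):
    """Collapse runs of whitespace after coercing the value to str.

    The regex here is a hypothetical stand-in for the asker's pattern.
    """
    # str() makes floats (including NaN) safe to pass to re.sub.
    return re.sub(r"\s+", " ", str(location)).strip()


for value in ("New   York ", 12.5, float("nan")):
    print(repr(clean_location(value)))
```

Note that a missing value stored as NaN becomes the string "nan" this way, so it may be worth filtering such values out before cleaning.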