Description

python-ngram is a package that calculates the similarity between two strings using n-grams.
You might ask, "Why not use difflib.SequenceMatcher?" Short answer: they are NOT the same!
Long answer: difflib.SequenceMatcher bases its comparison on the longest common character chain, so certain strings can get a high score even though they differ greatly in length. In some cases you don't want that. Using n-grams is far less tolerant of different string lengths. Which method to use depends largely on the task. As a side note, although I have no empirical proof for it, I believe that ngram calculates the score faster.
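
To make the difference concrete, here is a minimal sketch that scores the same pair of strings both ways. It is not python-ngram's own implementation or API: the helpers ngrams and ngram_similarity are hypothetical names, and the Jaccard overlap of padded character trigrams is just one common way to turn n-grams into a score, assumed here for illustration.

```python
from difflib import SequenceMatcher


def ngrams(text, n=3):
    """Return the set of character n-grams of `text`.

    The text is padded with '$' on both sides so that characters at the
    edges take part in as many n-grams as characters in the middle.
    """
    padded = "$" * (n - 1) + text + "$" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap of the n-gram sets: shared n-grams / distinct n-grams."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty inputs (only possible for n=1) count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


short_text = "spam"
long_text = "spam and eggs"

matcher_score = SequenceMatcher(None, short_text, long_text).ratio()
trigram_score = ngram_similarity(short_text, long_text)

print("SequenceMatcher:", matcher_score)
print("trigram overlap:", trigram_score)
```

For this pair the trigram score comes out lower than the SequenceMatcher ratio, because every extra trigram in the longer string enlarges the denominator. Other pairs and other values of n may behave differently, so try it on strings typical of your task.
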
Project homepage

News (with text)

SourceForge.net: SF.net Project News: python-ngram 

Download

SourceForge.net: Project File Releases: python-ngram 

Installation

Linux

RPM-Installation

I'm not familiar with RPM distributions, but as far as I know it should be something like:

rpm -i <filename.rpm>

RPM-source Installation

This is something I don't know. If somebody can enlighten me, please do!

Binary/Source installation

Untar the package with your favourite archive tool. On the console it will be something along the lines of:

tar xzf <filename.tar.gz>

Next, go to the folder just created, which has the same name as the package (for example "ngram-1.0.0b1"), and run:

python setup.py install

For this step you need root privileges.
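
Once the install step has finished, a quick check from Python can confirm that the interpreter can see the package. This is only a sketch: it assumes the module is importable under the name ngram, which this page does not state, and is_installed is a hypothetical helper, not part of the package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module called `module_name` is on the import path."""
    return importlib.util.find_spec(module_name) is not None


# "ngram" is an assumed module name; change it if the package installs under another name.
print("ngram importable:", is_installed("ngram"))
```

If this prints False, the install may have gone to a different Python interpreter than the one you are running.
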

Windows

Execute the executable file and follow the instructions displayed. Default values will be fine in most cases.

Mac OS X

Simply follow the same instructions as for the Binary/Source installation on Linux.

Roadmap
