Moved

This project moved to Google Code. This site is no longer maintained and is only kept for historic reasons.

Description

python-ngram is a package for calculating the similarity between two strings using n-grams.
You might ask: "Why not use difflib.SequenceMatcher?" Short answer: they are NOT the same!
Long answer: difflib.SequenceMatcher bases its comparison on the longest contiguous runs of matching characters, so some strings can get a high score even though they differ greatly in length. In some cases you don't want that. The n-gram approach is far less tolerant of differences in string length. Which method to use depends largely on the task. As a side note, although I have no empirical proof for it, I believe that ngram calculates the score faster.
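To make the difference concrete, here is a minimal sketch of one common way to score n-gram similarity: pad each string, split it into character trigrams, and divide the number of shared trigrams by the number of distinct trigrams overall (a Jaccard index). The padding character `$`, n = 3 and the Jaccard formula are assumptions for illustration; this is not necessarily the exact formula python-ngram uses, and `ngrams` and `ngram_similarity` are hypothetical helpers, not the package's API. Only `difflib.SequenceMatcher` is a real standard-library class.

```python
from difflib import SequenceMatcher


def ngrams(text, n=3, pad="$"):
    """Return the set of character n-grams of `text`.

    The string is padded with n-1 copies of `pad` on each side so that the
    first and last characters occur in as many n-grams as inner ones.
    (Hypothetical helper; `$` as padding is an assumption.)
    """
    padded = pad * (n - 1) + text + pad * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard index of the n-gram sets: shared n-grams / all distinct n-grams."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:  # e.g. n=1 with two empty strings
        return 1.0
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    a, b = "hello", "hello world"
    print(f"SequenceMatcher: {SequenceMatcher(None, a, b).ratio():.3f}")  # 0.625
    print(f"trigram Jaccard: {ngram_similarity(a, b):.3f}")              # 0.333
```

For "hello" vs. "hello world", SequenceMatcher reports 0.625 because the shared run "hello" covers the whole shorter string, while the trigram score drops to about 0.333 because the extra characters of the longer string contribute trigrams the shorter one lacks.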


Installation

Linux

RPM-Installation

I'm not familiar with RPM-based distributions, but as far as I know it should be something like:

rpm -i <filename.rpm>

RPM-source Installation

This is something I don't know. If somebody can enlighten me, please do!

Binary/Source installation

Untar the package with your favourite archive tool. On the console, it will be something along the lines of:

tar xzf <filename.tar.gz>
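If no tar tool is at hand, Python's standard `tarfile` module can do the same extraction. This is a minimal sketch: the helper name `untar` is hypothetical, and the safer `filter="data"` extraction option is only passed when the running Python provides it (older interpreters lack it).

```python
import tarfile
from pathlib import Path


def untar(archive, dest="."):
    """Extract a .tar.gz archive into `dest` (like `tar xzf archive`) and
    return the sorted names of its top-level entries. (Hypothetical helper.)"""
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links outside dest, etc.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
        return sorted({Path(name).parts[0] for name in tar.getnames()})


if __name__ == "__main__":
    import sys
    print(untar(sys.argv[1]))  # e.g. ['ngram-1.0.0b1']
```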

Next, change into the folder just created; it has the same name as the package (for example "ngram-1.0.0b1"). Then run:

python setup.py install

For this step you need root privileges.

Windows

Run the installer executable and follow the on-screen instructions. The default values will be fine in most cases.

MacOS-X

Simply follow the same instructions as for the Linux binary/source installation.
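Whichever platform you installed on, you can check that Python actually finds the package. A minimal sketch using the standard `importlib.util.find_spec`; it assumes the importable module is named `ngram`, and `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name="ngram"):
    """Return True if `module_name` can be found by the current Python.

    The default name "ngram" is an assumption about the installed module.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("ngram installed:", is_installed())
```

Run it with the same Python interpreter you used for `python setup.py install`; a different interpreter has its own site-packages and may not see the package.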
