Welcome to LibIndic Stemmer’s documentation!

LibIndic’s stemmer module extracts the stems of the words in a sentence. It uses a rule-based model and strips suffixes iteratively, so that words carrying several layers of inflection are reduced step by step. At present, it supports only Malayalam.
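To make the idea of iterative suffix stripping concrete, here is a minimal sketch. It is not the library's implementation: the rule table `TOY_RULES`, the helper names `toy_stem` and `toy_stem_text`, and the `min_stem_len` guard are all hypothetical, and the rules are English toy suffixes chosen only to keep the example readable.

```python
# Toy rule table: suffix -> replacement. These rules are illustrative only
# and are NOT the rules shipped with libindic.stemmer.
TOY_RULES = {
    "ness": "",
    "less": "",
    "ies": "y",
    "s": "",
}


def toy_stem(word, rules=TOY_RULES, min_stem_len=3):
    """Strip suffixes repeatedly until no rule applies any more."""
    changed = True
    while changed:
        changed = False
        # Try longer suffixes first so that "ness" is preferred over "s".
        for suffix in sorted(rules, key=len, reverse=True):
            stem_len = len(word) - len(suffix)
            # Refuse to strip if too little of the word would remain.
            if word.endswith(suffix) and stem_len >= min_stem_len:
                word = word[:stem_len] + rules[suffix]
                changed = True
                break  # restart the scan on the shortened word
    return word


def toy_stem_text(text):
    """Map each whitespace-separated word to its stem."""
    return {word: toy_stem(word) for word in text.split()}
```

With these toy rules, `toy_stem("helplessness")` first drops `ness`, then `less`, and stops at `help` once no rule matches. Restarting the scan after every successful strip is what lets a single pass of rules handle more than one level of suffixation.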

Usage

>>> from libindic.stemmer import Malayalam as mlstemmer
>>> stemmer = mlstemmer()
>>> result = stemmer.stem('രാമന്റെ വീട്ടിലേക്ക്')
>>> for word, stem in result.items():
...     print(word, " : ", stem)
...
രാമന്റെ  :  രാമൻ
വീട്ടിലേക്ക്  :  വീട്

API reference

class libindic.stemmer.Hindi[source]

Hindi Stemmer Class

class libindic.stemmer.Malayalam[source]

Malayalam Stemmer class.

get_info()[source]

Returns information about the module.

get_module_name()[source]

Returns the module name.

singleencode(word)[source]

Normalize word to single encoding.
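One reason Malayalam text can need this kind of normalization is that chillu letters have two Unicode representations: an atomic code point (for example U+0D7B for ൻ) and an older sequence of consonant, virama and zero-width joiner. The sketch below illustrates that situation only. Whether singleencode() covers exactly these cases, and in which direction it maps them, is an assumption; the table `LEGACY_TO_ATOMIC_CHILLU` and the helper `to_atomic_chillu` are hypothetical names, not part of the library.

```python
# Legacy sequence (consonant + virama U+0D4D + ZWJ U+200D) -> atomic chillu.
# Hypothetical table for illustration; not taken from libindic.stemmer.
LEGACY_TO_ATOMIC_CHILLU = {
    "\u0d23\u0d4d\u200d": "\u0d7a",  # chillu NN
    "\u0d28\u0d4d\u200d": "\u0d7b",  # chillu N
    "\u0d30\u0d4d\u200d": "\u0d7c",  # chillu RR
    "\u0d32\u0d4d\u200d": "\u0d7d",  # chillu L
    "\u0d33\u0d4d\u200d": "\u0d7e",  # chillu LL
}


def to_atomic_chillu(word):
    """Rewrite every legacy chillu sequence as its atomic code point."""
    for legacy, atomic in LEGACY_TO_ATOMIC_CHILLU.items():
        word = word.replace(legacy, atomic)
    return word


# The same word spelled with the legacy sequence and with the atomic chillu.
legacy_form = "\u0d05\u0d35\u0d28\u0d4d\u200d"
atomic_form = "\u0d05\u0d35\u0d7b"
```

The two spellings look the same on screen but compare unequal as strings, so suffix rules written for one form would miss the other. After `to_atomic_chillu(legacy_form)` both are the same string.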

stem(text)[source]

Parameters: text – Malayalam string

Returns: Dictionary with words of the string as keys and their corresponding stems as values.

class libindic.stemmer.Punjabi[source]

Punjabi Stemmer Class
