Source Code: Querying and Serving N-gram Language Models with Python


Abstract


Statistical n-gram language modeling is a core technique in Natural Language Processing (NLP) and Computational Linguistics, used to assess the fluency of an utterance in a given language. It is widely employed in NLP applications such as Machine Translation and Automatic Speech Recognition. However, the most commonly used toolkit for building such language models on a large scale is written entirely in C++, which presents a challenge to an NLP developer or researcher whose primary language is Python. The primary article describes how to build a native and efficient Python interface to this toolkit so that such language models can be queried and used directly in Python code. It also describes how to build a Python language model server. Such a server is useful when the language model needs to be queried by multiple clients over a network: the language model needs to be loaded into memory only once by the server, which can then satisfy multiple requests. This article supplements the primary article and provides the entire set of source code listings along with appropriate technical comments where necessary. Some of the listings may already be included with the primary article (in complete or excerpted form) but are reproduced here for the sake of completeness.
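
The load-once, serve-many idea can be illustrated with a minimal sketch. It does not reproduce the listings of the primary article: the class ToyBigramModel, the function start_server, and the remote method name logprob are hypothetical names chosen for illustration, the toy add-one-smoothed bigram model stands in for a model built with the C++ toolkit, and the use of XML-RPC from the Python standard library is an assumption about the transport, not something the abstract specifies.

```python
import math
import threading
from collections import Counter
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ToyBigramModel:
    """Hypothetical stand-in for a real n-gram model loaded from disk."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each token starts a bigram
        self.bigram_counts = Counter()   # how often each (prev, word) pair occurs
        vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            vocab.update(tokens)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def logprob(self, sentence):
        """Return the add-one-smoothed log10 probability of a sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log10(numerator / denominator)
        return total


def start_server(model, host="127.0.0.1", port=0):
    """Expose an already-loaded model over XML-RPC in a background thread.

    Port 0 asks the operating system for any free port.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(model.logprob, "logprob")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


if __name__ == "__main__":
    # The model is built once; every client request reuses the same object.
    model = ToyBigramModel(["the cat sat", "the dog sat"])
    server, _ = start_server(model)
    host, port = server.server_address

    client = ServerProxy(f"http://{host}:{port}/")
    for sentence in ("the cat sat", "sat cat the"):
        print(sentence, client.logprob(sentence))

    server.shutdown()
    server.server_close()
```

In this sketch the cost of building the model is paid once, when the server process starts, and each client pays only for a network round trip. A real deployment would replace ToyBigramModel with the Python interface to the toolkit and would run the server as a long-lived process instead of a background thread.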
