
Faster and Smaller N-Gram Language Models

Adam Pauls Dan Klein

Computer Science Division

University of California, Berkeley

{adpauls,klein}@cs.berkeley.edu

Abstract

N-gram language models are a major resource bottleneck in machine translation. In this paper, we present several language model implementations that are both highly compact and fast to query. Our fastest implementation is as fast as the widely used SRILM while requiring only 25% of the storage. Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques. We also discuss techniques for improving query speed during decoding, including a simple but novel language model caching technique that improves the query speed of our language models (and SRILM) by up to 300%.

1 Introduction

For modern statistical machine translation systems, language models must be both fast and compact. The largest language models (LMs) can contain as many as several hundred billion n-grams (Brants et al., 2007), so storage is a challenge. At the same time, decoding a single sentence can trigger hundreds of thousands of queries to the language model, so speed is also critical. As always, trade-offs exist between time, space, and accuracy, with many recent papers considering small-but-approximate noisy LMs (Chazelle et al., 2004; Guthrie and Hepple, 2010) or small-but-slow compressed LMs (Germann et al., 2009).

In this paper, we present several lossless methods for compactly but efficiently storing large LMs in memory. As in much previous work (Whittaker and Raj, 2001; Hsu and Glass, 2008), our methods are conceptually based on tabular trie encodings wherein each n-gram key is stored as the concatenation of one word (here, the last) and an offset encoding the remaining words (here, the context). After presenting a bit-conscious basic system that typifies such approaches, we improve on it in several ways. First, we show how the last word of each entry can be implicitly encoded, almost entirely eliminating its storage requirements. Second, we show that the deltas between adjacent entries can be efficiently encoded with simple variable-length encodings. Third, we investigate block-based schemes that minimize the amount of compressed-stream scanning during lookup.
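To make the second idea concrete, the following is a minimal sketch (not the paper's actual code) of delta plus variable-length encoding: sorted n-gram keys, each packed as a (context offset, last word) integer, are stored as varint-encoded gaps from the previous key, so small gaps cost only a byte or two instead of a full machine word. All names here are illustrative.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch: variable-length delta encoding of sorted trie keys.
public class DeltaVarintSketch {

    // Encode the gap between consecutive keys with a 7-bits-per-byte varint.
    static byte[] encode(long[] sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (long key : sortedKeys) {
            long delta = key - prev;          // non-negative because keys are sorted
            prev = key;
            while ((delta & ~0x7FL) != 0) {   // emit low 7 bits, set continuation bit
                out.write((int) ((delta & 0x7F) | 0x80));
                delta >>>= 7;
            }
            out.write((int) delta);           // final byte, no continuation bit
        }
        return out.toByteArray();
    }

    // Decode by scanning the compressed stream and re-accumulating deltas.
    static long[] decode(byte[] bytes, int n) {
        long[] keys = new long[n];
        long prev = 0;
        int pos = 0;
        for (int i = 0; i < n; i++) {
            long delta = 0;
            int shift = 0;
            int b;
            do {
                b = bytes[pos++] & 0xFF;
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            keys[i] = prev;
        }
        return keys;
    }

    public static void main(String[] args) {
        long[] keys = {1001, 1003, 1010, 1500, 90000};
        byte[] packed = encode(keys);
        System.out.println("packed bytes: " + packed.length); // far fewer than 5 * 8
        System.out.println(java.util.Arrays.toString(decode(packed, keys.length)));
    }
}
```

Because the stream can only be decoded sequentially, block-based layouts of the kind mentioned above limit how far such a scan must run before reaching the queried entry.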

To speed up our language models, we present two approaches. The first is a front-end cache. Caching itself is certainly not new to language modeling, but because well-tuned LMs are essentially lookup tables to begin with, naive cache designs only speed up slower systems. We present a direct-addressing cache with a fast key identity check that speeds up our systems (or existing fast systems like the widely-used, speed-focused SRILM) by up to 300%.
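A minimal sketch of a direct-addressed cache of this flavor is shown below; the field names, sizing, and hash function are ours, not the authors' implementation. A query's packed n-gram key is hashed to a slot, the full key is stored alongside the value so a hit can be confirmed with a single equality check, and collisions simply overwrite the slot.

```java
// Illustrative sketch: direct-addressed LM cache with a key identity check.
public class DirectAddressedLmCache {
    private final long[] keys;    // packed n-gram keys; 0 marks an empty slot (simplification)
    private final float[] values; // cached log-probabilities
    private final int mask;

    DirectAddressedLmCache(int sizePow2) {  // size must be a power of two
        keys = new long[sizePow2];
        values = new float[sizePow2];
        mask = sizePow2 - 1;
    }

    // Returns the cached value, or NaN on a miss.
    float get(long key) {
        int slot = hash(key) & mask;
        return keys[slot] == key ? values[slot] : Float.NaN;
    }

    void put(long key, float value) {
        int slot = hash(key) & mask;
        keys[slot] = key;        // a collision just evicts the previous entry
        values[slot] = value;
    }

    private static int hash(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // cheap multiplicative mix
        return (int) (h ^ (h >>> 32));
    }
}
```

The point of the design is that a hit costs one hash, one array read, and one comparison, which is cheap enough to sit in front of an already fast lookup table.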

Our second speed-up comes from a more fundamental change to the language modeling interface. Where classic LMs take word tuples and produce counts or probabilities, we propose an LM that takes a word-and-context encoding (so the context need not be re-looked up) and returns both the probability and also the context encoding for the suffix of the original query. This setup substantially accelerates the scrolling queries issued by decoders, and also exploits language model state equivalence (Li and Khudanpur, 2008).
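The following is a hypothetical sketch of such an interface; the type and method names are ours, not the paper's API. The decoder passes the last word plus an integer encoding of the context, and receives both the log-probability and the encoding of the query's suffix, which becomes the context of the next scrolling query without re-looking up its words.

```java
// Illustrative sketch: a word-and-context LM query interface.
public interface ContextEncodedLm {

    /** Result of one query: the score and the offset encoding the query's suffix. */
    final class ScoredContext {
        public final float logProb;
        public final long suffixContextOffset;
        public ScoredContext(float logProb, long suffixContextOffset) {
            this.logProb = logProb;
            this.suffixContextOffset = suffixContextOffset;
        }
    }

    /**
     * Scores `word` given the context identified by `contextOffset`
     * (an offset into the trie rather than an explicit word tuple),
     * returning the probability and the next context in one call.
     */
    ScoredContext score(long contextOffset, int word);
}
```

A decoder extending a hypothesis by one word would thread `suffixContextOffset` from each call into the next, so the cost of locating the shared context is paid only once per scroll.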

Overall, we are able to store the 4 billion n-grams of the Google Web1T (Brants and Franz, 2006) cor-