guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: a passionate guy who want to join in as a developer


From: rushan chen
Subject: Re: a passionate guy who want to join in as a developer
Date: Sun, 12 Aug 2012 22:31:22 +0800

Hi Mark,

Very appreciate for your reply. 

I see you mention that it's useful to implement a larger library of efficient data structure, and I'm interested in that very much. I used to work on projects which involve complicated but very interesting data structures, implementing them could be challenging, but once done I feel a great sense of achievement. 

One such project is implementing a language model (LM) which is a core component of speech recognition and machine translation. I don't know if you heard of it before. Unfortunately, I can't cover it too detailed here, that would complicate things too much. 

Basically, one of the key operations LM supports is it should return a probability associated with any given id sequence. All id sequences are of the same length, and there are a mass amount of such id sequences (a commonly-seen LM may contain billions of them). So it's required to store LM in a concise way, and at the same time make the search for each id sequence very quickly.

Trie is finally chosen to be the data structure for LM (there were many papers discussing this issue). All id sequences with the same prefix share the same internal node, for example, for <1, 2, 3, 4> and <1, 2, 3, 5>, only one copy of <1, 2, 3> will be stored in LM, and a search for a id sequence is done by a sequence of binary search until the leaf is met. One extra thing worth mentioning is that I store the whole trie structure in a single large piece of memory (usually around 2 gigabytes), which makes it convenient to write out to disk and load into memory by simply using mmap, and I think it also makes the system faster than if you allocate memory every time it's needed.

There are some other projects I worked or working on like Spell Corrector, which also involve complicated data structures, but due to privacy policy, I can't say much about it.

All in all, I'm very interested in it, and I really really hope I can help.

Looking forward to your reply. Thanks in advance.

Have fun!

Rushan Chen

reply via email to

[Prev in Thread] Current Thread [Next in Thread]