gfsd-hackers

[gfsd]MC entry


From: James Junmin Fan
Subject: [gfsd]MC entry
Date: Wed, 27 Jun 2001 11:44:59 -0500 (CDT)

%%comments:
Copyright (C) 2000 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the file COPYING.

%%name: MC

%%short-description: a text parser that converts text documents into a vector 
space model based on word frequencies.

%%full-description: MC is a C++ program that creates vector-space models from 
text documents that can be used for text mining applications. MC provides an 
efficient multi-threaded implementation that can process very large document 
collections.
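
As a rough illustration of the word-frequency idea described above, the sketch below counts lowercase alphanumeric tokens in a single string. It is not taken from the MC sources: the function name count_words and the tokenization rule (split on any non-alphanumeric character, fold to lowercase) are assumptions made for this example only.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper (not part of MC): count how often each token occurs.
// A token is assumed to be a maximal run of alphanumeric characters,
// folded to lowercase.
std::map<std::string, int> count_words(const std::string& text) {
    std::map<std::string, int> counts;
    std::string token;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            token += static_cast<char>(std::tolower(c));
        } else if (!token.empty()) {
            ++counts[token];   // a non-alphanumeric character ends the token
            token.clear();
        }
    }
    if (!token.empty()) {
        ++counts[token];       // flush the final token, if any
    }
    return counts;
}
```

Each document's count map would then become one sparse vector in the model, with one dimension per distinct word.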

<p> The MC program:
1. Recursively descends directories, finding text files.
2. Processes files selectively through full regular expression matching of file names.
3. Builds a sparse matrix of word/token counts. The particular sparse matrix format used is the Compressed Column Storage format named in item 8.
4. Processes any user-specified text formats (email addresses or URLs) as a whole token through regular expression matching or FLEX definition.
5. Prunes the vocabulary by word length and frequency.
6. Excludes user-specified stop words.
7. Sets word vector weights according to any of the txx, txn, tfn, tfx, lxx, lxn, lfn, lfx scaling schemes.
8. Writes all data structures to disk in the Compressed Column Storage format.
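
The Compressed Column Storage layout and the normalization step of the scaling schemes can be sketched as follows. This is a minimal sketch, not MC's actual code: the names CcsMatrix, build_ccs and normalize_columns are hypothetical, and two assumptions are made. First, each document is taken to be one column and each word one row. Second, a trailing "n" in a scheme name such as txn is taken to mean scaling each document vector to unit Euclidean length.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Compressed Column Storage: nonzeros are stored column by column.
// Assumption: one column per document, one row per word.
struct CcsMatrix {
    std::vector<double> values;        // nonzero entries
    std::vector<std::size_t> row_idx;  // row (word id) of each entry
    std::vector<std::size_t> col_ptr;  // start of each column; last = nnz
};

// docs[j] maps word id -> raw count in document j.
// std::map iterates in key order, so row indices stay sorted per column.
CcsMatrix build_ccs(const std::vector<std::map<std::size_t, int> >& docs) {
    CcsMatrix m;
    m.col_ptr.push_back(0);
    for (const auto& doc : docs) {
        for (const auto& entry : doc) {
            m.row_idx.push_back(entry.first);
            m.values.push_back(static_cast<double>(entry.second));
        }
        m.col_ptr.push_back(m.values.size());
    }
    return m;
}

// Assumed meaning of the "n" component: unit Euclidean length per column.
void normalize_columns(CcsMatrix& m) {
    for (std::size_t j = 0; j + 1 < m.col_ptr.size(); ++j) {
        double sum_sq = 0.0;
        for (std::size_t k = m.col_ptr[j]; k < m.col_ptr[j + 1]; ++k) {
            sum_sq += m.values[k] * m.values[k];
        }
        if (sum_sq == 0.0) continue;   // empty document: nothing to scale
        const double norm = std::sqrt(sum_sq);
        for (std::size_t k = m.col_ptr[j]; k < m.col_ptr[j + 1]; ++k) {
            m.values[k] /= norm;
        }
    }
}
```

For two documents with counts {word 0: 3, word 2: 4} and {word 1: 1}, build_ccs yields col_ptr = {0, 2, 3} and row_idx = {0, 2, 1}. After normalize_columns the first column holds 0.6 and 0.8.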

<p> The application does not:
1. Have English parsing or part-of-speech tagging facilities.
2. Have complete documentation.
3. Claim to be bug-free.

%%category: applications

%%license: GPL
%%license verified by:
%%license verified on:

%%maintainer: James Fan <address@hidden>

%%updated: 2001-06-07

%%keywords: text mining, data mining, vector space model, bag of words

%%interface: Command line

%%programs:

%%GNU: Yes

%%web-page: http://www.cs.utexas.edu/users/jfan/dm/

%%support:

%%doc: http://www.cs.utexas.edu/users/jfan/dm/README

%%developers:

%%contributors:

%%sponsors:

%%source: http://www.cs.utexas.edu/users/jfan/dm/src/

%%debian:

%%redhat:

%%repository:

%%related:

%%source-language: C++

%%supported-languages:

%%use-requirements:

%%build-prerequisites: FLEX, STL, pthread lib

%%weak-prerequisites:

%%source-prerequisites:

%%version: 2.19 stable released 2001-06-26

%%announce-list:

%%announce-news:

%%help-list:

%%help-news:

%%dev-list:

%%dev-news:

%%bug-list:

%%bug-database:

%%entry written by: James Fan <address@hidden>




