NAME
genscores - Generate scoring tables from a set of plaintext files.
SYNOPSIS
genscores -type value [-verbose] [-elemsize nchars] [-output outfilename] [-validchars chars] file1 ?file2 ...?
DESCRIPTION
All table data is normalized after it is generated. Multiple input files may be given to build a larger sample of plaintext.
-type value
- The scoring method to use in the generated scoring table. This must be one of the built-in types returned by the 'score types' command.
-verbose
- Print a little more information as the table is being generated.
-elemsize nchars
- The element size, in characters, for n-gram-based score types (for example, 5 for 5-grams).
-output outfilename
- The name of the file where the results should be written. Use '-' for stdout, which is the default.
-validchars chars
- The set of valid characters for the scoring table elements. Defaults to 'abcdefghijklmnopqrstuvwxyz'. Make sure to shell-escape any questionable characters such as '*' and '?'.
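To illustrate how -elemsize and -validchars might interact, here is a minimal Python sketch of n-gram counting over several input texts. It is not the genscores implementation. The function name ngram_frequencies is hypothetical, and the sketch assumes that input is lowercased, that characters outside the valid set are dropped, and that "normalized" means the frequencies sum to 1. None of these details are confirmed by this page.

```python
from collections import Counter

def ngram_frequencies(texts, elemsize=2, validchars="abcdefghijklmnopqrstuvwxyz"):
    """Count n-grams of length elemsize across all texts, then normalize.

    Hypothetical helper: a sketch of the idea, not the genscores algorithm.
    """
    valid = set(validchars)
    counts = Counter()
    for text in texts:
        # Assumption: lowercase the input and drop characters not in validchars.
        cleaned = "".join(ch for ch in text.lower() if ch in valid)
        # Slide a window of elemsize characters over the cleaned text.
        for i in range(len(cleaned) - elemsize + 1):
            counts[cleaned[i:i + elemsize]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # Assumption: normalization means dividing each count by the total.
    return {gram: n / total for gram, n in counts.items()}

# Two small "files" combined into one sample, as with multiple input files.
table = ngram_frequencies(["The cat.", "the hat"], elemsize=2)
```

Adding a space to validchars in this sketch would keep word boundaries in the n-grams, which is the effect described for the third example under EXAMPLES.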
EXAMPLES
genscores -type digramlog -output myDigramTable.tcl frank14.txt
- Generate and save a sum-of-logs-of-digram-frequencies scoring table based on the standard Frankenstein text.
genscores -type ngramcount -verbose -elemsize 5 -output my5gramTable.tcl file1.txt file2.txt file3.txt file4.txt
- Generate and save a 5-gram frequency scoring table based on the combined text of four input files. Extra status information is printed while the program runs.
genscores -type ngramlog -elemsize 4 -output my4gramTable.tcl -validchars "abcdefghijklmnopqrstuvwxyz " file1.txt
- Generate and save a 4-gram frequency scoring table that includes word boundaries.
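For readers unfamiliar with the "sum of logs of frequencies" idea mentioned in the first example, the following Python sketch shows one way such a table could be used to score text. It is an illustration under stated assumptions, not the format or logic of the generated .tcl tables. The names log_table and score, the toy frequencies, and the floor value for unseen n-grams are all hypothetical.

```python
import math

def log_table(frequencies):
    """Convert a frequency table into natural-log values (hypothetical helper)."""
    return {gram: math.log(freq) for gram, freq in frequencies.items() if freq > 0}

def score(text, table, elemsize=2, floor=-20.0):
    """Sum the table value of every n-gram in text.

    Assumption: n-grams missing from the table receive the penalty `floor`.
    """
    return sum(table.get(text[i:i + elemsize], floor)
               for i in range(len(text) - elemsize + 1))

# Toy digram frequencies, invented for illustration only.
freqs = {"th": 0.5, "he": 0.25, "at": 0.25}
table = log_table(freqs)

english_like = score("the", table)   # uses digrams "th" and "he"
gibberish = score("xqz", table)      # both digrams are unseen
```

In this sketch, text made of common digrams scores higher (closer to zero) than text made of unseen ones, which is what makes such a table useful for ranking candidate plaintexts.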
wart@kobold.org
Created on Wed Mar 31 08:18:24 PST 2004