NAME
score - Manipulate plaintext scoring tables.
SYNOPSIS
score option string
DESCRIPTION
score value plaintext ?weight?
- Generates a score for the plaintext based on the default scoring table. If
specified, the result is multiplied by the supplied weight. The initial
default scoring table uses a sum-of-logs-of-digram-frequencies scoring method.
The default scoring table can be changed using the command
score default command.
score elemvalue plaintext ?weight?
- Look up a single element in the default scoring table. If specified, the
result is multiplied by the supplied weight. The initial default scoring table
uses a sum-of-logs-of-digram-frequencies scoring method. The default scoring
table can be changed using the command
score default command.
score types
- Returns the list of builtin scoring types. These are the only valid types
that can be used with the score create command. This list of types includes:
- digramlog - Sum of the natural logarithms of digram frequencies.
- digramcount - Raw digram frequency counts.
- trigramlog - Sum of the natural logarithms of trigram frequencies.
- trigramcount - Raw trigram frequency counts.
- ngramlog - Sum of the natural logarithms of arbitrary n-gram frequencies.
The size of the n-grams must be set using the scoring table's elemsize command.
- ngramcount - Raw n-gram frequency counts. The size of the n-grams must be
set using the scoring table's elemsize command.
- wordtree - The sum of the squares of the lengths of valid words longer than
2 characters. This scoring table does not have a fixed element size.
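The sum-of-logs idea behind the digramlog type can be modelled in a few lines
of plain Tcl. This is only an illustrative sketch, not the builtin
implementation: the procedure name digramLogScore is hypothetical, and it
assumes that non-letters are dropped, that digrams are read with a sliding
window, and that digrams missing from the table contribute nothing.

```tcl
# Illustrative model of sum-of-logs digram scoring (not the builtin code).
# Assumptions: non-letters are removed, digrams overlap (sliding window),
# and digrams that are not in the table score 0.
proc digramLogScore {freqTable plaintext {weight 1.0}} {
    # Keep lowercase letters only.
    set text [regsub -all {[^a-z]} [string tolower $plaintext] ""]
    set total 0.0
    for {set i 0} {$i < [string length $text] - 1} {incr i} {
        set digram [string range $text $i [expr {$i + 1}]]
        if {[dict exists $freqTable $digram]} {
            # Add the natural log of the raw frequency count.
            set total [expr {$total + log([dict get $freqTable $digram])}]
        }
    }
    return [expr {$total * $weight}]
}

# Two digrams with a raw count of 2 each: the score is 2*ln(2), about 1.386.
set counts [dict create my 2 do 2]
puts [digramLogScore $counts "my dog has fleas"]
```

The count-based types would follow the same pattern with the raw count added
in place of its logarithm.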
score create type
- Create a new scoring table from one of the builtin types. The value of the
type argument must be one of the types returned by score types. This command
returns the name of a new scoring object. This new scoring object is also a
Tcl command that is used to populate the scoring table and retrieve values
from the table. Scoring object commands are described below.
score default ?command?
- Get the name of the command that implements the default scoring table, or set
the command that should be used as the default scoring table. The value for
the command argument can be either a scoring object that was returned by
score create, or a Tcl procedure that implements the scoring object command.
See below for details on scoring object commands.
score isinternal command
- Returns a boolean value indicating whether this command was created by
score create.
SCORING OBJECTS
Each scoring object represents a single scoring table. By using independent
scoring objects, many scoring tables can be created and used simultaneously.
Scoring tables can be created using one of the builtin types (see score
create), or using a custom Tcl procedure. If a custom Tcl procedure is
used, it must use the following signature:
proc myScoreProc {subcommand args} {
...
}
The list of possible values for the subcommand is given below.
scoreObj type
- Returns the type of the scoring table. This will be either one of the builtin
types, or a custom type if a Tcl procedure is used.
scoreObj value plaintext ?weight?
- Generates a score for the plaintext based on the command's scoring table. If
specified, the result is multiplied by the supplied weight.
scoreObj elemvalue plaintext ?weight?
- Look up a single element in the command's scoring table. If specified, the
result is multiplied by the supplied weight.
scoreObj elemsize ?value?
- Set the element size for this scoring table. It is not possible to change the
element size once it is set. It is not possible to change the element size for
the builtin di/tri-gram and wordtree scoring tables. Only the builtin ngram
and custom scoring tables can set an element size. If no size is specified
then this command will return the current element size. An element size of -1
indicates that the element size has not been set. 0 indicates that the element
sizes are not fixed, as is the case with the wordtree type.
scoreObj add element ?value?
- Add a single element to the scoring table. If a value is not specified then
1.0 is used. If the element already exists in this scoring table then the
indicated value is added to the existing table entry.
scoreObj normalize
- Normalize the scoring table. Note that the normalize sub-command is a
misnomer. normalize merely applies some calculation to all entries of
the scoring table once the table has been filled. For the di/tri/n-gramlog
tables, the normalize method takes the natural logarithm of all entries in the
table. For the di/tri/n-gramcount tables, the normalize method does nothing.
Be careful not to call normalize multiple times, as it will apply the
normalization method every time it is called. Also be careful not to add
additional elements to a scoring table that has been normalized.
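The effect of a repeated normalize on a log-type table can be seen with plain
Tcl arithmetic. The snippet below is a hypothetical illustration that only
mimics what happens to one table entry; it does not call the score command.

```tcl
# One table entry with a raw frequency count of 2 (illustrative value).
set raw 2.0
# First log pass: about 0.693, the intended table entry.
set once [expr {log($raw)}]
# Second log pass: about -0.367, which no longer reflects the count.
set twice [expr {log($once)}]
puts "once=$once twice=$twice"
```

Adding to an entry after normalization is misleading for a similar reason: the
new raw value would be summed with a stored logarithm.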
scoreObj dump commandPrefix
- Dump all elements of the scoring table. The commandPrefix is called for every
element in the table. The element and element value are appended as a two-item
list to the commandPrefix before it is invoked. The following example prints
the entire scoring table to stdout:
$scoreObj dump puts
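A custom scoring procedure that follows the subcommand list above might look
like the sketch below. Everything specific in it is an assumption made for
illustration: the global myTable storage, the "custom" type name, the fixed
element size of 1, and the rule of scoring a plaintext as the sum of its
single-letter table entries. The return values of the builtin subcommands are
not fully specified here, so this sketch only mirrors what the EXAMPLES show
(for instance, add returns the element).

```tcl
# Hypothetical storage for the custom table: element -> accumulated value.
set myTable [dict create]

proc myScoreProc {subcommand args} {
    global myTable
    switch -- $subcommand {
        type { return custom }
        elemsize { return 1 }
        add {
            # The value defaults to 1.0 and is added to any existing entry.
            lassign $args element value
            if {$value eq ""} { set value 1.0 }
            set old 0.0
            if {[dict exists $myTable $element]} {
                set old [dict get $myTable $element]
            }
            dict set myTable $element [expr {$old + $value}]
            return $element
        }
        elemvalue {
            # Look up one element; unknown elements score 0 (an assumption).
            lassign $args element weight
            if {$weight eq ""} { set weight 1.0 }
            set v 0.0
            if {[dict exists $myTable $element]} {
                set v [dict get $myTable $element]
            }
            return [expr {$v * $weight}]
        }
        value {
            # Sum the table entries of every character in the plaintext.
            lassign $args plaintext weight
            if {$weight eq ""} { set weight 1.0 }
            set total 0.0
            foreach ch [split $plaintext ""] {
                set total [expr {$total + [myScoreProc elemvalue $ch]}]
            }
            return [expr {$total * $weight}]
        }
        normalize { return }
        dump {
            # Append a two-item list {element value} to the command prefix.
            lassign $args commandPrefix
            dict for {element value} $myTable {
                {*}$commandPrefix [list $element $value]
            }
        }
        default { error "unknown subcommand \"$subcommand\"" }
    }
}

myScoreProc add e 3
myScoreProc add t 2
myScoreProc add e                  ;# e now holds 3 + 1.0 = 4.0
puts [myScoreProc value "tee"]     ;# 2.0 + 4.0 + 4.0 = 10.0
myScoreProc dump puts
```

According to the description of score default, a procedure of this shape could
then be installed as the default scoring table with score default myScoreProc.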
EXAMPLES
% score value "my dog has fleas"
1302.0
- Use the default sum-of-digram-logs on a string of plaintext.
% score create digramlog
score1
% score1 add my 2
my
% score1 add do 2
do
% score1 normalize
% score1 value "my dog has fleas"
1.38629436112
- Create a new sum-of-digram-logs scoring table based on a
custom frequency table. Note that the normalize sub-command
is a misnomer. In this case, it merely computes the log of
every value that was added with the add sub-command. This
allows you to enter the raw frequency counts and let the
score command calculate the logs for you.
% score create digramlog
score1
% score1 add my 0.693
my
% score1 add do 0.693
do
% score1 value "my dog has fleas"
1.386
- Create a new sum-of-digram-logs scoring table based on a
custom frequency table. In this example the input digram
values have already been converted to log values, so the
normalize sub-command is not used.
% score create digramcount
score1
% score1 add my 2
my
% score1 add do 2
do
% score1 normalize
% score1 value "my dog has fleas"
4.0
- The normalize sub-command for the sum-of-frequency-counts
scoring table does nothing. This table stores only the
raw frequency counts.
% score create wordtree
score1
% score1 add my
my
% score1 add dog
dog
% score1 add has
has
% score1 add fleas
fleas
% score1 value "my dog has fleas"
43.0
- The wordtree scoring table calculates scores based on the sum
of the squares of the lengths of valid words in the plaintext.
1- and 2-letter words are ignored. Again, normalization
is not needed here.
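The arithmetic behind the result above (9 + 9 + 25 = 43) can be reproduced
with a small plain-Tcl model. The procedure name wordSquareScore is
hypothetical, and the sketch assumes that words are separated by whitespace;
the builtin wordtree table may locate words in the plaintext differently.

```tcl
# Illustrative model of wordtree-style scoring (not the builtin code).
# Assumption: the plaintext is split into words on whitespace.
proc wordSquareScore {validWords plaintext} {
    set total 0
    foreach word [split $plaintext] {
        set len [string length $word]
        # Only valid words longer than 2 characters contribute.
        if {$len > 2 && $word in $validWords} {
            incr total [expr {$len * $len}]
        }
    }
    return $total
}

# "my" is ignored; dog, has and fleas give 9 + 9 + 25 = 43.
puts [wordSquareScore {my dog has fleas} "my dog has fleas"]
```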
% score create wordtree
score1
% score1 add dog
dog
% score default score1
score1
% score value "my dog has fleas"
9.0
- Change the default scoring method to a custom wordtree
table. Note that we use the score command to get the
value here instead of calling the new score1 command.
The score default score1 command associates score1
as the default scoring method.
% score default myScoringMethod
myScoringMethod
% score value "my dog has fleas"
16
- Change the default scoring method to the new custom
scoring method above.
wart@kobold.org
Created on Wed Mar 31 08:18:25 PST 2004