
File Formats

Data files store the input/output vector pairs that make up the data set used for classification or approximation. Two data file formats are currently supported: SEQUENCE and STATIC. The STATIC format is for simple input/output vector pairs that contain no temporal information. The SEQUENCE format is for time sequences, in which each group of time-ordered input vectors has a single output vector.

Several specifications are common to both file types. All vector component values are read as floating point numbers and may not contain extraneous characters. Everything after a % character on a line is treated as a comment. By convention each vector pair is placed on its own line, but this is not required. Incorrectly formatted files may or may not be parsed correctly, so do not depend on any particular behavior when a file contains errors.
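The comment and number rules above can be illustrated with a minimal Python sketch. It is not the package's own reader: the function name iter_values is hypothetical, and it assumes that values are separated by whitespace, which this section does not state. It applies only to the vector portion of a file, not to the header.

```python
from typing import Iterable, Iterator


def iter_values(lines: Iterable[str]) -> Iterator[float]:
    """Yield every floating point value, ignoring % comments and line breaks.

    Hypothetical helper; whitespace-separated values are an assumption.
    """
    for line in lines:
        # Everything after the first % on a line is a comment.
        data = line.split("%", 1)[0]
        for token in data.split():
            # float() raises ValueError if a token has extraneous characters.
            yield float(token)


sample = [
    "0.5 1.0 -2.0   % first vector",
    "3e-1",
    "% a comment-only line",
    "4 5",
]
values = list(iter_values(sample))
print(values)
```

Because the values are produced as one flat stream, a vector pair split across several lines is read the same way as a pair on a single line. Unlike the unspecified behavior described above, this sketch chooses to fail loudly on a malformed token.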

Each file begins with a header describing the format parameters of that file. Parameters have the form PARAMETERNAME:PARAMETERVALUE. The order and capitalization of the parameters are fixed. The vectors follow the parameters, laid out according to the format of the specific file type.
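As a rough illustration of the header rule, the Python sketch below splits PARAMETERNAME:PARAMETERVALUE lines and checks them against an expected sequence of names, matching order and capitalization exactly. The function names and the parameter names HYPOTHETICAL_A and HYPOTHETICAL_B are placeholders, not parameters of either file type, and the sketch assumes one parameter per line.

```python
from typing import List, Tuple


def parse_header_line(line: str) -> Tuple[str, str]:
    """Split one PARAMETERNAME:PARAMETERVALUE line into (name, value)."""
    text = line.split("%", 1)[0].strip()  # drop any trailing % comment
    name, sep, value = text.partition(":")
    if not sep or not name.strip() or not value.strip():
        raise ValueError(f"not a parameter line: {line!r}")
    return name.strip(), value.strip()


def read_header(lines: List[str], expected: List[str]) -> List[Tuple[str, str]]:
    """Read len(expected) parameters, enforcing fixed order and capitalization."""
    if len(lines) < len(expected):
        raise ValueError("header is shorter than expected")
    params = []
    for line, want in zip(lines, expected):
        name, value = parse_header_line(line)
        if name != want:  # case-sensitive comparison, in the given order
            raise ValueError(f"expected {want!r}, found {name!r}")
        params.append((name, value))
    return params


# Placeholder names only; the real names depend on the file type.
header = ["HYPOTHETICAL_A:abc", "HYPOTHETICAL_B:3  % trailing comment"]
params = read_header(header, ["HYPOTHETICAL_A", "HYPOTHETICAL_B"])
print(params)
```

Values are kept as strings here, since converting them depends on what each parameter means in the specific file type.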





Kui-yu Chang
Wed Mar 12 14:55:12 CST 1997