  


                PATTERN RECOGNITION CD (PR-CD)


----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------


1. PR-CD CONTENTS


The contents of the PR-CD are as follows:


* Datasets for PR experiments, namely those used in the book's examples and exercises.

* Tools for specific PR procedures usually unavailable in commercial software.



----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------


2. SYSTEM REQUIREMENTS


* PC with a CD-ROM drive, running Windows 98, Windows NT or later, with Microsoft Excel. 


* A minimum of 31 Mbytes of free hard disk space is needed in order to install all 
  datasets (27.5 Mbytes) and tools (3.5 Mbytes).

* A minimum of 128 Mbytes of RAM, a 400 MHz processor and an 800x600 pixel monitor are 
  recommended for acceptable performance. 

* Image datasets are in bitmap and jpeg formats and can be used with any of the many 
  software products handling these formats.


----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------


3. INSTALLATION


* Datasets and Microsoft Excel tools are simply copied from the PR-CD to the destination 
  directory.

* Tools requiring specific installation are provided with a setup.exe file for this purpose. 
  Installation starts either by running the setup.exe file from the Run option of the Start 
  menu or by using the Add/Remove Programs option in the Control Panel (under Settings). 
  Next, simply follow the instructions. It is advisable to close all open programs before 
  starting the installation.


----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------


4. USER INSTRUCTIONS


----------------------------------------------------------------------------------------------


4.1 PR Size


The PR Size program has the following modules:


    SC Size (Statistical Classifier Design)


Displays a graph of the following variables, for a two-class linear classifier with a specified 
Bhattacharyya distance and for several values of the dimensionality ratio (number of patterns 
per class/dimension):


* Bayes error;

* Expected design set error (resubstitution method);

* Expected test set error (holdout method). 


The standard deviations of the error estimates are also displayed when the mouse is clicked on
a selected point of the picture box.


The user must specify:


* Dimension d (<= 10).

* Square of the Bhattacharyya distance (computable by several statistical software products). 
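

As an aside, the classical Bhattacharyya bound relating this distance to the Bayes error can
be sketched in a few lines of Python (an illustration only, not part of PR Size; whether the
exponent takes the distance or its square depends on the convention adopted):

```python
import math

def bhattacharyya_bound(b, p1=0.5):
    """Classical upper bound on the two-class Bayes error:
    Pe <= sqrt(p1 * p2) * exp(-b), where b is the Bhattacharyya
    distance and p1, p2 = 1 - p1 are the class priors."""
    p2 = 1.0 - p1
    return math.sqrt(p1 * p2) * math.exp(-b)
```

For equal priors and b = 0 (identical class distributions) the bound equals 0.5, i.e. chance
level; it decreases exponentially as the classes move apart.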


    NN Size (Neural Network Design)


Displays tables of the following values, for a two-class two-layer MLP and for a user-specified
interval of the number of hidden nodes, h:


* Number of neurons.

* Number of weights (including biases).

* Lower bound of the Vapnik-Chervonenkis dimension.

* Upper bound of the Vapnik-Chervonenkis dimension.

* Lower bound of learning set size needed for generalization.

* Upper bound of learning set size sufficient for generalization.


The user must specify:


* Dimension, d (number of MLP inputs).

* Training set error.

* Confidence level of the training set error. 
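

The neuron and weight counts tabulated by NN Size can be reproduced with a short sketch
(assuming a single output node and biases counted as weights, as the list above suggests):

```python
def mlp_counts(d, h):
    """Counts for a two-class MLP with d inputs, one hidden layer of
    h nodes and a single output node (biases included as weights)."""
    neurons = h + 1                     # hidden nodes plus the output node
    weights = (d + 1) * h + (h + 1)     # input->hidden plus hidden->output layers
    return neurons, weights
```

As a rough rule of thumb from VC-type arguments, the learning set should contain at least on
the order of (number of weights)/(training set error) patterns; NN Size computes explicit
lower and upper bounds instead of this heuristic.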


----------------------------------------------------------------------------------------------


4.2 Neuro-Genetic


The Neuro-Genetic program allows the user to perform classification of patterns using
multilayer perceptrons, trained either with back-propagation or with a genetic algorithm.


    Definitions


* Define a new MLP classification project: go to menu Project and select New Project or click 
  the appropriate button in the toolbar.

* Specify the data file. This is a text file with the information organized by rows and columns,
  separated by tabs. Each column corresponds to a network input or output and each row corresponds
  to a different pattern.

* Specify the training and test sets. To specify the input values, the initial and final columns
  and the initial and final rows in the data file should be indicated. For the output values, 
  only the initial and final columns are needed.

* Specify the training procedure (genetic algorithm or back-propagation).

* Specify the neural network architecture (0, 1 or 2 hidden layers). 

* Specify the initial weights. The complete path for the initial weight file must also be filled
  in, or else a file with random weights must be generated (by clicking the appropriate button). 
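

The tab-separated layout described above can be read with a short sketch (the function name
and its 1-based inclusive (first, last) ranges are illustrative, mirroring what the dialog
asks for; they are not part of Neuro-Genetic):

```python
import csv

def load_patterns(path, in_cols, out_cols, rows):
    """Read a tab-separated data file with one pattern per row and one
    network input or output per column.  in_cols, out_cols and rows
    are 1-based inclusive (first, last) ranges."""
    inputs, outputs = [], []
    with open(path, newline="") as f:
        for i, row in enumerate(csv.reader(f, delimiter="\t"), start=1):
            if rows[0] <= i <= rows[1]:
                inputs.append([float(v) for v in row[in_cols[0] - 1:in_cols[1]]])
                outputs.append([float(v) for v in row[out_cols[0] - 1:out_cols[1]]])
    return inputs, outputs
```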


    Opening and Saving Projects 


* Open a previously saved project with Open Project (menu Project).

* Save a project for later re-use with Save Project (menu Project). 


    Network training 


* The following parameters must be indicated independently of the training technique: error goal;
  maximum number of iterations; number of iterations between chart updates.

* When back-propagation training is chosen, the following values must be indicated: learning rate;
  learning rate increase; learning rate decrease; momentum factor; maximum error ratio.

* When genetic algorithm training is chosen, the following values must be indicated: initial
  population; mutation rate; crossover rate; crossover type.

* The following crossover types can be specified: 1 point crossover; 2 points crossover; uniform
  crossover; NN 1 point crossover; NN 2 points crossover; NN uniform crossover; elitism.

* Training can be started or stopped using the respective buttons. 

* Training stops when the specified error goal or the maximum number of iterations is reached.
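

The crossover step applied to the weight vectors can be illustrated with a minimal one-point
crossover (a sketch only; the NN variants presumably restrict the cut points to neuron
boundaries, which is an assumption about their intent):

```python
import random

def one_point_crossover(a, b, rng=random):
    """Exchange the tails of two equal-length weight vectors at a
    random cut point, producing two children."""
    assert len(a) == len(b) and len(a) >= 2
    cut = rng.randrange(1, len(a))      # cut strictly inside the vectors
    return a[:cut] + b[cut:], b[:cut] + a[cut:]
```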


    Results


The following training results appear in the Training results frame and are continuously updated
during the learning process:

* Training set error;

* Iteration number;

* Average time for each iteration (epoch);

* Total learning time;

* Test set error;

* Learning rate value (only for back-propagation).


It is also possible to visualize the error evolution during the training procedure by selecting 
the Errors Chart option.
Once the training is complete, the user can inspect the weights as well as the predicted values 
and errors for the training set and test set (View frame). 


    Macros


Neuro-Genetic allows the creation of macros: sequences of projects that are executed one after
another. The next project in the sequence starts once the execution of the previous one has
finished. Macros are handled as follows:


* Use New Macro (menu Macro) in order to define a macro.

* Double-click on a line of the project column. A selection box appears for easy insertion of
  the project file name (with the .prj extension). 

* Save the macro with Save or Save As.

* Open a macro with Open Macro (menu Macro).


----------------------------------------------------------------------------------------------


4.3 Hopfield


The Hopfield program implements a discrete Hopfield network appropriate for CAM experiments and
for discrete relaxation matching.
 

In order to use the network as a CAM device, proceed as follows:


* Load the prototype patterns with the Load button.

* Memorize the prototype patterns using the Store button. When loading from a file, they are
  immediately stored if the Load and Store option is set. Using the scroll bar, each of the 
  stored prototypes can be inspected.

* Choose Random serial in the combo box for asynchronous updating of the neurons. In Full serial
  mode the neurons are updated in sequence from (1,1) to (m, n).

* Draw or load in the grid the unknown binary pattern to be classified. 

* Random noise with uniform distribution can be added to a pattern by clicking the Add Noise
  button. When needed, use the Clear Window button to wipe out the pattern from the grid.

* Use Recall to run the net until the best matching prototype is retrieved. Use Step to inspect
  the successive states until the final state is reached. The Used as a Classifier option should
  be selected before Recall to impose the final selection of the best matching prototype;
  otherwise the final state is displayed. The weight matrix can be inspected with the Get Weight
  button.

* A new experiment must be preceded by Clean, wiping out all stored prototype patterns.
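

The CAM behaviour described above, Hebbian storage followed by serial recall, can be sketched
for bipolar (+1/-1) patterns (an illustration only; the program works on a 2-D grid, flattened
here to a plain vector):

```python
import random

def store(prototypes):
    """Hebbian weight matrix for bipolar prototype vectors; the
    diagonal is zeroed, as usual for a discrete Hopfield net."""
    n = len(prototypes[0])
    w = [[0.0] * n for _ in range(n)]
    for p in prototypes:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, x, sweeps=10, rng=random):
    """Random serial (asynchronous) updating until no neuron changes."""
    x, n = list(x), len(x)
    for _ in range(sweeps):
        changed = False
        for i in rng.sample(range(n), n):
            new = 1 if sum(w[i][j] * x[j] for j in range(n)) >= 0 else -1
            if new != x[i]:
                x[i], changed = new, True
        if not changed:
            break
    return x
```

Storing a single prototype and recalling from a noisy copy retrieves the prototype, which is
the CAM experiment in miniature.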


In order to use the network for discrete relaxation matching, proceed as follows:


* Dimension the grid according to the cardinalities of the two sets to be matched.

* Fill in the weight matrix using the New Weight button. The weights can either be edited
  directly or loaded from a file. Only one half of the matrix has to be specified if the Matrix
  is Symmetric option is selected; in this case, when cell (i,j) is edited, cell (j,i) gets the
  same value.

* When filling in the weight matrix, it is convenient to start by clicking the Weights 
  Initialisation button, which initialises all matrix values with the one specified in the text
  box. The weight matrix can also be cleared using the Clear Weight button.

* Choose the Full parallel mode in the combo box, imposing a synchronous updating of all neurons.

* Click Step to update the assignment probabilities.


----------------------------------------------------------------------------------------------


4.4 KNN


The KNN program allows k-NN classification to be performed on a two-class dataset using either
a partition or an edition approach.


Data file format:


* n     number of patterns (n <= 500).
* n1    number of patterns of the first class.
* d     dimension (d <= 6).
* ...   n lines with d values, the first n1 lines for the first class, followed by n-n1 lines
        for the second class.


A classification experiment proceeds as follows:


* In the Specifications frame the user must fill in the file name, the value of k (number of 
  neighbours) and choose either the partition or the edit method. If the partition method is
  chosen, the number of partitions must also be specified.

* Classification of the data is obtained by clicking the Compute button. The program then shows
  the classification matrix with the class and overall test set errors, in percentage values. 
  For the partition method, the standard deviation of the errors across the partitions is also
  presented.
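

The k-NN decision itself can be sketched as a plain majority vote among the k nearest training
patterns in Euclidean distance (the partition and edit methods are estimation strategies built
on top of this; the function below is illustrative):

```python
def knn_classify(train_x, train_y, x, k):
    """Majority vote among the k nearest training patterns
    (squared Euclidean distance); ties are resolved arbitrarily."""
    by_dist = sorted(range(len(train_x)),
                     key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    votes = [train_y[i] for i in by_dist[:k]]
    return max(set(votes), key=votes.count)
```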


----------------------------------------------------------------------------------------------


4.5 Perceptron


The Perceptron program has a didactic purpose, showing how the training of a linear discriminant
using the perceptron learning rule progresses in a pattern-by-pattern learning fashion, for both
separable and non-separable pattern clusters.

The patterns are handwritten u's and v's drawn in an 8x7 grid. Two features computed from these
grids are used. The user can choose either a set of linearly separable patterns (Set 1) or not
(Set 2).

Placing the cursor on each point displays the corresponding u or v.

Learning progresses by clicking the Step button or pressing Enter; the latter allows fast
repetition.
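

The pattern-by-pattern rule animated by the program can be sketched as follows (targets in
{-1, +1}; the feature values in the test below are made up, not the actual u/v features):

```python
def perceptron_train(X, y, lr=1.0, max_epochs=100):
    """Pattern-by-pattern perceptron rule for a linear discriminant
    w0 + w.x with targets in {-1, +1}.  Stops after an epoch with no
    misclassifications (guaranteed to happen only for separable data)."""
    w = [0.0] * (len(X[0]) + 1)          # w[0] is the bias
    for _ in range(max_epochs):
        errors = 0
        for xi, t in zip(X, y):
            s = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            if (1 if s >= 0 else -1) != t:
                w[0] += lr * t
                for j, xj in enumerate(xi):
                    w[j + 1] += lr * t * xj
                errors += 1
        if errors == 0:
            break
    return w
```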


----------------------------------------------------------------------------------------------


4.6 SigParse


The SigParse program allows syntactic analysis experiments to be performed on signals and has
the following main functionalities:


* Linear piecewise approximation of a signal.

* Signal labelling.

* String parsing using a finite-state automaton.


Usually, operation with SigParse proceeds as follows:


* Read in a signal from a text file, where each line is a signal value, up to a maximum of 2000
  signal samples. The signal is displayed in a picture box with scroll, 4x zoom and sample 
  increment (step) facilities. The signal values are also shown in a list box.

* Derive a linear piecewise approximation of the signal. The user specifies the approximation
  norm and a deviation tolerance for the line segments. The piecewise linear approximation is 
  displayed in the picture box with black colour, superimposed on the original signal displayed
  with grey colour. The program also shows the number of line segments obtained and lists the 
  length (number of samples), accumulated length and slope of each line segment in a results list
  box.

* Perform signal labelling by specifying two slope thresholds, s1 and s2. Line segments with 
  absolute slope values below s1 are labelled h (horizontal) and displayed in green. Segments
  with absolute slope between s1 and s2 are labelled u (up) or d (down), according to the slope
  sign (positive or negative), and are displayed in red or cyan, respectively. Segments with
  absolute slope above s2 are labelled U (large up) or D (large down), also according to the
  slope sign, and are displayed in magenta or blue, respectively. The labels are shown in the
  results list box.

* Specify a state transition table of a finite-state automaton, either by directly filling in
  the table or by reading in the table from a text file (the Table option must be checked then),
  where each line corresponds to a table row with the symbols separated by commas. The table has
  a maximum of 50 rows. The letter "F" must be used to designate final states.

* Parse the signal. Line segments corresponding to final states are shown in black colour. 
  State symbols resulting from the parse operation are shown in the results list box.
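

The slope-based labelling step can be written down as a small function (the exact boundary
handling, strict versus non-strict comparisons, is an assumption here):

```python
def label_segment(slope, s1, s2):
    """Map a line-segment slope to a primitive symbol:
    h   near-horizontal (|slope| below s1)
    u/d moderate up/down (|slope| between s1 and s2)
    U/D steep up/down (|slope| above s2)."""
    a = abs(slope)
    if a < s1:
        return "h"
    if a < s2:
        return "u" if slope > 0 else "d"
    return "U" if slope > 0 else "D"
```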


The contents of the results list box can be saved in a text file. The user can perform parsing
experiments for the same string by modifying the state transition table.

The program also allows the user to parse any string read in with the String option checked. 
The corresponding text file must contain one string symbol (character) per line.
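

Running the finite-state automaton over a labelled string can be sketched as follows (the
table representation and the "F" prefix marking final states are assumptions based on the
description above; SigParse itself stores the table as rows of comma-separated symbols):

```python
def parse_string(symbols, table, start="0"):
    """Run a finite-state automaton over a list of symbols.  table
    maps (state, symbol) to the next state; states whose names begin
    with "F" are final.  Returns the sequence of visited states."""
    state, visited = start, []
    for sym in symbols:
        state = table.get((state, sym))
        visited.append(state)
        if state is None:               # no transition defined: reject
            break
    return visited

def accepted(visited):
    return bool(visited) and visited[-1] is not None and visited[-1].startswith("F")
```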


--------------------------------------------------------------------------------------------
Porto, FEUP, June 20, 2001.



