 -o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o
			FEATURE SELECTION CHALLENGE 
  			Challenge Sample Code in Matlab
        		Isabelle Guyon -- April 2006
 -o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o
 
DISCLAIMER: ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA 
ARE PROVIDED "AS-IS". ISABELLE GUYON AND/OR OTHER ORGANIZERS 
DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 
FOR ANY PARTICULAR PURPOSE, AND THE WARRANTY OF NON-INFRINGEMENT 
OF ANY THIRD PARTY'S INTELLECTUAL PROPERTY RIGHTS. IN NO EVENT 
SHALL ISABELLE GUYON AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY 
SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF 
SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE 
AVAILABLE FOR THE CHALLENGE. 

This directory contains sample code for the feature selection
challenge in the Matlab(R) language. The sample code includes
functions to read the data, and format and zip the results.
It contains functions to compute the balanced error rate (BER)
and the area under the ROC curve (AUC). Those functions are 
also provided in C++, and as Windows executables (courtesy of
Steve Gunn):
berrate data.labels data.resu
auc data.labels data.resu data.conf
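For clarity, here is a minimal Python sketch of how these two metrics
are typically computed (the function names ber and auc are illustrative;
the authoritative implementations are the provided C++/Matlab versions):

```python
def ber(labels, predictions):
    """Balanced error rate: the average of the per-class error rates.
    Labels and predictions are +1/-1."""
    err_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p != 1)
    err_neg = sum(1 for y, p in zip(labels, predictions) if y == -1 and p != -1)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    return 0.5 * (err_pos / n_pos + err_neg / n_neg)

def auc(labels, scores):
    """Area under the ROC curve via the rank (Wilcoxon-Mann-Whitney)
    statistic: the fraction of (positive, negative) pairs where the
    positive example gets the higher score (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```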

To run the Matlab examples, edit main.m to set the data and result
path properly, then type at the Matlab prompt:
>> main;

Note: in this distribution, main.m was copied to main_orig.m and main.m was moved to the top directory.
====================================================================

We provide examples of baseline methods in the file model_examples.m. 
"baseline": the original methods provided to students in a class taught at ETH.
"optimized": the models optimized by cross-validation by the students.
"noselect": the same models without feature selection. 


The main.m file needs to be edited (to choose the paths of the data and code, the datasets, the methods, etc.). You may want to change:
data_dir            % where the data is
code_dir            % where the CLOP package is
dataset             % which dataset(s) you are studying
modelset            % which model (learning machine) to use;
                    % type model_examples at the prompt for a list
UsePixelRep         % flag (0/1) indicating whether to use the raw
                    % pixel representation (useful for gisette only;
                    % use with model 'pixelGisette_exp_conv')
DoNotLoadTestData   % flag (0/1) to skip loading the test data while
                    % training (to alleviate memory problems)
MergeDataSets       % if this flag is zero, training is done on the
                    % training data only; otherwise the training and
                    % validation data are merged
FoldNum             % if this value is positive, k-fold cross-validation
                    % is performed with k=FoldNum
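For readers unfamiliar with the FoldNum option: k-fold cross-validation
partitions the training examples into k disjoint folds and trains k times,
each time holding out one fold for evaluation. The index bookkeeping can be
sketched as follows (a Python illustration only, not the CLOP implementation):

```python
def kfold_indices(n, k):
    """Partition indices 0..n-1 into k disjoint folds whose sizes
    differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

def kfold_splits(n, k):
    """Yield (train_idx, test_idx) pairs, one per fold."""
    folds = kfold_indices(n, k)
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

In practice one would shuffle the indices before splitting, so that the
folds are not biased by the ordering of the examples.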

=======================================================================================
Computing statistics.
data_stats(D); % where D is a data structure with fields D.train, D.valid, D.test.
stats(dat); % where dat is a spider data object

Making plots.

Useful for DOROTHEA
==> Show the sensitivity/specificity/BER as a function of a threshold on the discriminant 
value (useful for unbalanced classes):
fnfp(discriminant{1},D.train.Y); % For training examples
fnfp(discriminant{2},D.valid.Y); % For validation examples
==> Show ROC curves:
roc(discriminant{1},D.train.Y); % For training examples
roc(discriminant{2},D.valid.Y); % For validation examples
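The quantities shown by fnfp can be illustrated as follows: for each
threshold on the discriminant value one computes the sensitivity (true
positive rate), the specificity (true negative rate), and the resulting
BER. A minimal Python sketch (the function name is illustrative, not the
Matlab fnfp code):

```python
def sens_spec_ber(scores, labels, threshold):
    """Sensitivity, specificity and BER when examples with
    score >= threshold are classified as +1 (labels are +1/-1)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == -1 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    sens = tp / n_pos
    spec = tn / n_neg
    return sens, spec, 0.5 * ((1 - sens) + (1 - spec))
```

Sweeping the threshold over the range of discriminant values traces out
the curves that fnfp displays; the same sweep yields the points
(1 - specificity, sensitivity) of the ROC curve.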

Feature visualization.
==> Heat map:
[YS,idx_pat]=sort(D.train.Y); % Sort the patterns to group them by class
%X=get_x(train(standardize, data(D.train.X(idx_pat,idx_feat))));
X=D.train.X(idx_pat,idx_feat);
xmax=max(max(X)); xmin=min(min(X)); YS(find(YS==-1))=xmin; YS(find(YS==1))=xmax; % rescale the targets
rep=ceil(length(idx_feat)/50); % repeat the targets to see them better
cmat_display([X, YS(:,ones(rep,1))]);
==> Scatter plots of the 3 top-ranking features:
scatterplot(D.train.X(:,idx_feat(1:3)), D.train.Y);

Feature selection.
This assumes that the first element of the model chain, my_model{1}, is a feature selection method.
==> Plot the feature weights (descending order)
figure; plot(get_w(my_model{1},1), 'LineWidth',2); xlabel('rank'); ylabel('W');
==> Plot the p-values (ascending order) -- makes sense only for methods computing p-values
figure; plot(get_pval(my_model{1},1), 'LineWidth',2); xlabel('rank'); ylabel('pval');
==> Plot the FDR (ascending order) -- makes sense only for methods computing p-values
figure; plot(get_fdr(my_model{1},1), 'LineWidth',2); xlabel('rank'); ylabel('FDR');
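For background, FDR (false discovery rate) estimates are commonly derived
from the sorted p-values in Benjamini-Hochberg fashion. The sketch below
shows one such common estimator, for illustration only; it is not
necessarily the estimator that get_fdr computes:

```python
def fdr_from_pvalues(pvals):
    """Benjamini-Hochberg style FDR estimates for a list of p-values
    already sorted in ascending order: fdr[i] = min over j >= i of
    p[j] * m / (j + 1), clipped to 1, so the curve is non-decreasing."""
    m = len(pvals)
    raw = [p * m / (i + 1) for i, p in enumerate(pvals)]
    fdr = raw[:]
    # enforce monotonicity from the largest rank down
    for i in range(m - 2, -1, -1):
        fdr[i] = min(fdr[i], fdr[i + 1])
    return [min(1.0, f) for f in fdr]
```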

Preprocessing (useful for Gisette).
Example: pixelGisette_exp_conv
==> Show the kernel
prepro=my_model{1};
show(prepro.child);
==> Browse the digits
browse_digit(D.train.X, D.train.Y); % Original representation
prepro=my_model{1};
DD=test(prepro,D.train);
browse_digit(DD.X, D.train.Y); % Preprocessed data
