We feed the properties that our Analyser collected from the dataset into the Classifier to find suitable parameters for configuring CompleteSearch.
This is essentially a classification problem in which each column of the input file is assigned to the different CompleteSearch parameter classes.
For further details, see chapter 3 of the thesis.
Output
For each column in the initial input file, the Classifier suggests values for the following parameters, which are used to configure the CompleteSearch Web Application:
Parameter | Value Range |
---|---|
full-text | {true, false} |
filter | {true, false} |
facets | {true, false} |
allow-multiple-items | {true, false} |
field-format | {0, 1, 2} * |
show | {true, false} |
excerpt | {true, false} |
ordering | {0, 1, 2} ** |
url | {true, false} |
{true, false} | |
label | {true, false} |
* Formats: 0: plain text, 1: JSON, 2: XML
** Ordering: 0: lexicographical, 1: numerical, 2: by date
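
To make the value ranges above concrete, here is a minimal Python sketch of how the suggestions for a single column could be represented. It is not the Classifier's actual output format: the names `ColumnSuggestion`, `FieldFormat`, `Ordering` and `to_dict` are hypothetical, and the table row without a parameter name is left out.

```python
from dataclasses import dataclass, fields
from enum import IntEnum


class FieldFormat(IntEnum):
    """Codes from the 'Formats' footnote."""
    PLAIN_TEXT = 0
    JSON = 1
    XML = 2


class Ordering(IntEnum):
    """Codes from the 'Ordering' footnote."""
    LEXICOGRAPHICAL = 0
    NUMERICAL = 1
    BY_DATE = 2


@dataclass
class ColumnSuggestion:
    """Hypothetical container for the parameters suggested for one column."""
    full_text: bool = False
    filter: bool = False
    facets: bool = False
    allow_multiple_items: bool = False
    field_format: FieldFormat = FieldFormat.PLAIN_TEXT
    show: bool = False
    excerpt: bool = False
    ordering: Ordering = Ordering.LEXICOGRAPHICAL
    url: bool = False
    label: bool = False

    def __post_init__(self):
        # Coerce raw integers (0, 1, 2) into the enums.
        # Values outside the documented range raise ValueError.
        self.field_format = FieldFormat(self.field_format)
        self.ordering = Ordering(self.ordering)

    def to_dict(self):
        # Map the Python field names back to the hyphenated
        # parameter names used in the table.
        result = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, IntEnum):
                value = int(value)
            result[f.name.replace("_", "-")] = value
        return result
```

For example, `ColumnSuggestion(full_text=True, field_format=1).to_dict()` yields a dictionary keyed by the parameter names from the table, with `field-format` stored as the integer code.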
Usage
Usage: ClassifierMain [mode] [parameters]
Available modes:
--classify <inputFile>
classifies a given dataset into the different parameter classes. The input file is not the actual dataset but the JSON output file containing its features, as returned by the Analyser.
--train
trains the classifier by performing all steps that can be computed in advance and saving the training data.
--benchmark <configuration>
splits off a part of the training set into a test set, trains the classifier on the reduced training set and evaluates the classification results on the test set. Possible configurations: default, no-augmentation, no-prop-merge, no-sep-predetermination
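
The hold-out procedure that the benchmark mode describes can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the 20% test fraction and the majority-label stand-in classifier are all hypothetical and are not taken from the Classifier itself.

```python
import random
from collections import Counter


def split_holdout(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and split off a test set (assumed fraction)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    # Returns (reduced training set, test set).
    return shuffled[n_test:], shuffled[:n_test]


def train_majority(train_set):
    """Stand-in 'classifier': always predicts the most frequent label."""
    counts = Counter(label for _, label in train_set)
    majority = counts.most_common(1)[0][0]
    return lambda features: majority


def accuracy(classifier, test_set):
    """Fraction of test samples whose predicted label matches the true one."""
    correct = sum(1 for features, label in test_set
                  if classifier(features) == label)
    return correct / len(test_set)


def benchmark(samples, test_fraction=0.2, seed=0):
    """Split, train on the reduced training set, evaluate on the test set."""
    train_set, test_set = split_holdout(samples, test_fraction, seed)
    classifier = train_majority(train_set)
    return accuracy(classifier, test_set)
```

Each sample here is a `(features, label)` pair. A real benchmark would replace `train_majority` with the actual training step and would probably report more than plain accuracy.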
Parameters:
--props <datasetPropDirectory>
Path to directory containing dataset property files for the input datasets in our training set. This parameter is required for training and benchmarking.
--labels <datasetLabelDirectory>
Path to directory containing dataset label files for the input datasets in our training set. This parameter is required for training and benchmarking.
--cache <trainingDataCacheDirectory>