Introduction

This example uses a neural network for classification or regression. Neural networks are commonly used tools in many scientific areas, such as pattern recognition, signal processing, astronomy, and the solution of ordinary and partial differential equations. A neural network is a parametric function $N(x,w)$, where x is the input vector and w is the vector of parameters to be calculated (the weight vector). The weight vector is adapted by minimizing the error function:

$$E\left(N(x,w)\right)=\sum_{i=1}^{N}\left(N(x_i,w)-y_i\right)^2$$

In the above equation, the set $(x_i,y_i)$, $i=1,\ldots,N$, is the data used to train the neural network (the train data). The symbol $y_i$ stands for the output of the function being estimated at the point $x_i$. The example performs the following steps:

1. Read the train data.

2. Minimize the error function using the OnePC parallel genetic environment.

3. Apply the local search procedure Tolmin to the final outcome of OnePC.

4. Apply the neural network from the previous step to some unknown data of the problem (called test data).

5. Report the error value on the test data.
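The five steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: a plain random search stands in for OnePC and a simple coordinate descent stands in for Tolmin, since neither interface is described here. The data is passed in as a list of (input, target) pairs because the file format is not specified, and all function names are hypothetical.

```python
import random

def sum_squared_error(model, w, data):
    """Error function: sum over all patterns of (model(x, w) - y)^2."""
    return sum((model(x, w) - y) ** 2 for x, y in data)

def random_search(objective, n, left, right, samples, rng):
    """Stand-in for the global phase (NOT OnePC): best of uniformly drawn points."""
    candidates = ([rng.uniform(left, right) for _ in range(n)] for _ in range(samples))
    return min(candidates, key=objective)

def coordinate_descent(objective, w, step=0.5, iters=200):
    """Stand-in for the local phase (NOT Tolmin): accept only improving moves."""
    w = list(w)
    best = objective(w)
    for _ in range(iters):
        improved = False
        for k in range(len(w)):
            for delta in (step, -step):
                trial = w[:]
                trial[k] += delta
                value = objective(trial)
                if value < best:
                    w, best, improved = trial, value, True
        if not improved:
            step *= 0.5  # shrink the step when no coordinate move helps
    return w

def run_pipeline(train, test, model, n_weights, left=-10.0, right=10.0, seed=0):
    """Steps 2-5; step 1 (reading the files) is assumed to have produced train/test."""
    rng = random.Random(seed)
    objective = lambda w: sum_squared_error(model, w, train)
    w_global = random_search(objective, n_weights, left, right, 200, rng)  # step 2
    w_local = coordinate_descent(objective, w_global)                      # step 3
    test_error = sum_squared_error(model, w_local, test)                   # step 4
    return w_local, objective(w_local), test_error                         # step 5

# Toy usage with a two-parameter line instead of a neural network.
line = lambda x, w: w[0] * x[0] + w[1]
train = [([float(i)], 2.0 * i + 1.0) for i in range(0, 10, 2)]
test = [([float(i)], 2.0 * i + 1.0) for i in range(1, 10, 2)]
w, train_err, test_err = run_pipeline(train, test, line, n_weights=2)
print(w, train_err, test_err)
```

The local phase starts from the point returned by the global phase, so it can only keep or lower the train error that the global phase reached.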

The neural network used here conforms to the following equation:

$$N(x,w)=\sum_{i=1}^{H} w_{(d+2)i-(d+1)}\,\sigma\left(\sum_{j=1}^{d} x_j\, w_{(d+2)i-(d+1)+j}+w_{(d+2)i}\right)$$

where H is the total number of processing units (hidden nodes) of the neural network and d stands for the dimension of the input vector x. The sigmoid function $\sigma(x)$ is defined as:

$$\sigma(x)=\frac{1}{1+\exp(-x)}$$

The integer number D denotes the dimension of the objective problem and the integer number N denotes the number of input patterns.
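The weight indexing in the network equation is easy to get wrong, so a minimal sketch may help. Hidden node i uses the 1-based weights (d+2)i-(d+1) (output weight), the next d entries (input weights), and (d+2)i (bias), so the weight vector has (d+2)H entries in total. The function names below are illustrative and not taken from the software.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def network(x, w, hidden):
    """Evaluate N(x, w) for `hidden` nodes; w is a 0-based list of (d+2)*hidden weights."""
    d = len(x)
    if len(w) != (d + 2) * hidden:
        raise ValueError("expected (d+2)*H weights")
    total = 0.0
    for i in range(1, hidden + 1):
        base = (d + 2) * i - (d + 1)      # 1-based index of the output weight of node i
        output_weight = w[base - 1]
        bias = w[(d + 2) * i - 1]         # 1-based index (d+2)i
        activation = sum(x[j - 1] * w[base + j - 1] for j in range(1, d + 1)) + bias
        total += output_weight * sigmoid(activation)
    return total

def train_error(data, w, hidden):
    """Sum of squared differences between the network output and the targets."""
    return sum((network(x, w, hidden) - y) ** 2 for x, y in data)

# One hidden node, two inputs: weights are [output, input_1, input_2, bias].
print(network([0.0, 0.0], [2.0, 5.0, -3.0, 0.0], hidden=1))  # 2 * sigmoid(0) = 1.0
```

With all inputs and the bias at zero, every hidden node contributes half of its output weight, which gives a quick sanity check of the indexing.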

Parameters

The parameters for the method init() are:

1. trainfile. This string parameter determines the input train file to be used.

2. testfile. This string parameter determines the input test file to be used.

3. weights. This integer parameter stands for the number of weights used in the neural network. The default value is 3.

4. leftmargin. The left bound for the weight parameters. The default value is -10.

5. rightmargin. The right bound for the weight parameters. The default value is 10.

6. outputfile. The file where the final information from the program will be stored. The file outputfile is opened for appending and the following information is stored:

testfile weights train_error test_error classification_error

where test_error is the error function estimated on the test data and classification_error is the classification error estimated on the test data.

7. tolminiters. The maximum number of iterations for the Tolmin procedure. The default value is 2001.
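As an illustration of how these parameters fit together, the sketch below collects the stated defaults and appends one result line in the order listed for outputfile. The helper names, the dictionary layout, the space separator, and the number formatting are all assumptions; the actual init() interface may differ.

```python
# Defaults stated in the parameter list; the three file parameters have no stated default.
DEFAULTS = {"weights": 3, "leftmargin": -10.0, "rightmargin": 10.0, "tolminiters": 2001}

def make_params(trainfile, testfile, outputfile, **overrides):
    """Build a parameter dictionary (hypothetical helper, not the software's API)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    params = dict(DEFAULTS, trainfile=trainfile, testfile=testfile, outputfile=outputfile)
    params.update(overrides)
    if params["leftmargin"] >= params["rightmargin"]:
        raise ValueError("leftmargin must be smaller than rightmargin")
    return params

def append_report(params, train_error, test_error, classification_error):
    """Append 'testfile weights train_error test_error classification_error' to outputfile."""
    line = (f"{params['testfile']} {params['weights']} "
            f"{train_error} {test_error} {classification_error}\n")
    with open(params["outputfile"], "a") as handle:  # append mode keeps earlier runs
        handle.write(line)
    return line
```

Because the file is opened for appending, repeated runs accumulate one line each, which makes it straightforward to average results over several runs afterwards.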

Experimental results

The proposed method was tested on a series of classification and regression datasets as well as a series of real-world problems. All experiments were conducted 10 times, each time with different seeds for the random generators, and the results were averaged. Each dataset was split into two parts of equal size: train data and test data. For classification datasets the average classification error on the test set is reported, and for regression datasets the average mean squared error on the test set is listed. The classification datasets were found in the Machine Learning Repository. The results from the application of the proposed method to a series of datasets are listed in the table below. The column C2 stands for two clients, the column C4 for four clients, and the column C8 for eight clients. The description of the classification datasets is as follows:

1. Wine: The wine recognition dataset (WINE) contains data from wine chemical analysis. It contains 178 examples of 13 features each that are classified into three classes.

2. EEG. An EEG dataset that is available online and includes recordings for both healthy and epileptic subjects. The dataset includes five subsets (denoted as Z, O, N, F, and S), each containing 100 single-channel EEG segments of 23.6-second duration.

3. Parkinsons dataset: This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to the "status" column, which is set to 0 for healthy and 1 for PD.

4. Transfusion dataset: Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan. The dataset contains 748 patterns with 4 features.

5. Wdbc dataset: The Wisconsin diagnostic breast cancer dataset (WDBC) contains data for breast tumors. It contains 569 training examples of 30 features each that are classified into two categories.

The regression datasets are available from StatLib. The description of these datasets is as follows:

1. Housing dataset. This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

2. BK dataset. This dataset comes from Smoothing Methods in Statistics and is used to estimate the points scored per minute in a basketball game. The dataset has 96 patterns of 4 features each.

3. MB dataset. This dataset is available from Smoothing Methods in Statistics. It has 61 patterns of 2 features each.

4. FA dataset. The FA dataset contains percentage of body fat, age, weight, height, and ten body circumference measurements. The goal is to fit body fat to the other measurements. The number of features is 18 and the total number of patterns is 252.

5. NT dataset. This dataset contains data from a study that examined whether the true mean body temperature is 98.6 F. The number of patterns is 131 and the number of features is 2.

Dataset        C2         C4         C8
Wine           13.37%     11.46%     10.34%
EEG            26.64%     26.00%     25.68%
Parkinsons     23.40%     21.96%     22.27%
Transfusion    22.41%     22.11%     21.31%
Wdbc           15.07%     15.11%     15.11%
BK             14.91      10.40      8.92
FA             5.36       4.98       3.61
Housing        7595.22    7511.23    7399.11
MB             13.98      10.87      9.36
NT             0.51       0.52       0.50
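The evaluation protocol described in this section (an equal split, ten seeded runs, and averaged test error) can be sketched as follows. This is an assumption-laden outline rather than the code behind the table: the text does not say whether patterns are shuffled before splitting or how network outputs are mapped to classes, and the function names are illustrative.

```python
import random

def split_half(patterns, rng):
    """Split into train and test parts of equal size (they differ by one if the count is odd).
    Shuffling before the split is an assumption; the text only states equal sizes."""
    shuffled = patterns[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def classification_error(predicted, actual):
    """Percentage of test patterns whose predicted class differs from the true class."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)

def mean_squared_error(predicted, actual):
    """Average of squared differences, as reported for the regression datasets."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def average_over_seeds(run_once, seeds=range(10)):
    """Call run_once with a differently seeded generator each time and average the results."""
    results = [run_once(random.Random(seed)) for seed in seeds]
    return sum(results) / len(results)

print(classification_error([0, 1, 1, 0], [0, 1, 0, 0]))  # one of four wrong: 25.0
```

Here run_once would wrap a full train-and-test cycle for one seed and return the test error for that run.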

Download software