Hypercube neuron
Wojciech Fialkiewicz, Wroclaw, Poland.
emails : [email protected], [email protected]
Abstract.
The classic perceptron is limited to classifying only data that are linearly separable. In this work a neuron is proposed that does not have that limitation.
1. Introduction.
In [1] a neuron based on a hypercube architecture was introduced; in this paper an enhancement of that neuron is presented.
Some may argue that using a hypercube to represent a neuron leads to an overcomplicated design, but it is a known fact that artificial neural networks with many hidden layers are hard to train with the back propagation algorithm. It may be the case that creating more complex neurons will allow neural networks to be built with fewer hidden layers but with the same informational capacity.
2. Logic functions.
A logic function can be described by its pattern, that is, by enumerating its values for all possible sets of arguments. The pattern of the logic function XOR is as follows: (false, false; false), (false, true; true), (true, false; true), (true, true; false).
The pattern of a logic function can also be described with the use of the hypercube architecture. Consider the three-dimensional function described by the pattern:
Pattern P = (false, false, false; false),
(false, false, true; true),
(false, true, false; false),
(false, true, true; true),
(true, false, false; false),
(true, false, true; true),
(true, true, false; false),
(true, true, true; true).
The hypercube visualizing this logic function is shown in Graph 2.1.
[Figure: a cube whose eight vertices 000–111 are labeled with the values from pattern P: 000 false, 001 true, 010 false, 011 true, 100 false, 101 true, 110 false, 111 true.]
Graph 2.1 Three-dimensional hypercube of the logic function described by pattern P.
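As an aside (this example is not part of the original text), a pattern such as P can be stored as an array of 2^3 values indexed by reading the vertex coordinates as a binary number; a minimal C++ sketch:

#include <cstdio>

// Pattern P stored as an array indexed by the vertex coordinates read as a
// binary number (the left-most coordinate is the most significant bit).
static const bool patternP[8] = {
    false, // 000
    true,  // 001
    false, // 010
    true,  // 011
    false, // 100
    true,  // 101
    false, // 110
    true   // 111
};

// Value of the function for the arguments (a, b, c).
bool evalP(bool a, bool b, bool c)
{
    int vertex = (a ? 4 : 0) | (b ? 2 : 0) | (c ? 1 : 0);
    return patternP[vertex];
}

int main()
{
    printf("P(true, false, true) = %d\n", evalP(true, false, true)); // prints 1 (true)
    return 0;
}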
3. Linearly separable patterns of logic functions.
The classic perceptron can only be trained on data sets that are linearly separable. Table (3.1) shows the number of logic functions with linearly separable patterns for a given number of arguments (data from [2]), as well as the total number of logic functions for the same number of arguments.
arguments amount | linearly separable | total functions | percent
2                | 14                 | 16              | 87.5
3                | 104                | 256             | 40.625
4                | 1882               | 65536           | 2.871704102
5                | 94572              | 4294967296      | 0.002201926
6                | 15028134           | 1.84467E+19     | 8.14677E-11
7                | 8378070864         | 3.40282E+38     | 2.46209E-27
8                | 17561539552946     | 1.15792E+77     | 1.51664E-62
9                | 144130531453121000 | 1.3408E+154     | 1.075E-135
Table 3.1 Number of logic functions for a given number of arguments.
The last column of Table (3.1) shows the percentage of logic functions that the classic perceptron can learn. In this paper a neuron that can learn any logic function is proposed.
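As a side note (not in the original text), the total-functions column is simply 2^(2^n), since each of the 2^n argument combinations can independently map to true or false. A minimal C++ sketch that recomputes the last two columns of Table 3.1 from the linearly separable counts of [2]:

#include <cstdio>
#include <cmath>

int main()
{
    // Linearly separable counts for n = 2..9 arguments, taken from Table 3.1 / [2].
    const double separable[] = { 14.0, 104.0, 1882.0, 94572.0, 15028134.0,
                                 8378070864.0, 17561539552946.0, 144130531453121000.0 };
    for (int n = 2; n <= 9; n++)
    {
        double total = pow(2.0, pow(2.0, n)); // 2^(2^n) logic functions of n arguments
        double percent = separable[n - 2] / total * 100.0;
        printf("%d arguments: %.7g of %.6g functions are linearly separable (%.6g%%)\n",
               n, separable[n - 2], total, percent);
    }
    return 0;
}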
4. Neuron with a single input.
The neuron's input value is x, its activation function is f(x), and w is the weight of its single input. The neuron's exit value is equal to f(w*x).
f(x) is defined as follows:
(4.1) f(x) = ((At - Ab)/2) * bsgm(x) + ((At + Ab)/2)
bsgm(x) is defined as follows:
(4.2) bsgm(x) = 2/(1 + e^(-B*x)) - 1, where B = 5.625
At and Ab are respectively the top and bottom asymptote of the function f(x).
[Figure: plot of f(x) for x in [-3, 3]; the curve rises from the bottom asymptote Ab = -1.0 to the top asymptote At = 1.0.]
Graph of f(x) for At = 1.0 and Ab = -1.0.
A neuron with a single input has three parameters that change during the learning process: At, Ab and w.
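A minimal C++ sketch (not part of the original paper) of the single-input neuron's exit value, using definitions (4.1) and (4.2); the weight and input values are arbitrary examples:

#include <cstdio>
#include <cmath>

const double B = 5.625;

// (4.2) bipolar sigmoid
double bsgm(double x) { return 2.0 / (1.0 + exp(-B * x)) - 1.0; }

// (4.1) activation function with bottom asymptote Ab and top asymptote At
double f(double Ab, double At, double x)
{
    return ((At - Ab) / 2.0) * bsgm(x) + ((At + Ab) / 2.0);
}

int main()
{
    double At = 1.0, Ab = -1.0, w = 0.7, x = 2.0;
    printf("exit value f(w*x) = %f\n", f(Ab, At, w * x));
    return 0;
}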
5. Neuron with two inputs.
In a neuron with two inputs the inputs are denoted x and z, the weights on the inputs wx and wz, and the activation function f(x,z) is defined with the use of four asymptote points A00, A01, A10 and A11.
Graph 5.1 shows the skeleton of the neuron's activation function for A00 = 0, A01 = 0, A10 = 0, A11 = 1.
[Figure: skeleton of the two-dimensional activation function over the (x, z) plane, with the corner asymptote points A00, A01, A10, A11, the intermediate points B0 and B1, and the value f(x0, z0).]
Graph 5.1 – sample skeleton of a two-dimensional function.
The function f(x,z) is defined with the use of four f(x) (4.1) functions with At and Ab respectively equal to (A01, A00), (A11, A01), (A11, A10) and (A10, A00).
The value of the function f(x,z) at a point (x0, z0) is calculated in the following way: first the dimension z is reduced by calculating the asymptote points B0 and B1, which define a one-dimensional f(x) function of the argument x; then this function is used to obtain the value f(x0), which equals f(x0, z0).
B0 = ((A01 - A00)/2) * bsgm(wz*z0) + ((A01 + A00)/2)
B1 = ((A11 - A10)/2) * bsgm(wz*z0) + ((A11 + A10)/2)
f(x0) = ((B1 - B0)/2) * bsgm(wx*x0) + ((B1 + B0)/2)
f(x0, z0) = f(x0)
A neuron with two inputs has six parameters that change during the learning process: A00, A01, A10, A11, wx and wz.
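To make the two-step reduction concrete, here is a minimal C++ sketch (not from the original paper) that computes f(x0, z0) as described above; the asymptote points encode the XOR pattern and -1/1 stand for false/true, which are illustrative choices:

#include <cstdio>
#include <cmath>

const double B = 5.625;
double bsgm(double x) { return 2.0 / (1.0 + exp(-B * x)) - 1.0; }
double f(double Ab, double At, double x) { return ((At - Ab) / 2.0) * bsgm(x) + ((At + Ab) / 2.0); }

// Exit value of the two-input neuron: reduce dimension z to B0 and B1, then reduce dimension x.
double exitValue2(double A00, double A01, double A10, double A11,
                  double wx, double wz, double x0, double z0)
{
    double B0 = f(A00, A01, wz * z0); // edge A00 -> A01
    double B1 = f(A10, A11, wz * z0); // edge A10 -> A11
    return f(B0, B1, wx * x0);
}

int main()
{
    // XOR pattern on the corners: A00 = -1, A01 = 1, A10 = 1, A11 = -1; weights 1.0.
    printf("XOR(false, true) ~ %f\n", exitValue2(-1, 1, 1, -1, 1.0, 1.0, -1.0, 1.0));
    printf("XOR(true, true)  ~ %f\n", exitValue2(-1, 1, 1, -1, 1.0, 1.0, 1.0, 1.0));
    return 0;
}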
6. Neuron with N inputs.
A neuron with N inputs has N weights w0, w1, ..., w(N-1) on its inputs. Its activation function is defined by 2^N asymptote points.
Calculating the exit value of a neuron with N inputs involves reducing one dimension of the hypercube in each step of the algorithm, by calculating the asymptote points of a hypercube of lower dimension.
The following algorithm can be used to calculate the exit value of a neuron with N inputs:
(6.1) double ExitValue(double *Weights, double *Inputs, double *Points, int N)
{
    int iPairsAmountANDStepSize = 1 << (N - 1); // 2^(N-1); note that ^ would be XOR in C++
    for (int iDim = N - 1; iDim >= 0; iDim--)
    {
        // Reduce dimension iDim: every pair of asymptote points collapses into
        // one asymptote point of a hypercube with one dimension less.
        for (int iPair = 0; iPair < iPairsAmountANDStepSize; iPair++)
        {
            Points[iPair] = FuncBSGM(Points[iPair],
                                     Points[iPair + iPairsAmountANDStepSize],
                                     Weights[iDim] * Inputs[iDim]);
        }
        iPairsAmountANDStepSize /= 2;
    }
    return Points[0];
}
Here Weights is the array of weights on the neuron's inputs, Inputs is the array of input values that the neuron will process, Points is the array of asymptote points defining the neuron's activation function, and FuncBSGM is the function that evaluates (4.1), taking Ab and At as parameters.
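For illustration (this call is not part of the original text, and assumes the ExitValue function (6.1) above together with a FuncBSGM implemented like CMath::FunctionWithAsymptotes in the appendix): for N = 2 the index p of Points is read as a binary number in which bit k corresponds to input k, and the array is modified in place, so a copy should be passed:

// Illustrative call of (6.1); XOR is symmetric, so the bit ordering of Points does not matter here.
double Weights[2] = { 1.0, 1.0 };
double Inputs[2]  = { 1.0, -1.0 };                // one input true, one false
double Points[4]  = { -1.0, 1.0, 1.0, -1.0 };     // XOR pattern on the four vertices
double y = ExitValue(Weights, Inputs, Points, 2); // y is close to 1.0 (true)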
The exit value of a neuron with N inputs can also be specified by the formula:
(6.2) f(x0, x1, ..., x(N-1)) =
A0 * Fun(0, w0, x0, 0) * Fun(1, w1, x1, 0) * ... * Fun(N-1, w(N-1), x(N-1), 0)
+ A1 * Fun(0, w0, x0, 1) * Fun(1, w1, x1, 1) * ... * Fun(N-1, w(N-1), x(N-1), 1)
+ ...
+ A(2^N - 1) * Fun(0, w0, x0, 2^N - 1) * Fun(1, w1, x1, 2^N - 1) * ... * Fun(N-1, w(N-1), x(N-1), 2^N - 1)
Where x0, x1, ..., x(N-1) are the neuron's input values, A0, A1, ..., A(2^N - 1) are the asymptote points defining the neuron's activation function, and Fun(d, w, x, p) is defined as follows:
(6.3) Fun(d, w, x, p) = (1 + bsgm(w*x))/2 if bit d of p is equal to 1,
                        (1 - bsgm(w*x))/2 if bit d of p is equal to 0.
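To make the formula concrete, here is a self-contained C++ sketch (not part of the original paper) that evaluates (6.2)/(6.3) for a two-input neuron whose asymptote points encode the XOR pattern; with -1/1 standing for false/true it reproduces the XOR truth table, illustrating a function a classic perceptron cannot learn:

#include <cstdio>
#include <cmath>

const double B = 5.625;
double bsgm(double x) { return 2.0 / (1.0 + exp(-B * x)) - 1.0; }

// (6.3): factor for input number d, weight w, input value x and vertex index p
double Fun(int d, double w, double x, int p)
{
    return (p & (1 << d)) ? (1.0 + bsgm(w * x)) / 2.0
                          : (1.0 - bsgm(w * x)) / 2.0;
}

// (6.2): exit value as a sum over all 2^N asymptote points
double ExitValueFormula(const double *A, const double *w, const double *x, int N)
{
    double value = 0.0;
    for (int p = 0; p < (1 << N); p++)
    {
        double term = A[p];
        for (int d = 0; d < N; d++)
            term *= Fun(d, w[d], x[d], p);
        value += term;
    }
    return value;
}

int main()
{
    // XOR pattern on the vertices (vertex index read as binary input coordinates).
    double A[4] = { -1.0, 1.0, 1.0, -1.0 };
    double w[2] = { 1.0, 1.0 };
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
        {
            double x[2] = { a ? 1.0 : -1.0, b ? 1.0 : -1.0 };
            printf("XOR(%d, %d) ~ %f\n", a, b, ExitValueFormula(A, w, x, 2));
        }
    return 0;
}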
7. Supervised learning of the neuron.
To change the parameters defining the neuron during the learning process, a gradient based algorithm is used. In this algorithm the neuron's error function is minimized by modifying the value of each neuron parameter by its gradient, multiplied by a learning parameter.
The learning change for each parameter of the neuron is defined as follows:
(7.1) P = P - l * ∂E/∂P
Where P is the neuron parameter that is being modified, l is the learning parameter and E() is the error function:
(7.2) E() = (y - d)^2, where y is the exit value of the neuron and d is the desired exit value of the neuron.
The gradient of any asymptote point is calculated with the use of formula (6.2). The gradient for the asymptote point Am is as follows:
∂E/∂Am = 2 * (y - d) * Fun(0, w0, x0, m) * Fun(1, w1, x1, m) * ... * Fun(N-1, w(N-1), x(N-1), m)
Where N is the number of neuron inputs, Fun() is the function (6.3), w0, w1, ..., w(N-1) are the neuron weights and x0, x1, ..., x(N-1) are the neuron inputs.
Calculating the gradients of the weights is more complicated; it requires modifying algorithm (6.1) so that the dimension of the weight for which the gradient is being calculated is the last one to be reduced. The following algorithm shows how this is accomplished:
(7.3) void LowerDimsToDim(int iDimNr, double *Weights, double *Inputs, double *Points, int N)
{
    int iPairsAmountANDStepSize = 1 << (N - 1); // 2^(N-1)
    int iSecondPairsStart = 0;
    for (int iDim = N - 1; iDim >= 0; iDim--)
    {
        if (iDim != iDimNr)
        {
            for (int iPair = 0; iPair < iPairsAmountANDStepSize; iPair++)
            {
                Points[iPair] = FuncBSGM(Points[iPair],
                                         Points[iPair + iPairsAmountANDStepSize],
                                         Weights[iDim] * Inputs[iDim]);
            }
            if (iDim < iDimNr)
            {
                // The half starting at iSecondPairsStart holds the points belonging
                // to the not yet reduced dimension iDimNr; reduce it in the same way.
                for (int iPair = 0; iPair < iPairsAmountANDStepSize; iPair++)
                {
                    Points[iSecondPairsStart + iPair] = FuncBSGM(Points[iSecondPairsStart + iPair],
                                                                 Points[iSecondPairsStart + iPair + iPairsAmountANDStepSize],
                                                                 Weights[iDim] * Inputs[iDim]);
                }
            }
        }
        else
        {
            iSecondPairsStart = iPairsAmountANDStepSize;
        }
        iPairsAmountANDStepSize /= 2;
    }
    Points[1] = Points[iSecondPairsStart];
}
When algorithm (7.3) is executed, it leaves two values in the Points array, which are respectively the Ab and At asymptotes in the dimension specified by iDimNr. Using those values it is possible to obtain the neuron exit value y with the use of function (4.1):
(7.4) y = ((Points[1] - Points[0])/2) * bsgm(w_iDimNr * x_iDimNr) + ((Points[1] + Points[0])/2)
So the gradient of the weight number iDimNr is equal to:
(7.5) ∂E/∂w_iDimNr = 2 * (y - d) * ((Points[1] - Points[0])/2) * bsgm'(w_iDimNr * x_iDimNr) * x_iDimNr
Where bsgm'() is the derivative of the bipolar sigmoid (4.2).
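A quick way to sanity check formula (7.5) is to compare it against a central finite difference of the error; the following self-contained C++ sketch (not from the original paper, with arbitrarily chosen parameter values) does this for a two-input neuron:

#include <cstdio>
#include <cmath>

const double B = 5.625;
double bsgm(double x)  { return 2.0 / (1.0 + exp(-B * x)) - 1.0; }
double dbsgm(double x) { double e = exp(-B * x); return (2.0 * B * e) / ((1.0 + e) * (1.0 + e)); }
double fAs(double Ab, double At, double x) { return ((At - Ab) / 2.0) * bsgm(x) + ((At + Ab) / 2.0); }

// Exit value of a two-input neuron by the reduction of sections 5 and 6;
// bit 0 of the asymptote-point index corresponds to input 0, bit 1 to input 1.
double exitValue(const double A[4], const double w[2], const double x[2])
{
    double P0 = fAs(A[0], A[2], w[1] * x[1]); // dimension 1 reduced, bit 0 = 0
    double P1 = fAs(A[1], A[3], w[1] * x[1]); // dimension 1 reduced, bit 0 = 1
    return fAs(P0, P1, w[0] * x[0]);
}

int main()
{
    double A[4] = { -0.3, 0.8, 0.4, -0.6 };
    double w[2] = { 0.7, -0.5 };
    double x[2] = { 1.0, -1.0 };
    double d = 1.0; // desired exit value

    // Analytic gradient of E = (y - d)^2 with respect to w0, formulas (7.4) and (7.5).
    double P0 = fAs(A[0], A[2], w[1] * x[1]);
    double P1 = fAs(A[1], A[3], w[1] * x[1]);
    double y  = fAs(P0, P1, w[0] * x[0]);
    double analytic = 2.0 * (y - d) * ((P1 - P0) / 2.0) * dbsgm(w[0] * x[0]) * x[0];

    // Numerical gradient by central differences; the two values should agree closely.
    const double h = 1e-6;
    double wp[2] = { w[0] + h, w[1] }, wm[2] = { w[0] - h, w[1] };
    double Ep = pow(exitValue(A, wp, x) - d, 2.0);
    double Em = pow(exitValue(A, wm, x) - d, 2.0);
    printf("analytic dE/dw0 = %f, numerical = %f\n", analytic, (Ep - Em) / (2.0 * h));
    return 0;
}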
8. Future Work.
Hybrid learning systems based on neural networks, presented in [3] and [4], combine imprinting symbolic knowledge into the network structure with training that network, using the back propagation algorithm, on unclassified data. The neurons used in the mentioned approaches to remember symbolic rules realize the AND and OR logic functions. The neuron presented in this paper has the ability to realize any logic function. Networks using such neurons can also be trained with the back propagation algorithm to refine rules or discover new rules in the data.
References.
[1] Fialkiewicz, Wojciech (2002). Reasoning with neural networks. M.Sc. Thesis, Wroclaw University of Technology.
[2] Gruzling, Nicolle (2006). Linear separability of the vertices of an n-dimensional hypercube. M.Sc. Thesis, University of Northern British Columbia.
[3] Towell, Geoffrey G.; Shavlik, Jude W. (1994). Knowledge-Based Artificial Neural Networks.
[4] d'Avila Garcez, Artur S.; Lamb, Luís C.; Gabbay, Dov M. A Connectionist Inductive Learning System for Modal Logic Programming.
Appendix.
Implementation of the hypercube neuron in C++.
File Math.h
class CMath
{
static const double dB;
static bool Activate(void);
static bool m_bStart;
public:
static double BipolarSigmoid(double);
static double DerivatBipolarSigmoid(double);
static double Random(double,double);//min, max
static int PowerOfTwo(int);
static double FunctionWithAsymptotes(double,double,double);//lower asymptote, upper asymptote, x
};
File Math.cpp
#include "Math.h"
#include <stdlib.h>
#include <time.h>
#include <math.h>
bool CMath::m_bStart = CMath::Activate();
const double CMath::dB = 5.625;
bool CMath::Activate()
{
srand((unsigned)time(NULL));
return true;
}
double CMath::Random(double dMin, double dMax)
{
return (double)rand()/((double)RAND_MAX + 1.0)*(dMax - dMin) + dMin; // avoid integer overflow of RAND_MAX + 1
}
int CMath::PowerOfTwo(int nPower)
{
int nResult = 1;
while(nPower)
{
nResult *= 2;
nPower--;
}
return nResult;
}
double CMath::BipolarSigmoid(double dX)
{
return 2.0/(1+exp(-dB*dX))-1.0;
}
double CMath::FunctionWithAsymptotes(double dBottom, double dTop, double dX)
{
return ((dTop - dBottom)/2.0) * BipolarSigmoid(dX)+((dTop + dBottom)/2.0);
}
double CMath::DerivatBipolarSigmoid(double dX)
{
return (2*dB*exp(-dB*dX))/((1+exp(-dB*dX))*(1+exp(-dB*dX)));
}
File Neuron.h
class CNeuron
{
double *m_dWeights;
double *m_dAsympPoints;
int m_nInputsAmount;
int m_nPointsAmount;
double *m_dTmp;
double *m_dTmpWeights;
static void LowerDimsToDim(int,double*,double*,double*,int);//dimension nr, weights, inputs, asymptote points, inputs amount
static void CopyBuffer(double*,double*,int);//src, dest, amount
static bool CheckBit(int,int);//value, bit nr
static void CalculateAsympPointsGradients(double*,double*,double*,double*,int,int);//buffer for gradients, asymp points, weights, inputs, inputs amount, points amount
public:
CNeuron(int);//inputs amount
~CNeuron(void);
double ExitValue(double*);//inputs
double ExitValueFromFormula(double*);//inputs
double Lern(double*,double,double);//inputs, desired output, learning parameter
void AdaptNeuronByGradient(double*,double,double);//inputs, gradient for neuron, learning parameter
void GradientsOnInputs(double*,double*,double);//inputs, (out) gradients on inputs, gradient for neuron
void RememberRule(char*,double);//coordinates of hypercube vertex, example: "001" (false, false, true), value on the vertex
void OnesOnWeights();//assign 1.0 to all weights, must be called when using RememberRule
private:
CNeuron(const CNeuron&);
CNeuron& operator=(const CNeuron&);
};
File Neuron.cpp
#include "Neuron.h"
#include "Math.h"
CNeuron::CNeuron(int iInputsAm)
{
m_nPointsAmount = CMath::PowerOfTwo(iInputsAm);
m_nInputsAmount = iInputsAm;
m_dAsympPoints = new double[m_nPointsAmount];
m_dTmp = new double[m_nPointsAmount];
m_dTmpWeights = new double[iInputsAm];
m_dWeights = new double[iInputsAm];
for(int i=0;i<m_nInputsAmount;i++)
{
m_dWeights[i] = CMath::Random(-1.0, 1.0);
}
for(int i=0;i<m_nPointsAmount;i++)
{
m_dAsympPoints[i] = CMath::Random(-1.0, 1.0);
}
}
CNeuron::~CNeuron(void)
{
if(m_dAsympPoints)
{
delete [] m_dAsympPoints;
}
if(m_dTmp)
{
delete [] m_dTmp;
}
if(m_dTmpWeights)
{
delete [] m_dTmpWeights;
}
if(m_dWeights)
{
delete [] m_dWeights;
}
}
void CNeuron::LowerDimsToDim(int nDimNr, double *dWeights, double *dInputs, double *dPoints,
int dInAm)
{
int nPairsAmountANDStepSize;
int nSecondPairsStart;
nPairsAmountANDStepSize = CMath::PowerOfTwo(dInAm - 1);
for(int nDim = dInAm - 1;nDim >= 0 ; nDim--)
{
if(nDim!=nDimNr)
{
for(int nPair = 0; nPair<nPairsAmountANDStepSize;nPair++)
{
dPoints[nPair] = CMath::FunctionWithAsymptotes(dPoints[nPair],
dPoints[nPair+nPairsAmountANDStepSize], dWeights[nDim] * dInputs[nDim]);
}
if(nDim < nDimNr)
{
for(int nPair = 0;nPair<nPairsAmountANDStepSize;nPair++)
{
dPoints[nSecondPairsStart+nPair] =
CMath::FunctionWithAsymptotes(dPoints[nSecondPairsStart+nPair],
dPoints[nSecondPairsStart+nPair+nPairsAmountANDStepSize], dWeights[nDim]*dInputs[nDim]);
}
}
}
else
{
nSecondPairsStart = nPairsAmountANDStepSize;
}
nPairsAmountANDStepSize /= 2;
}
dPoints[1] = dPoints[nSecondPairsStart];
}
double CNeuron::ExitValue(double *dInputs)
{
CopyBuffer(m_dAsympPoints, m_dTmp, m_nPointsAmount);
LowerDimsToDim(0, m_dWeights, dInputs, m_dTmp, m_nInputsAmount);
return CMath::FunctionWithAsymptotes(m_dTmp[0], m_dTmp[1], dInputs[0]*m_dWeights[0]);
}
void CNeuron::CopyBuffer(double *dSrc, double *dDest, int nAmount)
{
for(int i=0;i<nAmount;i++)
{
dDest[i] = dSrc[i];
}
}
bool CNeuron::CheckBit(int nValue, int nBitNr)
{
if(nValue & (1 << nBitNr))
{
return true;
}
else
{
return false;
}
}
void CNeuron::CalculateAsympPointsGradients(double *dGradients, double *dAsympPoints, double
*dWeights, double *dInputs, int nInputsAm, int nPointsAm)
{
double dFraction = 1.0 / CMath::PowerOfTwo(nInputsAm);
for(int i=0;i<nPointsAm;i++)
{
dGradients[i] = dFraction;
for(int j=0;j<nInputsAm;j++)
{
if(CheckBit(i, j))
{
dGradients[i] *= (1.0 + CMath::BipolarSigmoid(dWeights[j] *
dInputs[j]));
}
else
{
dGradients[i] *= (1.0 - CMath::BipolarSigmoid(dWeights[j] *
dInputs[j]));
}
}
}
}
double CNeuron::ExitValueFromFormula(double *dInputs)
{
double dExitValue = 0.0;
CalculateAsympPointsGradients(m_dTmp, m_dAsympPoints, m_dWeights, dInputs,
m_nInputsAmount, m_nPointsAmount);
for(int i=0;i<m_nPointsAmount;i++)
{
dExitValue += m_dAsympPoints[i] * m_dTmp[i];
}
return dExitValue;
}
void CNeuron::AdaptNeuronByGradient(double *dInputs, double dGradient, double dLerningParam)
{
for(int i=0;i<m_nInputsAmount;i++)
{
CopyBuffer(m_dAsympPoints, m_dTmp, m_nPointsAmount);
LowerDimsToDim(i, m_dWeights, dInputs, m_dTmp, m_nInputsAmount);
m_dTmpWeights[i] = m_dWeights[i] - dLerningParam * dGradient * ((m_dTmp[1] - m_dTmp[0])/2.0) *
    CMath::DerivatBipolarSigmoid(m_dWeights[i] * dInputs[i]) * dInputs[i];
}
CalculateAsympPointsGradients(m_dTmp, m_dAsympPoints, m_dWeights, dInputs,
m_nInputsAmount, m_nPointsAmount);
for(int i=0;i<m_nPointsAmount;i++)
{
m_dAsympPoints[i] -= dLerningParam * dGradient * m_dTmp[i];
}
CopyBuffer(m_dTmpWeights, m_dWeights, m_nInputsAmount);
}
void CNeuron::RememberRule(char *cCoordinates, double dValue)
{
int nPointPosition = 0;
for(int i=m_nInputsAmount - 1;i>=0;i--)
{
if(cCoordinates[i] != '0')
{
nPointPosition += CMath::PowerOfTwo(m_nInputsAmount - i - 1);
}
}
m_dAsympPoints[nPointPosition] = dValue;
}
void CNeuron::OnesOnWeights()
{
for(int i=0;i<m_nInputsAmount;i++)
{
m_dWeights[i] = 1.0;
}
}
double CNeuron::Lern(double *dInputs, double dDesiredOutput, double dLernParam)
{
double dExit = ExitValue(dInputs);
AdaptNeuronByGradient(dInputs, 2.0 * (dExit - dDesiredOutput), dLernParam);
return (dExit - dDesiredOutput) * (dExit - dDesiredOutput);
}
void CNeuron::GradientsOnInputs(double *dInputs, double *dGradients, double dGradient)
{
for(int i=0;i<m_nInputsAmount;i++)
{
CopyBuffer(m_dAsympPoints, m_dTmp, m_nPointsAmount);
LowerDimsToDim(i, m_dWeights, dInputs, m_dTmp, m_nInputsAmount);
dGradients[i] = dGradient * ((m_dTmp[1] - m_dTmp[0])/2.0) *
CMath::DerivatBipolarSigmoid(m_dWeights[i] * dInputs[i]) * m_dWeights[i];
}
}
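The original appendix does not include a usage example; a hypothetical main.cpp (the file name, the XOR task and the -1/1 encoding of false/true are assumptions made here) could exercise the class as follows: the XOR rule is first imprinted with RememberRule, and the neuron is then refined with Lern.

// File main.cpp (hypothetical usage sketch, not part of the original appendix)
#include <cstdio>
#include "Neuron.h"

int main()
{
    CNeuron neuron(2); // two-input hypercube neuron

    // Imprint the XOR rule on the hypercube vertices (-1.0 = false, 1.0 = true).
    char rule[][3] = { "00", "01", "10", "11" };
    double ruleValue[] = { -1.0, 1.0, 1.0, -1.0 };
    for (int r = 0; r < 4; r++)
    {
        neuron.RememberRule(rule[r], ruleValue[r]);
    }
    neuron.OnesOnWeights();

    double inputs[4][2] = { {-1,-1}, {-1,1}, {1,-1}, {1,1} };
    double desired[4]   = {  -1,      1,      1,     -1   };

    for (int s = 0; s < 4; s++)
    {
        printf("neuron(%2.0f, %2.0f) = %f (desired %2.0f)\n",
               inputs[s][0], inputs[s][1], neuron.ExitValue(inputs[s]), desired[s]);
    }

    // The imprinted rule can be refined further with supervised learning.
    for (int epoch = 0; epoch < 10; epoch++)
    {
        double error = 0.0;
        for (int s = 0; s < 4; s++)
        {
            error += neuron.Lern(inputs[s], desired[s], 0.1);
        }
        printf("epoch %d, squared error %f\n", epoch, error);
    }
    return 0;
}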