Disclaimer: this software is very rough and not for the faint-hearted. Installation is rather complicated, usage is rather complicated, and I have experienced many crashes. However, it is a very comprehensive effort toward creating a proper ensemble library.
Installation
Run Ubuntu 11.10 on an Intel platform (on Amazon EC2, use image ami-6743ae0e), then connect to the Ubuntu instance:
ssh -i graphlabkey.pem ubuntu@ec2-184-73-45-88.compute-1.amazonaws.com
sudo apt-get update
sudo apt-get install build-essential ia32-libs rpm gcc-multilib curl libcurl4-openssl-dev
Download the Intel C++ compiler from here:
You should select Intel® C++ Composer XE 2011 for Linux, which includes the Intel® C++ Compiler, Intel® Integrated Performance Primitives, Intel® Math Kernel Library and Intel® Parallel Building Blocks.
Register using the form; you will receive an email with the license number.
tar xvzf l_ccompxe_intel64_2011.10.319.tgz
cd l_ccompxe_intel64_2011.10.319
./install.sh
Select option 2, then follow the instructions using the default options until completion. Add the following lines to /etc/ld.so.conf:
/opt/intel/composer_xe_2011_sp1.10.319/compiler/lib/intel64/
/opt/intel/composer_xe_2011_sp1.10.319/mkl/lib/intel64/
/opt/intel/composer_xe_2011_sp1.10.319/ipp/lib/intel64/
Run the command:
sudo ldconfig
For bash:
source /opt/intel/composer_xe_2011_sp1.10.319/bin/compilervars.sh intel64
Edit the Makefile so that it contains:
INTEL_PATH = /opt/intel/composer_xe_2011_sp1.10.319/
And also:
INCLUDE = -I$(INTEL_PATH)/compiler/include -I$(INTEL_PATH)/mkl/include -I$(INTEL_PATH)/ipp/include
LIB = -L$(INTEL_PATH)/mkl/lib/intel64/ -L$(INTEL_PATH)/ipp/lib/intel64/ -lmkl_solver_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lippcore -lipps -openmp -lpthread
Now run make. If all goes well, you will get an executable named ELF.
Common errors:
1) YahooFinance.h(6): catastrophic error: cannot open source file "curl/curl.h"
Solution: install libcurl4-openssl-dev as instructed above.
2) AlgorithmExploration.o InputFeatureSelector.o KernelRidgeRegression.o NeuralNetworkRBMauto.o nnrbm.o Autoencoder.o GBDT.o LogisticRegression.o YahooFinance.o -L/opt/intel/composer_xe_2011_sp1.10.319//mkl/lib/em64t -L/opt/intel/composer_xe_2011_sp1.10.319//ipp/em64t/sharedlib -lmkl_solver_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lippcoreem64t -lippsem64t -openmp -lpthread
ld: cannot find -lippcoreem64t
ld: cannot find -lippsem64t
make: *** [main] Error 1
Solution: edit the Makefile as instructed above. The failing link line still points to the old em64t library directories and the old IPP library names (ippcoreem64t, ippsem64t), while the corrected LIB line uses the intel64 directories and the libraries ippcore and ipps.
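Error 2 boils down to the linker being unable to resolve some -l names in the -L directories. Before rerunning make, you can check your LIB line with a small script like the one below. It is a rough sketch: it only looks for libNAME.so and libNAME.a in the -L directories plus a few system directories (that default list is my assumption for 64-bit Ubuntu), whereas the real ld search has more rules. missing_libraries is a hypothetical helper, not part of ELF.

```python
from __future__ import annotations

import os
import shlex
from typing import Callable, Iterable

# Searched in addition to the -L directories (assumed defaults for 64-bit Ubuntu).
SYSTEM_DIRS = ("/lib/x86_64-linux-gnu", "/usr/lib/x86_64-linux-gnu", "/usr/lib", "/lib")


def missing_libraries(
    lib_flags: str,
    extra_dirs: Iterable[str] = SYSTEM_DIRS,
    exists: Callable[[str], bool] = os.path.exists,
) -> list[str]:
    """Return the names given with -l that no search directory can satisfy.

    For each -lNAME we look for libNAME.so or libNAME.a, which is roughly
    what ld does; flags that are neither -L nor -l (e.g. -openmp) are ignored.
    """
    tokens = shlex.split(lib_flags)
    dirs = [t[2:] for t in tokens if t.startswith("-L")] + list(extra_dirs)
    names = [t[2:] for t in tokens if t.startswith("-l")]
    missing = []
    for name in names:
        candidates = (f"lib{name}.so", f"lib{name}.a")
        found = any(exists(os.path.join(d, c)) for d in dirs for c in candidates)
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # $(INTEL_PATH) expanded by hand; adjust the version directory to your install.
    intel = "/opt/intel/composer_xe_2011_sp1.10.319"
    flags = (
        f"-L{intel}/mkl/lib/intel64/ -L{intel}/ipp/lib/intel64/ "
        "-lmkl_solver_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core "
        "-lippcore -lipps -openmp -lpthread"
    )
    print("missing:", missing_libraries(flags) or "none")
```

The exists parameter is there so the check can be exercised without an Intel installation; in normal use you leave it at its default.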
Setting up the software
Prepare your training data in CSV format, where the last column is the target. Prepare your test data in CSV format as well. Create a directory named CSV, and inside it a file named Master.dsc with the following configuration:
dataset=CSV
isClassificationDataset=1
maxThreads=2
maxThreadsInCross=2
nCrossValidation=6
validationType=Retraining
positiveTarget=1.0
negativeTarget=-1.0
randomSeed=124391994
nMixDataset=20
nMixTrainList=100
standardDeviationMin=0.01
blendingRegularization=1e-4
blendingEnableCrossValidation=0
blendingAlgorithm=LinearRegression
enablePostNNBlending=0
enableCascadeLearning=0
enableGlobalMeanStdEstimate=0
enableSaveMemory=1
addOutputNoise=0
enablePostBlendClipping=0
enableFeatureSelection=0
featureSelectionWriteBinaryDataset=0
enableGlobalBlendingWeights=0
errorFunction=RMSE
disableWriteDscFile=0
enableStaticNormalization=0
#staticMeanNormalization=7.5
#staticStdNormalization=10
enableProbablisticNormalization=0
dimensionalityReduction=no
subsampleTrainSet=1.0
subsampleFeatures=1.0
globalTrainingLoops=1

[ALGORITHMS]
LinearModel_1.dsc
#KNearestNeighbor_1.dsc
#NeuralNetwork_1.dsc
#KernelRidgeRegression_1.dsc
#PolynomialRegression_1.dsc
#NeuralNetwork_1.dsc
#GBDT_1.dsc

Then, in the same CSV directory, create a LinearModel_1.dsc file with the following configuration:
ALGORITHM=LinearModel
ID=1
#TRAIN_ON_FULLPREDICTOR=
DISABLE=0

[int]
maxTuninigEpochs=10

[double]
initMaxSwing=1.0
initReg=0.01

[bool]
tuneRigeModifiers=0
enableClipping=0
enableTuneSwing=0
minimzeProbe=0
minimzeProbeClassificationError=0
minimzeBlend=1
minimzeBlendClassificationError=0

[string]
weightFile=LinearModel_1_weights.dat
fullPrediction=LinearModel_1.dat
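Both .dsc files share a simple layout: key=value lines, a leading # to disable a line, and bracketed section headers. The Python sketch below reads that layout so you can sanity-check a configuration before handing it to ELF. The parsing rules are inferred from the two examples above rather than taken from ELF's source, and parse_dsc is a hypothetical helper; filing keys that appear before the first section under "META" simply mirrors the [META] tag ELF prints in its log.

```python
from collections import defaultdict

LINEAR_MODEL_DSC = """\
ALGORITHM=LinearModel
ID=1
#TRAIN_ON_FULLPREDICTOR=
DISABLE=0
[int]
maxTuninigEpochs=10
[double]
initMaxSwing=1.0
initReg=0.01
[string]
weightFile=LinearModel_1_weights.dat
"""


def parse_dsc(text):
    """Split .dsc text into ({section: {key: value}}, {section: [bare entries]}).

    Assumed rules: blank lines and lines starting with '#' are skipped,
    '[name]' opens a section, 'key=value' is a setting, and any other line
    (such as an algorithm file under [ALGORITHMS]) is a bare entry.
    """
    settings = defaultdict(dict)
    entries = defaultdict(list)
    section = "META"  # keys that appear before the first [section]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif "=" in line:
            key, value = line.split("=", 1)
            settings[section][key.strip()] = value.strip()
        else:
            entries[section].append(line)
    return dict(settings), dict(entries)


if __name__ == "__main__":
    settings, _ = parse_dsc(LINEAR_MODEL_DSC)
    # Section names hint at the value type, so convert explicitly.
    print(int(settings["int"]["maxTuninigEpochs"]), float(settings["double"]["initReg"]))  # prints: 10 0.01
```

The same function handles Master.dsc: the enabled files under [ALGORITHMS] come back as bare entries, and the commented-out ones are dropped.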
Now create a subfolder CSV/DataFiles and, inside it, a file called settings.txt with the following content:
delimiter=,
train=train.csv
trainTargetColumn=19
test=test.csv
Here train.csv and test.csv are the names of your train and test files, and trainTargetColumn is the index of the last (target) column of your data (column numbers start from zero).
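Because trainTargetColumn is just the zero-based index of the last column, it can be derived from the data instead of counted by hand. The sketch below does that and refuses to continue if the train and test files have different column counts. It assumes settings.txt takes one key=value pair per line, as laid out above; settings_text and count_columns are hypothetical helpers, not part of ELF.

```python
import csv
import io


def count_columns(f, delimiter=","):
    """Return the number of fields in the first non-empty row of a CSV stream."""
    for row in csv.reader(f, delimiter=delimiter):
        if row:
            return len(row)
    raise ValueError("CSV input is empty")


def settings_text(train_f, test_f, train_name="train.csv", test_name="test.csv", delimiter=","):
    """Build settings.txt content with trainTargetColumn set to the last column."""
    n_train = count_columns(train_f, delimiter)
    n_test = count_columns(test_f, delimiter)
    if n_train != n_test:
        raise ValueError(
            f"train has {n_train} columns but test has {n_test}; "
            "add a column of zeros to the test file"
        )
    lines = [
        f"delimiter={delimiter}",
        f"train={train_name}",
        f"trainTargetColumn={n_train - 1}",  # column numbers start from zero
        f"test={test_name}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # In practice, open CSV/DataFiles/train.csv and test.csv with newline="".
    train = io.StringIO("0.5,-1.2,3.0,1\n0.1,0.4,2.2,-1\n")
    test = io.StringIO("0.3,0.9,1.7,0\n")
    print(settings_text(train, test), end="")
```

Only the first non-empty row of each file is read, so this stays fast even for a training set with millions of lines.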
Note: the train and test files should have the same number of columns. If the test data has no labels, add a column of zeros.
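If your test file has no labels, the zero column can be appended with a few lines of Python. This is a minimal sketch using the standard csv module; append_zero_column is a hypothetical helper name, and it assumes the file has no header row (the setup above does not mention one).

```python
import csv
import io


def append_zero_column(fin, fout, delimiter=","):
    """Copy CSV rows from fin to fout, adding a trailing 0 as a dummy target."""
    reader = csv.reader(fin, delimiter=delimiter)
    writer = csv.writer(fout, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(row + ["0"])


if __name__ == "__main__":
    # With real files: open the unlabeled file for reading and test.csv for
    # writing (both with newline=""), then pass the two handles.
    out = io.StringIO()
    append_zero_column(io.StringIO("0.3,0.9,1.7\n0.2,0.8,1.1\n"), out)
    print(out.getvalue(), end="")  # prints 0.3,0.9,1.7,0 and 0.2,0.8,1.1,0
```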
Running ELF
For training, run:
ubuntu@domU-12-31-35-00-21-42:~$ ./ELF CSV/ t
maxThreads(OPENMP): 4
Scheduler
Constructor Data
Open master .dsc file:CSV//Master.dsc
isClassificationDataset: 1
Set max. threads in MKL and IPP: 2
maxThreads(OPENMP): 2
Train 6-fold cross validation
ValidationType: Retraining
Set random seed to: 124391994
randomSeed: 124391994
frameworkMode: 0
Start scheduled training
Fill data
gradientBoostingLoops:1
DatasetReader
Read CSV from: CSV//DataFiles
#feat:5
Target values: [0]-1 [1]1
descructor DatasetReader
reduce training set (current size:6162863) to 100% of its original size [nothing to do]
subsample the columns (current:5) to 100% of columns (skip constant 1 features) [nothing to do]
subsample the columns (current:5) to 100% of columns (skip constant 1 features) [nothing to do]
Randomize the train dataset: 123257260 line swaps
[..........]
mixInd[0]:467808
mixInd[6162862]:3154542
Enable bagging:0
Set algorithm list (nTrained:0)
Load descriptor file: CSV//LinearModel_1.dsc
[META] ALGORITHM: LinearModel
[META] ID: 1
[META] DISABLE: 0
maxTuninigEpochs: 10
initMaxSwing: 1.0
initReg: 0.01
tuneRigeModifiers: 0
enableClipping: 0
enableTuneSwing: 0
minimzeProbe: 0
minimzeProbeClassificationError: 0
minimzeBlend: 1
minimzeBlendClassificationError: 0
weightFile: LinearModel_1_weights.dat
fullPrediction: LinearModel_1.dat
Alloc mem for cross validation data sets (doOnlyNormalization:0)
Cross-validation settings: 6 sets
Calculating mean and std per input
f:3lim f:4lim
StdMin:0.01
Normalization:[Min|Max mean: -2.72612|-0.940528 Min|Max std: 0.01|0.687338]
Features: RawInputs[Min|Max value: -5.7863|0.64705] AfterNormalization[Min|Max value:-4.45221|10.8926] on 5 features
Targets: min|max|mean [Nr0:-1|1|0.803235] [Nr1:-1|1|-0.803235]
Save mean and std: CSV//TempFiles/normalization.dat.algo1.add0
Random seed:124391994
nFeatures:5 nClass:2 nDomain:1 nTrain:6162863 nValid:0 nTest:0
Make 616286300 index swaps (randomize sample index list)
partition size: 1.02714e+06
slot: TRAIN | PROBE
===================
0: 5135719 | 1027144
1: 5135719 | 1027144
2: 5135719 | 1027144
3: 5135720 | 1027143
4: 5135719 | 1027144
5: 5135719 | 1027144
6: 6162863 | 0
probe sum:6162863
Train algorithm:CSV//LinearModel_1.dsc
Load descriptor file: CSV//LinearModel_1.dsc
[META] ALGORITHM: LinearModel
[META] ID: 1
[META] DISABLE: 0
maxTuninigEpochs: 10
initMaxSwing: 1.0
initReg: 0.01
tuneRigeModifiers: 0
enableClipping: 0
enableTuneSwing: 0
minimzeProbe: 0
minimzeProbeClassificationError: 0
minimzeBlend: 1
minimzeBlendClassificationError: 0
weightFile: LinearModel_1_weights.dat
fullPrediction: LinearModel_1.dat
AlgoTemplate:CSV//LinearModel_1.dsc
Algo:CSV//DscFiles/LinearModel_1.dsc
Output File for cout redirect is set now to CSV//DscFiles/LinearModel_1.dsc
Floating point precision: 4 Bytes
Partition dataset to cross validation sets
Can not open effect file:CSV//FullPredictorFiles/
Init residuals
Write first 1000 lines of the trainset(Atrain.txt) and targets(AtrainTarget.txt)
Apply mean and std correction to train input features
Min/Max feature values after apply mean/std: -4.45221/10.8926
Min/Max target: -1/1
Mean target: 0.803235 -0.803235
Constructor Data
Algorithm
StandardAlgorithm
LinearModel
Set data pointers
Start train StandardAlgorithm
Init standard algorithm
Read dsc maps (standard values)
Constructor BlendStopping
Number of predictors for blendStopping: 2 (+1 const, +1 new)
Blending regularization: 0.0001
[CalcBlend] lambda:0.0001 [classErr:9.83825%]
ERR Blend:0.59568
============================ START TRAIN (param tuning) =============================
Parameters to tune:
[REAL] name:reg initValue:0.01 (min|max. epochs: 0|10)
==================== auto-optimize ====================
(epoch=0) reg=0.01 ...... [classErr:38.0955%] [probe:0.992891] [CalcBlend] lambda:0.0001 [classErr:9.83952%] ERR=0.583664 11[s][saveBest][SB]
(epoch=1) reg=0.008 ...... [classErr:38.1632%] [probe:0.992889] [CalcBlend] lambda:0.0001 [classErr:9.83963%] ERR=0.583661 11[s] !min! [saveBest][SB]
(epoch=2) reg=0.0064 ...... [classErr:38.2209%] [probe:0.992888] [CalcBlend] lambda:0.0001 [classErr:9.83973%] ERR=0.58366 11[s] !min! [saveBest][SB]
accelerate
(epoch=3) reg=0.0048422 ...... [classErr:38.2776%] [probe:0.992888] [CalcBlend] lambda:0.0001 [classErr:9.83976%] ERR=0.583661 11[s]
(epoch=4) reg=0.008 ...... [classErr:38.1632%] [probe:0.992889] [CalcBlend] lambda:0.0001 [classErr:9.83963%] ERR=0.583661 11[s]
(epoch=5) reg=0.00535367 ...... [classErr:38.2585%] [probe:0.992888] [CalcBlend] lambda:0.0001 [classErr:9.83979%] ERR=0.583661 12[s]
(epoch=6) reg=0.00738248 ...... [classErr:38.1849%] [probe:0.992888] [CalcBlend] lambda:0.0001 [classErr:9.83968%] ERR=0.583661 11[s]
(epoch=7) reg=0.00570903 ...... [classErr:38.2454%] [probe:0.992888] [CalcBlend] lambda:0.0001 [classErr:9.83983%] ERR=0.58366 11[s]
(epoch=8) reg=0.00701252 ...... [classErr:38.1978%] [probe:0.992888] [CalcBlend] lambda:0.0001 [classErr:9.83968%] ERR=0.58366 11[s]
(epoch=9) reg=0.00594873 ...... [classErr:38.2369%] [probe:0.992888] [CalcBlend] lambda:0.0001 [classErr:9.83983%] ERR=0.58366 11[s]
(epoch=10) reg=0.00678554 max. epochs reached.
expSearchErrorBest:0.58366 error:0.58366
============================ END auto-optimize =============================
Calculate FullPrediction (write the prediction of the trainingset with cross validation)
Blending weights (row: classes, col: predictors[1.col=const predictor])
0.799 1.011
-0.799 1.011
Save blending weights: CSV//TempFiles/blendingWeights_02.dat
Write full prediction: CSV//FullPredictorFiles/LinearModel_1.dat (RMSE:0.992888)
Validation type: Retraining
Update model on whole training set
Save:CSV//TempFiles/LinearModel_1_weights.dat.006
Calculate retrain RMSE (on trainset)
Train of this algorithm (RMSE after retraining): 0.992894
Total retrain time:3[s]
===========================================================================
Constructor BlendStopping
ADD:CSV//FullPredictorFiles/LinearModel_1.dat
Number of predictors for blendStopping: 2 (+1 const)
File:CSV//FullPredictorFiles/LinearModel_1.dat RMSE:0.992888
Blending regularization: 0.0001
[CalcBlend] lambda:0.0001
Blending weights (row: classes, col: predictors[1.col=const predictor])
0.799 1.011
-0.799 1.011
[Write train prediction:CSV//TempFiles/trainPrediction.data] nSamples:6162863
[classErr:9.83973%]
Blending weights (row: classes, col: predictors[1.col=const predictor])
0.799 1.011
-0.799 1.011
Save blending weights: CSV//TempFiles/blendingWeights_02.dat
BLEND RMSE OF ACTUAL FULLPREDICTION PATH:0.58366
===========================================================================
destructor BlendStopping
delete algo
descructor LinearModel
descructor StandardAlgorithm
destructor BlendStopping
descructor Algorithm
destructor Data
Finished train algorithm:CSV//LinearModel_1.dsc
Finished in 275[s]
Clear output file for cout
Delete internal memory
Total training time:399[s]
descructor Scheduler
destructor Data
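The blending lines in this log (blendingAlgorithm=LinearRegression and blendingRegularization=1e-4 in Master.dsc, plus weight rows whose first column is the constant predictor) describe a regularized linear blend of predictor outputs. The pure-Python sketch below fits one such row by solving the ridge normal equations (X^T X + lambda*I) w = X^T y, where X has a leading constant column. It illustrates the technique under my own assumptions and is not ELF's code: in particular, lambda is added directly to every diagonal entry, including the constant's, and ELF may scale or apply it differently. ridge_blend and solve are hypothetical names.

```python
def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def ridge_blend(predictions, targets, lam=1e-4):
    """Fit w so that w[0] + sum_j w[j + 1] * p_j approximates the target.

    predictions: one row per sample, one column per predictor output
    targets: one target value per sample
    lam: ridge penalty added to every diagonal entry (an assumption)
    """
    rows = [[1.0] + list(p) for p in predictions]  # prepend the constant predictor
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(a, b)


if __name__ == "__main__":
    preds = [[0.0], [1.0], [2.0], [3.0]]
    ys = [1.0, 3.0, 5.0, 7.0]  # exactly 1 + 2 * p
    print(ridge_blend(preds, ys, lam=0.0))   # close to [1.0, 2.0]
    print(ridge_blend(preds, ys, lam=1e-4))  # almost the same for such a small lambda
```

With one algorithm in the ensemble, each class gets a two-entry row (constant, LinearModel output), which is the shape of the 0.799 1.011 rows printed above; adding more .dsc files under [ALGORITHMS] adds one column per algorithm.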