ANALYSIS OF ACCURACY IMPROVEMENT IN RANDOM FOREST USING PRINCIPAL COMPONENT ANALYSIS (PCA)



Hanna Willa Dhany(1*), Muhammad Iqbal(2)

(1) Universitas Pembangunan Pancabudi Medan
(2) 
(*) Corresponding Author

Abstract


A decision tree is used to assign data whose class is not yet known to one of a set of existing classes. A test record is routed through the tree starting at the root node, and the leaf node it finally reaches predicts its class. Random Forest is not reliable for data with differing categorical variables, because differences in variable values affect the classification process, so its classification needs to be improved. A method is therefore needed to remove features that contribute little to classification accuracy in the Random Forest method. In this study, the PCA + Random Forest classification model was applied to the Water Quality Status Dataset, simplified to 5 attributes, 4 classes and 117 instances, and achieved an accuracy of 91.43%, corresponding to a classification error rate of 8.57%. Based on the test results of the four classification models, the study concludes that PCA can be used to improve the accuracy of the Random Forest classification model.
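The abstract relies on two mechanisms: a test record travels from the root node to a leaf that predicts its class, and a Random Forest combines the predictions of many such trees. The following is a minimal sketch of both, using only the Python standard library, under loudly stated assumptions: the tree format, thresholds, feature values and the class labels `C1`/`C2` are all hypothetical, the two input features merely stand in for principal-component scores, and nothing here reproduces the authors' trained model or dataset. Accuracy is computed as the share of correct predictions and the error rate as its complement, which is how the reported 91.43% and 8.57% relate.

```python
from collections import Counter

# Hypothetical tree format (assumption, for illustration only):
#   leaf:     {"label": class_name}
#   internal: {"feature": index, "threshold": t, "left": node, "right": node}
# Records with x[feature] <= threshold go left, all others go right.


def predict_tree(node, x):
    """Follow one record from the root node down to a leaf; the leaf predicts the class."""
    while "label" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]


def predict_forest(trees, x):
    """Random Forest style prediction: every tree votes and the majority class wins."""
    votes = Counter(predict_tree(tree, x) for tree in trees)
    return votes.most_common(1)[0][0]


def accuracy_and_error(y_true, y_pred):
    """Accuracy = correct / total; classification error rate = 1 - accuracy."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return accuracy, 1 - accuracy


# Three hand-built trees over two features. The features stand in for
# principal-component scores and "C1"/"C2" are hypothetical class labels;
# none of this is trained on the Water Quality Status Dataset.
trees = [
    {"feature": 0, "threshold": 0.0,
     "left": {"label": "C1"}, "right": {"label": "C2"}},
    {"feature": 1, "threshold": 1.0,
     "left": {"label": "C1"},
     "right": {"feature": 0, "threshold": -1.0,
               "left": {"label": "C1"}, "right": {"label": "C2"}}},
    {"feature": 0, "threshold": 0.5,
     "left": {"label": "C1"}, "right": {"label": "C2"}},
]

# A tiny hypothetical test set.
X_test = [[-0.5, 0.2], [1.2, 2.0], [0.3, 1.5], [2.0, 0.0]]
y_test = ["C1", "C2", "C1", "C2"]

preds = [predict_forest(trees, x) for x in X_test]
acc, err = accuracy_and_error(y_test, preds)
print(preds, acc, err)  # ['C1', 'C2', 'C2', 'C2'] 0.75 0.25
```

The third record is misclassified because two of the three trees vote `C2`. A real Random Forest would additionally train each tree on a bootstrap sample with random feature subsets, and the PCA step would project the original attributes onto principal components before training; this sketch implements neither.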

Keywords


Profile Matching, Decision Support System



Copyright (c) 2020 Hanna Willa Dhany

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Online ISSN : 2460-5611 | Print ISSN : 1979-9292

Published by LLDIKTI Wilayah X (Sumatera Barat, Riau, Jambi and Kepulauan Riau)

Jl. Khatib Sulaiman No 1 Kota Padang. Postal code 25144. Tel. 0751-7056737. Fax 0751-7056737. Website: http://www.kopertis10.or.id

