Hong Kong New Century Cultural Publishing House (香港新世纪文化出版社)
Address: Room D, 16/F, Hyde Centre, 18 Luard Road, Wan Chai, Hong Kong
Journal: International Journal of Intelligent Information and Management Science (English-language journal)

Improved Random Forest Algorithm for Stream Big Data Processing

Jing Li1, Yingchun Liu2

1. College of Computer Science and Technology, Huaqiao University, Xiamen, 361021, China

2. Industrial and Commercial Bank of China, Peony Card Center, Beijing, 100140, China


Abstract: Stream computing is an important form of Big Data computing. The Random Forest method is currently one of the most widely applied classification algorithms. In practice, a Random Forest must cope not only with a very large number of features but also with data patterns that change constantly over time. Without a self-updating, adaptive mechanism, the accuracy of a Random Forest algorithm gradually declines over time. To address this problem, this paper analyzes the characteristics of the Random Forest algorithm and proposes a new pruning approach based on the accuracy of the individual decision trees. To adapt to changes in the data, a new margin-based random method is presented. The new method updates itself continuously and can be applied in streaming Big Data environments. Verification on actual customer data shows that the new method achieves higher classification accuracy.

Keywords: Random forest; Big data; Stream computing
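
The abstract names two ingredients, pruning by per-tree accuracy and a margin-based criterion, without giving their details. As a rough illustration only, the following sketch shows one common reading of each idea: trees are ranked by accuracy on a recent window of labeled samples, and the margin is the usual ensemble margin (vote share for the true class minus the largest vote share for any other class). The function names, the toy threshold "stumps", and the sample window are all hypothetical and are not taken from the paper; this is not the authors' algorithm.

```python
from collections import Counter

def tree_accuracy(tree, samples, labels):
    """Fraction of samples that a single tree classifies correctly."""
    correct = sum(1 for x, y in zip(samples, labels) if tree(x) == y)
    return correct / len(samples)

def prune_by_accuracy(trees, samples, labels, keep):
    """Keep the `keep` most accurate trees on the given window.

    Assumption: accuracy on a recent window of the stream is the
    pruning criterion. Ties keep their original order (stable sort).
    """
    ranked = sorted(
        trees,
        key=lambda t: tree_accuracy(t, samples, labels),
        reverse=True,
    )
    return ranked[:keep]

def margin(trees, x, y):
    """Ensemble margin for one sample: vote share for the true class y
    minus the largest vote share received by any other class."""
    votes = Counter(t(x) for t in trees)
    n = len(trees)
    true_share = votes.get(y, 0) / n
    other_share = max(
        (count / n for label, count in votes.items() if label != y),
        default=0.0,
    )
    return true_share - other_share

def make_stump(threshold):
    """Hypothetical stand-in for a decision tree: a one-feature threshold rule."""
    return lambda x: 1 if x > threshold else 0

# Toy forest and a small labeled window (illustrative values only).
forest = [make_stump(t) for t in (0.2, 0.5, 0.8, 0.95)]
window_x = [0.1, 0.4, 0.6, 0.9]
window_y = [0, 0, 1, 1]

# Drop the least accurate tree, then compare margins before and after.
pruned = prune_by_accuracy(forest, window_x, window_y, keep=3)
margin_before = margin(forest, 0.6, 1)
margin_after = margin(pruned, 0.6, 1)
```

In a streaming setting, a low or negative margin on newly arriving samples could serve as a signal that the data pattern has shifted and that weak trees should be replaced; how the paper actually uses the margin is not stated in the abstract.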