Citation: GAO F W, TIAN R, ZHOU H, et al. Forest fire prediction in Algeria based on decision tree algorithm in Spark MLlib[J]. Journal of Sichuan Forestry Science and Technology, 2023, 44(5): 24−31. DOI: 10.12172/202211150002

Forest Fire Prediction in Algeria Based on Decision Tree Algorithm in Spark MLlib

Abstract: Using Algerian forest fire data and the decision tree algorithm in Spark MLlib, this study predicts forest fires and proposes filtering out highly correlated feature parameters to improve model performance. Taking temperature, wind speed, rain, and the main indices of the Canadian Forest Fire Weather Index (FWI) system as feature parameters, together with the forest fire class labels, a binary decision tree with Gini impurity as the splitting criterion was used to build a fire prediction model and classify the sample data. The correlation between feature parameters was then analyzed, and highly correlated parameters were removed. A machine learning workflow was built on the big data computing framework Spark, combining the Pearson correlation coefficient with the decision tree classification algorithm to optimize the model and improve classification accuracy. Before the improvement, without correlation analysis, the overall classification accuracy was 94.94%. After the improvement, with correlation analysis applied and the highly correlated feature parameters removed, the overall classification accuracy was 97.17%, an increase of 2.23 percentage points. Overall, the machine learning algorithms in Spark MLlib achieved high accuracy in forest fire prediction and classification, and combining several data mining algorithms further improved model performance and classification accuracy.
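The two ideas the abstract combines, Pearson-based feature filtering and Gini-based splitting, can be illustrated with a minimal pure-Python sketch. This is not the authors' Spark MLlib workflow and does not use the Spark API. The 0.9 correlation threshold, the greedy keep-first filtering rule, the feature names (`index_a`, `index_b`), and all data values are assumptions made up for illustration; the abstract does not state them.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter (assumed rule): keep a feature only if its |r| with
    every feature kept so far is below the threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity of the binary split `value <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Toy data, invented for illustration only (not from the Algerian dataset).
columns = {
    "temperature": [30, 25, 35, 28, 33, 22],
    "wind_speed": [14, 18, 15, 13, 17, 16],
    "index_a": [3, 8, 4, 9, 1, 6],
    "index_b": [7, 17, 9, 19, 3, 13],  # exactly 2 * index_a + 1, so |r| = 1
}
labels = ["fire", "not fire", "fire", "not fire", "fire", "not fire"]

kept = drop_correlated(columns)
print("kept features:", kept)
print("impurity before split:", gini(labels))
print("impurity after temperature <= 29:", split_gini(columns["temperature"], labels, 29))
```

A decision tree learner repeats the `split_gini` comparison over candidate features and thresholds and picks the split with the lowest weighted impurity. In Spark, the corresponding steps would presumably be expressed as stages of an ML pipeline rather than hand-written loops.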