Abstract:
Using the Algerian forest fire dataset and the decision tree algorithm in Spark MLlib, this work proposes eliminating highly correlated feature parameters to improve model performance in forest fire prediction. The feature parameters include temperature, wind speed, rain, and the main indices of the Canadian Forest Fire Weather Index (FWI) system. Together with the fire classification labels, these were used to build a binary decision tree prediction model whose splits are chosen by information gain computed from Gini impurity, and the sample data were then classified and predicted. The correlation between the feature parameters was analyzed with the Pearson coefficient, and feature parameters that were highly correlated with one another were eliminated. The machine learning workflow was built on the big data computing framework Spark, combining the Pearson correlation analysis with the decision tree classification algorithm to optimize the model and improve classification accuracy. Before the improvement, without correlation analysis, the overall classification accuracy of forest fire prediction was 94.94%. After the improvement, with correlation analysis applied and the highly correlated feature data eliminated, the overall accuracy was 97.17%, an increase of about 2.2 percentage points. The machine learning algorithms in Spark MLlib achieved high accuracy in forest fire prediction and classification, and combining the decision tree with correlation-based feature elimination further improved model performance and classification accuracy.
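The two ingredients named above, Pearson-based feature elimination and Gini-based split selection, can be illustrated with a minimal sketch in plain Python. This is not the authors' Spark workflow: the 0.9 threshold, the helper names (`pearson`, `drop_correlated`, `gini`, `gini_gain`), and the toy feature values are all assumptions made for illustration. In Spark MLlib, the corresponding building blocks would be `pyspark.ml.stat.Correlation.corr` and `DecisionTreeClassifier` with `impurity="gini"`.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is below threshold.

    `columns` maps feature name -> list of values; the threshold is an assumed value.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_gain(parent, left, right):
    """Impurity decrease of a binary split, weighted by child sizes."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)

# Toy values (hypothetical, not taken from the Algerian dataset).
# DC is constructed as an exact linear function of DMC, so one of the two is redundant.
toy = {
    "DMC": [10.0, 20.0, 30.0, 40.0],
    "DC": [25.0, 45.0, 65.0, 85.0],
    "wind": [14.0, 12.0, 12.0, 14.0],
}
kept = drop_correlated(toy)

labels = ["fire", "fire", "not fire", "not fire"]
gain = gini_gain(labels, labels[:2], labels[2:])
```

In this toy data, `DC` is dropped because it is perfectly correlated with `DMC`, while `wind` is kept. The greedy pass keeps whichever feature of a correlated pair appears first, which is one simple design choice among several; the source does not state which feature of a pair was retained.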