Research Report
The Development of Artificial Intelligence Base Technologies for Objective Climate Predictions (I)
- Authors: Dr. WonMoo Kim, Dr. Kyungwon Park, Dr. Yun-Young Lee, Dr. Jinyoung Rhee, Dr. Uran Chung
- Date: 2022.12.28
Executive Summary
This study aims to develop objective AI base technologies tailored to climate prediction. It does so by identifying the data characteristics of subseasonal-to-seasonal (S2S) climate predictions, expanding the available data, building a data service system, and optimizing deep learning model architectures through various pre-processing methods and semi-supervised learning.
To alleviate the shortage of training data, we added satellite data and evaluated the prediction performance of two deep learning techniques, ConvLSTM and U-Net, on a weekly forecast basis. ConvLSTM did not yield optimized results. The U-Net model, trained on the expanded data, learned well even from a small number of training samples and showed significant accuracy improvements in terms of the pattern correlation coefficient (PCC) and root-mean-square error (RMSE). We also developed a data service system with a GUI environment for building user-oriented training sets for deep learning models. This system will make it easier to produce input data that artificial intelligence models can use directly.
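The two accuracy measures named above can be sketched as follows. This is a minimal illustration rather than the report's evaluation code: it assumes each field has been flattened to a list of grid-point values, applies no latitude (area) weighting, and uses hypothetical function names.

```python
import math


def pattern_correlation(forecast, observed):
    """Centered Pearson correlation between two flattened spatial fields."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("fields must be non-empty and of equal length")
    n = len(forecast)
    f_mean = sum(forecast) / n
    o_mean = sum(observed) / n
    cov = sum((f - f_mean) * (o - o_mean) for f, o in zip(forecast, observed))
    f_var = sum((f - f_mean) ** 2 for f in forecast)
    o_var = sum((o - o_mean) ** 2 for o in observed)
    if f_var == 0 or o_var == 0:
        raise ValueError("correlation is undefined for a constant field")
    return cov / math.sqrt(f_var * o_var)


def rmse(forecast, observed):
    """Root-mean-square error between two flattened spatial fields."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("fields must be non-empty and of equal length")
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(forecast))


# Hypothetical example: a forecast with the right pattern but a uniform bias.
observed_field = [1.0, 2.0, 3.0, 4.0]
forecast_field = [2.0, 3.0, 4.0, 5.0]
pcc_value = pattern_correlation(forecast_field, observed_field)
rmse_value = rmse(forecast_field, observed_field)
```

In this toy case the forecast pattern matches the observations perfectly while every grid point is biased by the same amount, so the PCC is high while the RMSE is not zero. This is why the two measures are usually reported together.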
For the development of deep learning-based post-processing technology, this study first sought to identify pre-processing techniques suited to improving the S2S prediction of daily maximum and minimum air temperature and daily total precipitation, especially for weeks 2 to 4. The raw S2S predictions of six individual climate models were converted into training data for the deep learning models, and multi-model ensemble (MME)-based S2S training data were additionally constructed from them. The pre-processing pipeline covers six cases: five scalers plus a case in which no scaler is applied. A feature selection technique was also added that ranks the features of the training data by the correlation between the transformed training data and the labels. The learning model simply applies TimeDistributed to the convolutional layers of U-Net and is referred to as the reference (Base) model. The spatial Pattern Correlation Coefficient (PCC) was evaluated by lead time for the Base model's predictions on the daily maximum and minimum air temperature and daily total precipitation training data of the six individual climate models and the MME, each pre-processed with the six techniques. These scores were compared with those of the uncorrected climate prediction data, in both cases against observed values. The comparison showed that suitable pre-processing improved prediction performance: daily total precipitation and daily maximum and minimum air temperature improved when processed with the Standard and Robust techniques.
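As a rough illustration of the two scalers singled out above, the sketch below assumes that "Standard" means centering on the mean and dividing by the standard deviation, and that "Robust" means centering on the median and dividing by the interquartile range. The report does not spell out these definitions, so both the formulas and the function names are assumptions.

```python
import statistics


def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("constant input cannot be standardized")
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("zero interquartile range cannot be used for scaling")
    return [(v - median) / iqr for v in values]


# Hypothetical daily precipitation totals (mm) containing one extreme event.
precip = [0.0, 1.0, 2.0, 3.0, 50.0]
standard = standard_scale(precip)
robust = robust_scale(precip)
```

With a skewed variable such as precipitation, the single extreme value inflates the mean and standard deviation used by the standard scaler, whereas the median and IQR used by the robust scaler are unaffected by it. That difference is one plausible reason to test both.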
In particular, the prediction performance of daily total precipitation improved across the entire lead time, while that of daily maximum and minimum air temperature did not improve in week 1 but did improve in weeks 3 and 4. This confirms that transforming the S2S training data with a suitable pre-processing technique improves the post-correction performed by the Base model. For daily total precipitation with feature selection applied, reducing the dimension of the training data and the number of input variables did not change prediction performance. For daily maximum and minimum air temperature, however, prediction performance could decrease when feature selection reduced the number of input variables below a reference level (for example, when seven variables were selected from the training data), and feature selection still does not appear to contribute to improving week 1 prediction performance. A different approach, such as a deep learning model different from the one used for precipitation prediction, is therefore thought to be needed to improve prediction performance for the S2S air temperature training data. In the next study, we will explore why the Base model's week 1 improvement is insignificant, optimize the hyper-parameters and structure of the Base model, and add various training models to find a deep learning model that can improve S2S predictive performance.
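The correlation-ranked feature selection discussed above might look roughly like the sketch below. It assumes features are ranked by the absolute value of their Pearson correlation with the labels and that the top k are kept. The report states only that features are selected by rank of correlation, so the ranking rule, the function names, and the variable names in the example are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    x_var = sum((a - x_mean) ** 2 for a in x)
    y_var = sum((b - y_mean) ** 2 for b in y)
    if x_var == 0 or y_var == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / math.sqrt(x_var * y_var)


def select_top_k(features, labels, k):
    """Return the names of the k features most correlated with the labels.

    `features` maps a feature name to its values across samples.
    Ranking uses |r| so strongly negative correlations also rank high.
    """
    scores = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical predictors and labels for five samples.
labels = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "var_a": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated
    "var_b": [5.0, 4.0, 3.0, 2.0, 1.5],    # strongly anti-correlated
    "var_c": [1.0, -1.0, 1.0, -1.0, 1.0],  # uncorrelated
}
selected = select_top_k(features, labels, k=2)
```

Lowering `k` shrinks the input dimension of the training data, which corresponds to the trade-off described above: harmless for precipitation, but potentially harmful for temperature once too few variables remain.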
Graph- and image-based Artificial Intelligence models were developed, and their supervised learning results were examined, to set up the research environment for applying semi-supervised learning. To alleviate the shortage of training data, two data augmentation methods were tested: constructing semi-monthly data from daily data, and applying cutmix to monthly data. Using only the target month (Target Month Only) was compared with a month-agnostic approach (Month-Agnostic) that uses all months. Both node classification and graph classification models were developed as graph-based AI models, and their prediction performance for summertime monthly mean temperature was examined. The effect of data augmentation was not obvious because neither model trained well. For the node classification model, the test-set results for July (LT2) with cutmix data augmentation and the Month-Agnostic approach outperformed the other cases. For the graph classification model, the Month-Agnostic approach performed better for June (LT1) (HSS>0.35 for all folds), and data augmentation improved performance for July (LT2). A 3-dimensional convolutional neural network model was developed as an image-based AI model. The model with cutmix data augmentation and the Month-Agnostic approach trained better and outperformed the other cases (Accuracy>0.6 for all months and folds). Its prediction performance on the test set improved for June (LT1) and July (LT2). Spatially significant regions were derived from the Class Activation Maps of the model's last convolution layer. Some cases showed newly activated areas whose climatological significance had been analyzed previously. This first-year study thus examined supervised learning results and investigated the effect of the data augmentation methods.
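For readers unfamiliar with the HSS values quoted above, the sketch below computes a multi-category skill score, assuming HSS denotes the Heidke Skill Score and that the classification targets are categories such as below-normal, near-normal, and above-normal. The category labels and function name are hypothetical and are not taken from the report.

```python
def heidke_skill_score(predicted, observed):
    """Multi-category Heidke Skill Score.

    HSS = (PC - E) / (1 - E), where PC is the proportion of correct
    forecasts and E is the proportion expected correct by chance given
    the marginal frequencies of the predicted and observed categories.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predicted)
    categories = set(predicted) | set(observed)
    proportion_correct = sum(p == o for p, o in zip(predicted, observed)) / n
    expected = sum(
        (predicted.count(c) / n) * (observed.count(c) / n) for c in categories
    )
    if expected == 1:
        raise ValueError("HSS is undefined when only one category occurs")
    return (proportion_correct - expected) / (1 - expected)


# Hypothetical tercile categories: B = below, N = near, A = above normal.
observed_classes = ["B", "N", "A", "B", "N", "A"]
predicted_classes = ["B", "N", "A", "B", "A", "N"]
hss = heidke_skill_score(predicted_classes, observed_classes)
```

A score of 1 indicates a perfect forecast and 0 indicates no skill beyond chance, so a threshold such as HSS>0.35 reflects skill above what the category frequencies alone would give.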