Transfer Learning Aurora Image Classification and Magnetic Disturbance Evaluation (TAME)
In our publication we show that transfer learning can readily be applied to all-sky images for the purposes of classifying images, filtering images, and predicting magnetic disturbance from auroral images. The publication describes our methods and the results we obtained.
Our data is archived on the NIRD Research Data Archive. It is licensed under CC-BY 4.0 and available for at
least 10 years after publication. The archive with accompanying information can be accessed here:
https://doi.org/10.11582/2021.00071
On this website we will
- provide and describe the code we used in our publication so that our results can be replicated.
- provide instructions on how to apply the classifier in six lines of code.
- provide a way to combine different kinds of data used in space physics, enabling large-scale data analysis with our code.
1. Replication of Results & Quickstart
For full documentation of this package, see the sections below.
- Make sure all dependencies are installed:
sudo apt-get install python3 wget p7zip
- Download our code:
cd <Your working directory here>
wget http://tid.uio.no/TAME/data/code.7z
7z x code.7z
cd code
- Create a conda environment from the environment file we provide (this may take a few moments), then activate it. You can find installation instructions for conda here: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html
conda env create -f environment.yml
conda activate dataHandler
- Create a storage folder, download our fully preprocessed data archives, and extract them into the newly created folder:
mkdir /home/data
mkdir /home/data/images
wget https://ns9999k.webs.sigma2.no/10.11582_2021.00071/torch_data.7z -P /home/data
wget https://ns9999k.webs.sigma2.no/10.11582_2021.00071/other_data.7z -P /home/data
7z x /home/data/torch_data.7z -o/home/data
7z x /home/data/other_data.7z -o/home/data
- Run Python, import our package and plot the results:
python
from dataHandler import DataAnalyzer
other_data_location = "/home/data/other/"
torch_data_location = "/home/data/torch/"
image_location = "/home/data/images/"
da = DataAnalyzer(analyzer_data_path=other_data_location, torch_path=torch_data_location, image_path=image_location)
da.plot_daily_cloud_coverage()
da.plot_conf_images()
da.plot_cloud_predictions_roc()
da.demonstrate_cluster()
da.plot_magn_predictions()
da.print_segm_ims("removed")
da.print_segm_ims("segmented")
da.plot_umap_preds()
da.get_cloud_examples()
da.print_linear_hyperplots()
da.print_rbf_hyper_map()
This will create the figures we show in our publication and save them in /home/data/images.
2. Usage of the ASIM Classifier
Follow the first three steps of the quickstart above to set up the necessary environment. Next, download and extract the classifier:
mkdir /home/data
mkdir /home/data/torch
wget https://ns9999k.webs.sigma2.no/10.11582_2021.00071/oath_clf.7z
7z x oath_clf.7z -o/home/data/torch/
In a Python shell, you can access the classifier as follows:
import glob
from dataHandler import AsimClassifier
torch_data_location = "/home/data/torch/"
clf = AsimClassifier(torch_path=torch_data_location)
image_file_list = glob.glob("path/to/asim/files/*.png")
results = clf.classify_images(image_file_list)
The default batch size is 64; to increase it, pass the parameter "batch_size" to clf.classify_images(). Because the neural network we use employs batch normalization, we recommend not going below the default batch size.
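The batching behaviour described above can be pictured with a short sketch (this is an illustration, not code from the package):

```python
# Sketch: how a list of image files is consumed in batches of `batch_size`.
# Batch-normalization statistics are computed per batch, which is why very
# small batches are discouraged.
def batches(items, batch_size=64):
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

files = [f"img_{i:03d}.png" for i in range(130)]
print([len(b) for b in batches(files)])  # [64, 64, 2]
```

Note that the final batch can be smaller than `batch_size`; only the default size itself should not be reduced.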
3. Documentation
Retrieval of data
We use all-sky imager data that is freely available through UiO's resources (see the usage information linked in the "Acknowledgements and Copyright" section below). However, the total file size of all images taken in the timeframe Nov 2010 - Feb 2011 is about 200 GB, which is why we kindly ask you to use the preprocessed files that we provide below. If you are interested in the original image files, please let us know, and we will arrange for a transfer. We provide the following archives of preprocessed files:
- asim_data.7z: This archive contains the processed all-sky imager data files. The files' default location is data/asim.
- magnetometer_data: This data is available through the International Monitor for Auroral Geomagnetic Effects (IMAGE). It can be downloaded from their website and preprocessed using our preprocessing library. The magnetometer data's default location is data/magn.
- ceilometer_data: This data is provided by the Alfred Wegener Institute through PANGAEA. Downloaded files can be processed with our preprocessor. The ceilometer data's default location is data/ceil.
- oath_data: This data is provided by UiO. After download and extraction, these files can be used to extract the OATH images' features and train the classifier using the preprocessing library. The OATH data's default location is data/oath.
- oath_clf.7z: This archive contains only the classifier obtained from the OATH images. The classifier's default location is data/torch/oath.clf.
- torch_data.7z: This archive contains other meta-data obtained to test the different neural networks for classification. The files' default location is data/torch.
- other_data.7z: This archive contains the highest level of processed data. The files contained herein are at the last processed stage of our code and are what is used to create the results and figures we present in our publication. This folder's default location is data/other.
Due to the individual size of the archives, we provide them separately.
mkdir ~/data
mkdir ~/data/torch
wget https://ns9999k.webs.sigma2.no/10.11582_2021.00071/asim_data.7z -P ~/data
wget https://ns9999k.webs.sigma2.no/10.11582_2021.00071/oath_clf.7z -P ~/data/torch
wget https://ns9999k.webs.sigma2.no/10.11582_2021.00071/other_data.7z -P ~/data
7z x ~/data/asim_data.7z -o"$HOME/data"
7z x ~/data/oath_clf.7z -o"$HOME/data/torch"
7z x ~/data/other_data.7z -o"$HOME/data"
The preprocessed files always share the same structure: data is bundled into one HDF file per month, named YYYYMM.hdf. Each HDF file contains a pandas dataframe in which the first columns give the time and place the data was taken, followed by columns containing the data, one row per entry. The format of the first columns is the same for every data type we provide, to make cross-referencing between them easier.
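The monthly bundling scheme implies file names like 201012.hdf. The sketch below illustrates the naming and the row layout; the column names shown are assumptions for illustration only, not taken from the archives:

```python
from datetime import datetime

def monthly_hdf_name(ts):
    # Data is bundled into one HDF file per month, named YYYYMM.hdf
    return f"{ts.year:04d}{ts.month:02d}.hdf"

# Hypothetical row: shared leading time/place columns, followed by the
# data columns, one row per entry (actual column names may differ).
row = {"time": datetime(2010, 12, 1, 18, 30), "location": "NYA", "CBH": 450.0}
print(monthly_hdf_name(row["time"]))  # 201012.hdf
```

The shared leading columns are what make cross-referencing between data types straightforward.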
Usage
Please see the first three steps under "Replication of Results & Quickstart" for how to set up the code.
Our package is structured into several classes derived from the main class DataHandler.
Preprocessor
The PreProcessor processes the downloaded data and converts it into the HDF files we provide in the archives above.
The following example preprocesses the all-sky imager data with a custom path for the folder in which to look for the data and in which to store the preprocessed output. Similar functions and parameters exist for all other data types.
from dataHandler import PreProcessor
asim_path = "own_folder/asim_data/"
pp = PreProcessor(asim_path=asim_path)
pp.proc_asim(batch_size=64)
Unlike most of the other processing functions, which merely transcribe the data into easier-to-read pandas dataframes, the all-sky imager data is run through a convolutional neural network that extracts each image's features, which are then saved per image. Instead of having to handle about 200 GB of image data for the 2010/11 season, this reduces the total size to about 6.2 GB.
Furthermore, the PreProcessor can evaluate_network_performances and evaluate_network_accuracies to benchmark the performance of different pretrained neural network architectures against the OATH images, find the best svm_hyper_parameters by performing a grid search, and finally fit_oath_features to create a classifier based on the OATH images that is able to predict_image_proba of the six classes for any given all-sky image, provided the features have been extracted by the same neural network.
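As a sketch of what the six-class probabilities can be turned into, consider the following (the labels come from our publication, but their order here is an assumption for illustration; this is not the package's code):

```python
# Illustration only: predict_image_proba yields per-image probabilities for
# the six classes; the label order below is assumed for this sketch.
LABELS = ["arc", "diffuse", "discrete", "cloud", "moon", "clear sky / no aurora"]

def predicted_label(probs):
    # Pick the class with the highest probability
    return LABELS[max(range(len(probs)), key=probs.__getitem__)]

print(predicted_label([0.05, 0.10, 0.60, 0.15, 0.05, 0.05]))  # discrete
```

Aggregating the three auroral and three non-auroral classes in this way is also how the binary aurora/no-aurora accuracy in the publication is obtained.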
In order to extract the features, we provide functions to set_model_and_device the same way we did, as well as a dataset class that is compatible with pytorch's DataLoader.
It can be imported and used as
from dataHandler.datasets import AsimDataSet
from torch.utils.data import DataLoader
file_list = []
index_list = []
data = AsimDataSet(file_list, index_list)
dl = DataLoader(data, shuffle=False, batch_size=64)
for i_batch, sample in enumerate(dl, 0):
pass
Here, file_list is a list of the image files and index_list is a list of unique numerical indices used to address these files.
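How a (file_list, index_list) pair backs a map-style dataset can be illustrated without torch (the real AsimDataSet additionally loads and transforms the images; this class is a hypothetical stand-in):

```python
# Torch-free sketch of a map-style dataset over a file list.
class FileIndexDataset:
    def __init__(self, file_list, index_list):
        # Every file must have exactly one unique numerical index
        assert len(file_list) == len(index_list)
        self.file_list = file_list
        self.index_list = index_list

    def __len__(self):
        return len(self.file_list)

    def __getitem__(self, i):
        # Each item pairs the unique index with its file path
        return self.index_list[i], self.file_list[i]

data = FileIndexDataset(["a.png", "b.png"], [10, 11])
print(len(data), data[1])  # 2 (11, 'b.png')
```

Because the class implements `__len__` and `__getitem__`, an object like this is exactly what pytorch's DataLoader expects for batched, index-based access.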
Provider
The Provider provides data for a given timeframe and location.
This example retrieves ceilometer and all-sky imager data taken in Ny Ålesund between the 1st and 4th of December 2010.
from dataHandler import Provider
from datetime import datetime
pr = Provider()
date_start = datetime(2010,12,1)
date_end = datetime(2010,12,4)
location = "NYA"
ceil_data = pr.get_ceil(date_start=date_start, date_end=date_end, location=location)
asim_data = pr.get_asim(date_start=date_start, date_end=date_end, location=location)
Because we want to compare different types of data, we provide a utility to combine two data sets. Data from the second set is merged into the first such that the time difference between combined points is as small as possible. If, for a point in the first set, no point in the second set lies within a timeframe of 86400 s, the point is discarded. The column of the second dataframe that is to be merged into the first has to be provided as an argument of the merging function. Because this operation compares up to tens of thousands of rows against as many rows, it can take a while. Since we only expect data in the form the tool was designed for, the function splits the input into manageable daily chunks. As a result, points near midnight may occasionally be matched within the current day even though the nearest point lies on the previous or next day; given the amount of data, we judged this acceptable in view of the speed-up gained.
asim_and_ceil = pr.combine_data_sets(asim_data, ceil_data, "CBH")
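The nearest-in-time merge described above can be sketched as follows (an illustration, not the package's implementation, and without the daily chunking):

```python
from datetime import datetime, timedelta

# For each row of the first set, attach the requested column from the
# nearest-in-time row of the second set; discard rows with no match
# within `tolerance` (86400 s, as in the merge described above).
def merge_nearest(first, second, column, tolerance=timedelta(seconds=86400)):
    merged = []
    for row in first:
        best = min(second, key=lambda s: abs(s["time"] - row["time"]))
        if abs(best["time"] - row["time"]) <= tolerance:
            merged.append({**row, column: best[column]})
    return merged

asim = [{"time": datetime(2010, 12, 1, 18, 0)}]
ceil = [{"time": datetime(2010, 12, 1, 18, 3), "CBH": 450.0}]
print(merge_nearest(asim, ceil, "CBH"))
```

The daily chunking used by the package turns this quadratic scan into many small ones, which is the source of both the speed-up and the rare midnight mismatches.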
Analyzer
The DataAnalyzer is the class that performs the operations on the processed data: it combines, analyzes, and presents the data. In the same way as described above, it can be used to create all the figures we show in the publication:
from dataHandler import DataAnalyzer
other_data_location = "/home/data/other_data"
oath_data_location = "/home/data/oath_data"
image_location = "/home/data/images"
da = DataAnalyzer(analyzer_data_path=other_data_location, oath_path=oath_data_location, image_path=image_location)
da.plot_daily_cloud_coverage()
da.plot_conf_images()
da.plot_cloud_predictions_roc()
da.demonstrate_cluster()
da.plot_magn_predictions()
Questions
If you have any questions or remarks, please send me an e-mail.
References
The data is archived here:
https://doi.org/10.11582/2021.00071
If you have not already done so, please read our publication based on this data:
https://doi.org/10.1029/2021JA029683
If you use our classifier, this library in general or our publication, you can cite us the following way:
@article{https://doi.org/10.1029/2021JA029683,
author = {Sado, P. and Clausen, L. B. N. and Miloch, W. J. and Nickisch, H.},
title = {Transfer Learning Aurora Image Classification and Magnetic Disturbance Evaluation},
journal = {Journal of Geophysical Research: Space Physics},
volume = {127},
number = {1},
pages = {e2021JA029683},
keywords = {aurora, all sky imager, machine learning, auroral imaging},
doi = {https://doi.org/10.1029/2021JA029683},
url = {https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2021JA029683},
eprint = {https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2021JA029683},
note = {e2021JA029683 2021JA029683},
abstract = {Abstract We develop an open source algorithm to apply Transfer learning to Aurora image classification and Magnetic disturbance Evaluation (TAME). For this purpose, we evaluate the performance of 80 pretrained neural networks using the Oslo Auroral THEMIS (OATH) data set of all-sky images, both in terms of runtime and their features' predictive capability. From the features extracted by the best network, we retrain the last neural network layer using the Support Vector Machine (SVM) algorithm to distinguish between the labels “arc,” “diffuse,” “discrete,” “cloud,” “moon” and “clear sky/ no aurora”. This transfer learning approach yields 73\% accuracy in the six classes; if we aggregate the 3 auroral and 3 non-aurora classes, we achieve up to 91\% accuracy. We apply our classifier to a new dataset of 550,000 images and evaluate the classifier based on these previously unseen images. To show the potential usefulness of our feature extractor and classifier, we investigate two test cases: First, we compare our predictions for the “cloudy” images to meteorological data and second we train a linear ridge model to predict perturbations in Earth's locally measured magnetic field. We demonstrate that the classifier can be used as a filter to remove cloudy images from datasets and that the extracted features allow to predict magnetometer measurements. All procedures and algorithms used in this study are publicly available, and the code and classifier are provided, which opens possibility for large scale studies of all-sky images.},
year = {2022}
}
Acknowledgements and Copyright
The source code in this library is licensed under the BSD-2-Clause License. Unless stated otherwise, all data contained in the datasets we provide alongside this publication under the links above is licensed under a Creative Commons Attribution-NonCommercial 4.0 License (CC BY-NC 4.0). The copyright for the all-sky imager data from which some data files are derived remains with the original copyright holder, the University of Oslo. Information on how to use the original image files can be obtained here: http://tid.uio.no/plasma/aurora/usage.html