Oslo Auroral THEMIS (OATH) Training Dataset

Background

Clausen & Nickisch[1] showed that relatively standard, off-the-shelf machine learning tools can be used to effectively and automatically classify auroral images. On this website you will find information about the tools and the training dataset used, and the code with which you can replicate the results.

In Clausen & Nickisch[1] the following auroral classification was introduced:

Label | Explanation | class6 | class2
arc | This label is used for images that show one or multiple bands of aurora that stretch across the field-of-view; typically, the arcs have well-defined, sharp edges. | 0 | 0
diffuse | Images that show large patches of aurora, typically with fuzzy edges, are placed in this category. The auroral brightness is on the order of that of stars. | 1 | 0
discrete | The images show auroral forms with well-defined, sharp edges, that are, however, not arc-like. The auroral brightness is high compared to that of stars. | 2 | 0
cloudy | The sky in these images is dominated by clouds or the dome of the imager is covered with snow. | 3 | 1
moon | The image is dominated by light from the Moon. | 4 | 1
clear/noaurora | This label is attached to images which show a clear sky (stars and planets are clearly visible) without the appearance of aurora. | 5 | 1
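
As a small illustration of how the two numberings relate, the following sketch encodes the table in Python. The names CLASS6_LABELS and class6_to_class2 are hypothetical and are not taken from the OATH code; reading class2 as "aurora" (0) versus "no aurora" (1) is an interpretation of the table.

```python
# Six-class numbering as listed in the table above.
CLASS6_LABELS = {
    0: "arc",
    1: "diffuse",
    2: "discrete",
    3: "cloudy",
    4: "moon",
    5: "clear/noaurora",
}


def class6_to_class2(class6: int) -> int:
    """Collapse a six-class number into the two-class number.

    Per the table, classes 0-2 (arc, diffuse, discrete) map to 0
    and classes 3-5 (cloudy, moon, clear/noaurora) map to 1.
    """
    if class6 not in CLASS6_LABELS:
        raise ValueError(f"unknown class6 value: {class6}")
    return 0 if class6 <= 2 else 1


if __name__ == "__main__":
    for number, label in CLASS6_LABELS.items():
        print(number, label, class6_to_class2(number))
```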

Download

You can download the training dataset together with the Python3 code that trains the ridge classifier here (about 500MB). It is a tar archive (SHA256SUM here) that, once unpacked, creates the following directory structure and files:
oath/
  |
  +- 00_README
  |
  +- classification/
  |   |
  |   +- classification.csv
  |   |
  |   +- train_test_split.csv
  |
  +- code/
  |   |
  |   +- ridge.py
  |   |
  |   +- rotate.sh
  |
  +- features/
  |   |
  |   +- auroral_feat.h5
  |
  +- images/
      |
      +- cropped_scaled/
      |   |
      |   +- 00001.png
      |   |
      |   +- 00002.png
      |   |
      |   +- ...
      |   |
      |   +- 05824.png
      |
      +- files_origin.csv
00_README: A text file containing this installation information and license information.
classification.csv: Each line of this file contains information about one image file: numeric class (2 classes), numeric class (6 classes), image index number, label, rotation angle.
train_test_split.csv: This file contains 5 lines of 5824 elements each. These elements are the randomized index numbers of the images. As can be seen from ridge.py, the contents of the file can be used to split the annotated dataset into a training and a test dataset. Including the indices for each dataset makes it easier to compare the performance of different machines in the future.
ridge.py: Python code that trains a ridge classifier using the feature vectors extracted from all images of the training dataset.
rotate.sh: A bash script that rotates the original images from the oath/images/cropped_scaled folder and places them in a new folder called oath/images/cropped_scaled_rotated.
auroral_feat.h5: HDF5 file containing the feature vectors for the training dataset.
files_origin.csv: Each line of this file contains the original source of one image in the training dataset: the THEMIS ASI station abbreviation, the date and time the image was taken, and its file path in the oath directory, which contains the image index number (00001, 00002, etc.).
00001.png: THEMIS ASI image, cropped and scaled.
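
The two CSV files can be read with the Python standard library alone. The sketch below is not taken from ridge.py; it assumes comma-separated values, the column order listed for classification.csv above, and that any header or malformed line can simply be skipped. The names ImageRecord, parse_classification and split_indices, as well as the test_fraction parameter, are hypothetical.

```python
import csv
from typing import Iterable, List, NamedTuple, Tuple


class ImageRecord(NamedTuple):
    class2: int      # numeric class (2 classes)
    class6: int      # numeric class (6 classes)
    index: int       # image index number
    label: str       # text label, e.g. "arc"
    rotation: float  # rotation angle


def parse_classification(lines: Iterable[str]) -> List[ImageRecord]:
    """Parse lines of classification.csv (assumed comma-separated)."""
    records = []
    for row in csv.reader(lines):
        try:
            record = ImageRecord(
                int(row[0]), int(row[1]), int(row[2]),
                row[3].strip(), float(row[4]),
            )
        except (ValueError, IndexError):
            continue  # skip a header line or an empty/malformed line
        records.append(record)
    return records


def split_indices(shuffled: List[int],
                  test_fraction: float) -> Tuple[List[int], List[int]]:
    """Split one row of randomized image indices into (train, test)."""
    n_test = round(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_test
    return shuffled[:n_train], shuffled[n_train:]
```

With the files unpacked, parse_classification(open(path)) would return one record per image, and each of the five lines of train_test_split.csv could be passed to split_indices after converting its elements to integers. Using a fixed, published index order in this way keeps the split reproducible between experiments.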

Installation

Here we describe the installation of the necessary components to replicate the training of a ridge classifier based on auroral feature detection as described in Clausen & Nickisch [1]. This installation was tested on a fresh install of Ubuntu 17.10 (Artful Aardvark), Kernel 4.13.0-37 generic, x86_64, running on a laptop with a four-core Intel Core i7-3520-M CPU (2.9 GHz).

In the following examples the OATH tarball was extracted in the user's home directory ~/.

  1. Make sure Python3, git, wget, and imagemagick are installed
    sudo apt install python3 git wget imagemagick
  2. Install several Python3 packages for TensorFlow™
    sudo apt install python3-pip python3-dev python3-h5py python3-contextlib2
  3. Install several Python3 packages for machine learning
    sudo apt install python3-matplotlib python3-pandas python3-sklearn
  4. Install TensorFlow
    mkdir ~/tensorflow/
    cd ~/tensorflow
    pip3 install tensorflow
    git clone https://github.com/tensorflow/models/
    cd models/research/slim
    sudo python3 setup.py install
  5. Download pre-trained Inception model checkpoint
    cd ~/tensorflow
    mkdir checkpoints
    cd checkpoints
    wget http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz
    tar xf inception_v4_2016_09_09.tar.gz
  6. Install TF_FeatureExtraction
    cd ~/tensorflow
    git clone https://github.com/tomrunia/TF_FeatureExtraction
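
Before moving on, it can be useful to confirm that the Python packages installed above are importable. The following is a minimal sketch using only the standard library; the list of module names is an assumption derived from the apt and pip packages named in the steps above, and missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = ["tensorflow", "h5py", "matplotlib", "pandas", "sklearn"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All required Python modules were found.")
```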

Running the feature detection

This part assumes that the cropped and scaled images are in the folder ~/oath/images/cropped_scaled (see Download above). First, the images are rotated and placed in the folder ~/oath/images/cropped_scaled_rotated. Then, TF_FeatureExtraction extracts the feature vectors and writes them into an HDF5 file called auroral_feat.h5 in the directory ~/oath/features/. On the laptop mentioned above (four-core Intel Core i7-3520-M) this takes about one hour.
  1. Rotate the images
    cd ~/oath/code
    chmod a+x rotate.sh
    ./rotate.sh
  2. Run feature extraction
    cd ~/tensorflow/TF_FeatureExtraction
    # this is one long command, continued across lines with backslashes
    python3 example_feat_extract.py --network inception_v4 --checkpoint ../checkpoints/inception_v4.ckpt \
    	--image_path ~/oath/images/cropped_scaled_rotated/ --out_file ~/oath/features/auroral_feat.h5 \
    	--layer_names Logits
  3. Train the ridge classifier
    cd ~/oath/code
    python3 ridge.py
  4. This should produce the following output:
    0.8174012593016601 0.010271527444147862
    [[139  27  66   0   0  14]
     [ 37 222  58   1   0  16]
     [ 36  37 335   3   0   3]
     [  0   1   5 224   3   2] 
     [  0   1   3   2 183   1]
     [  7  13   7   1   1 299]]
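
The printed matrix can be inspected further with a few lines of Python. The sketch below is not part of ridge.py; it assumes that the rows and columns follow the six-class numbering 0 to 5 from the classification table, and the function names are hypothetical. It computes the fraction of images on the diagonal and collapses the matrix into the two-class scheme (classes 0-2 versus classes 3-5).

```python
# Confusion matrix copied from the output above.
CONFUSION = [
    [139, 27, 66, 0, 0, 14],
    [37, 222, 58, 1, 0, 16],
    [36, 37, 335, 3, 0, 3],
    [0, 1, 5, 224, 3, 2],
    [0, 1, 3, 2, 183, 1],
    [7, 13, 7, 1, 1, 299],
]


def accuracy(matrix):
    """Fraction of all counts that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def collapse_to_two_classes(matrix, n_first_group=3):
    """Merge classes 0..n_first_group-1 into group 0 and the rest into group 1."""
    binary = [[0, 0], [0, 0]]
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            binary[int(i >= n_first_group)][int(j >= n_first_group)] += count
    return binary


if __name__ == "__main__":
    print(accuracy(CONFUSION))
    two_class = collapse_to_two_classes(CONFUSION)
    print(two_class)
    print(accuracy(two_class))
```

For this particular matrix, 1402 of 1747 images lie on the diagonal (about 0.80), and after collapsing to two classes the diagonal holds 1673 of 1747 images (about 0.96). Most of the six-class confusion therefore occurs among the three auroral classes.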

Comments & questions

Comments and questions can be directed to Lasse Clausen.

References

If you use the Oslo Auroral THEMIS dataset, please refer to:

Clausen, L. B. N., & Nickisch, H. (2018). Automatic classification of auroral images from the Oslo Auroral THEMIS (OATH) data set using machine learning. Journal of Geophysical Research: Space Physics, 123, https://doi.org/10.1029/2018JA025274

Acknowledgements & copyright

Unless stated otherwise, all data in the OATH Dataset is licensed under a Creative Commons 4.0 Attribution License (CC BY 4.0) and the accompanying source code is licensed under a BSD-2-Clause License.

In particular, all actual image data included in the tarball are modified from the THEMIS all-sky imagers. We thank H. Frey for giving us permission to include these data. Copyright for these data remains with NASA.

We acknowledge NASA contract NAS5-02099 and V. Angelopoulos for use of data from the THEMIS Mission. Specifically: S. Mende and E. Donovan for use of the ASI data, the CSA for logistical support in fielding and data retrieval from the GBO stations, and NSF for support of GIMNAST through grant AGS-1004736.