OATH Training Dataset

Oslo Auroral THEMIS (OATH) Training Dataset

Background

Clausen & Nickisch [1] showed that relatively standard, off-the-shelf machine learning tools can classify auroral images automatically and effectively. On this website you will find information about the tools and the training dataset used, as well as the code with which you can replicate the results.

Clausen & Nickisch [1] introduced the following auroral classification:

Label	Explanation	class6	class2
arc	This label is used for images that show one or multiple bands of aurora that stretch across the field-of-view; typically, the arcs have well-defined, sharp edges.	0	0
diffuse	Images that show large patches of aurora, typically with fuzzy edges, are placed in this category. The auroral brightness is on the order of that of stars.	1	0
discrete	The images show auroral forms with well-defined, sharp edges that are, however, not arc-like. The auroral brightness is high compared to that of stars.	2	0
cloudy	The sky in these images is dominated by clouds or the dome of the imager is covered with snow.	3	1
moon	The image is dominated by light from the Moon.	4	1
clear/noaurora	This label is attached to images which show a clear sky (stars and planets are clearly visible) without the appearance of aurora.	5	1
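
The relation between the two numeric columns can be written down in a few lines of Python. This is a minimal sketch, assuming that class2 separates images with aurora (0, as shown for arc) from images without aurora (1, as shown for cloudy); the names `CLASS6_LABELS` and `class6_to_class2` are illustrative and do not come from the OATH code.

```python
# Six-class labels in the order given by the class6 column of the table.
CLASS6_LABELS = ["arc", "diffuse", "discrete", "cloudy", "moon", "clear/noaurora"]


def class6_to_class2(class6: int) -> int:
    """Collapse a six-class label into the two-class label.

    Assumption: classes 0-2 (arc, diffuse, discrete) show aurora and map
    to 0; classes 3-5 (cloudy, moon, clear/noaurora) map to 1.
    """
    if not 0 <= class6 < len(CLASS6_LABELS):
        raise ValueError(f"unknown class6 value: {class6}")
    return 0 if class6 <= 2 else 1


for class6, label in enumerate(CLASS6_LABELS):
    print(f"{label:15s} class6={class6} class2={class6_to_class2(class6)}")
```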

Download

You can download the training dataset together with the Python3 code that trains the ridge classifier here (about 500MB). It is a tar archive (SHA256SUM here) that, once unpacked, creates the following directory structure and files:

oath/
  |
  +- 00_README
  |
  +- classification/
  |   |
  |   +- classification.csv
  |   |
  |   +- train_test_split.csv
  |
  +- code/
  |   |
  |   +- ridge.py
  |   |
  |   +- rotate.sh
  |
  +- features/
  |   |
  |   +- auroral_feat.h5
  |
  +- images/
      |
      +- cropped_scaled/
      |   |
      |   +- 00001.png
      |   |
      |   +- 00002.png
      |   |
      |   +- ...
      |   |
      |   +- 05824.png
      |
      +- files_origin.csv

`00_README`	A text file containing this installation information and license information
`classification.csv`	Each line of this file describes one image file: numeric class (2 classes), numeric class (6 classes), image index number, label, rotation angle
`train_test_split.csv`	This file contains 5 lines of 5824 elements each. The elements are the randomized index numbers of the images. As can be seen from `ridge.py`, the contents of the file can be used to split the annotated dataset into a training and a test dataset. Including the indices for each dataset makes it easier to compare the performance of different machines in the future.
`ridge.py`	Python code that trains a ridge classifier using the feature vectors extracted from all images of the training dataset
`rotate.sh`	A bash script that rotates the original images from the `oath/images/cropped_scaled` folder and places them in a new folder called `oath/images/cropped_scaled_rotated`.
`auroral_feat.h5`	HDF5 file containing the feature vectors for the training dataset
`files_origin.csv`	Each line of this file contains the original source of one image in the training dataset: the THEMIS ASI station abbreviation, the date and time the image was taken, and its file path in the `oath` directory, which contains the image index number (00001, 00002, etc.)
`00001.png`	THEMIS ASI image, cropped and scaled
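
To illustrate how the two CSV files in `classification/` might be read, here is a minimal Python sketch. It rests on several assumptions that should be checked against the actual files and `ridge.py`: both files are comma-separated, `classification.csv` has no header row and uses the column order listed above, and the training fraction of 0.7 is only a placeholder. The function names and the sample rows are made up for illustration.

```python
import csv
import io


def read_classification(lines):
    """Parse rows of classification.csv into dictionaries.

    Assumes comma-separated values with no header, in the column order
    listed above: class2, class6, image index, label, rotation angle.
    """
    records = []
    for row in csv.reader(lines):
        if not row:
            continue
        class2, class6, index, label, angle = (field.strip() for field in row)
        records.append({
            "class2": int(class2),
            "class6": int(class6),
            "index": int(index),
            "label": label,
            "rotation": float(angle),
        })
    return records


def split_indices(line, train_fraction=0.7):
    """Split one line of train_test_split.csv into train and test indices.

    train_fraction is an assumption; ridge.py defines the value actually used.
    """
    indices = [int(field) for field in line.split(",") if field.strip()]
    n_train = int(round(train_fraction * len(indices)))
    return indices[:n_train], indices[n_train:]


# Made-up sample rows, only to show the expected shape of the data.
sample = io.StringIO("0,0,1,arc,90\n1,3,2,cloudy,0\n")
records = read_classification(sample)
train, test = split_indices("3,1,4,2,5,9,8,6,7,10")
print(records[0]["label"], len(train), len(test))
```

Because each of the 5 lines holds a different random ordering, applying `split_indices` to each line in turn would give 5 different train/test splits of the same annotated images.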

Installation

Here we describe the installation of the necessary components to replicate the training of a ridge classifier based on auroral feature detection as described in Clausen & Nickisch [1]. This installation was tested on a fresh install of Ubuntu 17.10 (Artful Aardvark), Kernel 4.13.0-37 generic, x86_64, running on a laptop with a four-core Intel Core i7-3520-M CPU (2.9 GHz).

In the following examples the OATH tarball was extracted in the user's home directory ~/.
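
Before unpacking, the download can be checked against the published SHA256SUM. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the file name and checksum in the usage note are placeholders, since neither is stated here.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Compare a file's digest with the published checksum (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A call such as `verify_download("oath.tar", published_checksum)` returns `True` only if the archive is intact, where `oath.tar` is a hypothetical file name and `published_checksum` is the hex string from the SHA256SUM file.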

Make sure Python3, git, wget, and imagemagick are installed
```
sudo apt install python3 git wget imagemagick
```

Install several Python3 packages for TensorFlow™

```
sudo apt install python3-pip python3-dev python3-h5py python3-contextlib2
```

Install several Python3 packages for machine learning

```
sudo apt install python3-matplotlib python3-pandas python3-sklearn
```

Install TensorFlow™

```
mkdir ~/tensorflow/
cd ~/tensorflow
pip3 install tensorflow
git clone https://github.com/tensorflow/models/
cd models/research/slim
sudo python3 setup.py install
```

Download pre-trained Inception model checkpoint

```
cd ~/tensorflow
mkdir checkpoints
cd checkpoints
wget http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz
tar xf inception_v4_2016_09_09.tar.gz
```

Install TF_FeatureExtraction

```
cd ~/tensorflow
git clone https://github.com/tomrunia/TF_FeatureExtraction
```

Running the feature detection

This part assumes that the cropped and scaled images are in the folder ~/oath/images/cropped_scaled (see Download above). First, the images are rotated and placed in the folder ~/oath/images/cropped_scaled_rotated. Then, TF_FeatureExtraction extracts the feature vectors and writes them into an HDF5 file called auroral_feat.h5 in the directory ~/oath/features/. On the laptop mentioned above (four-core Intel Core i7-3520-M) this takes about one hour.

Rotate the images

```
cd ~/oath/code
chmod a+x rotate.sh
./rotate.sh
```

Run feature extraction

```
cd ~/tensorflow/TF_FeatureExtraction
# this is one long command, continued across lines with backslashes
python3 example_feat_extract.py --network inception_v4 --checkpoint ../checkpoints/inception_v4.ckpt \
	--image_path ~/oath/images/cropped_scaled_rotated/ --out_file ~/oath/features/auroral_feat.h5 \
	--layer_names Logits
```

Train the ridge classifier
```
cd ~/oath/code
python3 ridge.py
```

This should produce the following output:

```
0.8174012593016601 0.010271527444147862
[[139  27  66   0   0  14]
 [ 37 222  58   1   0  16]
 [ 36  37 335   3   0   3]
 [  0   1   5 224   3   2]
 [  0   1   3   2 183   1]
 [  7  13   7   1   1 299]]
```
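
The 6×6 matrix in this output can be examined further with a few lines of Python. The sketch below treats it as a confusion matrix over the six classes in class6 order (an assumption; `ridge.py` is the reference for how it is built) and computes the fraction of images on the diagonal, both for six classes and after merging classes 0-2 (aurora) and 3-5 (no aurora). The entries sum to 1747 images, about 30% of the 5824 annotated images. The function names are illustrative.

```python
# Matrix copied from the output above.
CONFUSION = [
    [139, 27, 66, 0, 0, 14],
    [37, 222, 58, 1, 0, 16],
    [36, 37, 335, 3, 0, 3],
    [0, 1, 5, 224, 3, 2],
    [0, 1, 3, 2, 183, 1],
    [7, 13, 7, 1, 1, 299],
]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def collapse_to_two_classes(matrix, n_aurora=3):
    """Merge classes 0-2 (aurora) and 3-5 (no aurora) into a 2x2 matrix."""
    def group(i):
        return 0 if i < n_aurora else 1

    merged = [[0, 0], [0, 0]]
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            merged[group(i)][group(j)] += count
    return merged


two_class = collapse_to_two_classes(CONFUSION)
print(f"samples:            {sum(map(sum, CONFUSION))}")
print(f"six-class accuracy: {accuracy(CONFUSION):.4f}")
print(f"two-class accuracy: {accuracy(two_class):.4f}")
```

Both accuracies depend only on the diagonal and the total, so they hold regardless of whether rows denote true or predicted classes.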

Comments & questions

Comments and questions can be directed to Lasse Clausen.

References

If you use the Oslo Auroral THEMIS dataset, please refer to:

Clausen, L. B. N., & Nickisch, H. (2018). Automatic classification of auroral images from the Oslo Auroral THEMIS (OATH) data set using machine learning. Journal of Geophysical Research: Space Physics, 123, https://doi.org/10.1029/2018JA025274

Acknowledgements & copyright

Unless stated otherwise, all data in the OATH Dataset is licensed under a Creative Commons 4.0 Attribution License (CC BY 4.0) and the accompanying source code is licensed under a BSD-2-Clause License.

In particular, all actual image data included in the tarball are modified from the THEMIS all-sky imagers. We thank H. Frey for giving us permission to include these data. Copyright for these data remains with NASA.

We acknowledge NASA contract NAS5-02099 and V. Angelopoulos for use of data from the THEMIS Mission. Specifically: S. Mende and E. Donovan for use of the ASI data, the CSA for logistical support in fielding and data retrieval from the GBO stations, and NSF for support of GIMNAST through grant AGS-1004736.