Substorm Onset Prediction using Machine Learning Classified Auroral Images

In our publication, we classify all-sky images and use the classified images to predict whether a substorm will occur within 15 minutes after a classifier has been shown a 30-minute interval of processed image data. Our data is archived on the NIRD Research Data Archive. It is licensed under CC-BY 4.0 and will remain available for at least 10 years after publication. The archive with accompanying information can be accessed here:
On this website, we provide and describe the code used in our publication so that our results can be replicated.

1. Setup

  1. Make sure all dependencies are installed:
   sudo apt-get install python3 wget p7zip-full
  1. Download our code:
   mkdir ~/SubstormOnsetPrediction
   cd ~/SubstormOnsetPrediction
   7z x code.7z
  1. Create a conda environment from the environment file we provide. This may take a few moments. Afterwards, activate the environment to use it. You can find installation instructions for conda here:
   conda env create -f environment.yml
   conda activate OnsetPrediction
  1. Download our fully preprocessed data archives and extract them:
   wget -P
   7z x data.7z -o./data
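
Before running the steps above, it can help to confirm that the command-line tools they rely on are available. The following is a minimal sketch, not part of our library: the helper name `missing_tools` is hypothetical, and the tool list is simply read off the commands used in this section.

```python
import shutil

# Command-line tools invoked in the setup steps (assumption: these exact
# executable names are the ones available on your system).
REQUIRED_TOOLS = ("python3", "wget", "7z", "conda")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

If a tool is reported as missing, install it before continuing with the remaining steps.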

After downloading code and data, your folder structure should look like this:

   ├── data
   │   ├── images
   │   ├── magn
   │   ├── other
   │   └── themis
   ├── dataHandler
   │   ├── DataClasses
   │   │   ├──
   │   │   ├──
   │   │   ├──
   │   │   └──
   │   ├── helpers
   │   │   ├──
   │   │   ├──
   │   │   ├──
   │   │   ├──
   │   │   └──
   │   ├──
   │   └── processors
   │       ├──
   │       ├──
   │       ├──
   │       ├──
   │       └──
   ├── environment.yml
   ├── index.html
   ├── LICENSE
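
The layout above can be checked with a few lines of Python. This is a minimal sketch under the assumption that the code and data were extracted into one project folder; the helper name `missing_paths` is hypothetical, and only the directories and files that are named in the listing are checked.

```python
from pathlib import Path

# Directories and files named in the folder listing (relative to the
# project root). Entries without a name in the listing are not checked.
EXPECTED_PATHS = (
    "data/images",
    "data/magn",
    "data/other",
    "data/themis",
    "dataHandler/DataClasses",
    "dataHandler/helpers",
    "dataHandler/processors",
    "environment.yml",
    "index.html",
    "LICENSE",
)


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist below `root`."""
    root = Path(root).expanduser()
    return [path for path in expected if not (root / path).exists()]


if __name__ == "__main__":
    # Project folder created in the setup steps.
    for path in missing_paths("~/SubstormOnsetPrediction"):
        print("Missing:", path)
```

An empty result suggests that both archives were extracted into the expected locations.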

2. Execution

  1. Run Python
  1. Import our library and start processing
   from dataHandler.processors.SubstormDetection import SubstormDetection
   sd = SubstormDetection()

This will create the figures we show in our publication and save them in ./data/images.
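
To see which files were written most recently, the output folder can be listed by modification time. This is a minimal sketch and not part of our library: the helper name `list_outputs` is hypothetical, and it makes no assumption about the file names or formats of the figures.

```python
from pathlib import Path


def list_outputs(image_dir="./data/images"):
    """Return the files in `image_dir`, newest first by modification time."""
    directory = Path(image_dir)
    if not directory.is_dir():
        # Nothing to list if the folder does not exist.
        return []
    files = [path for path in directory.iterdir() if path.is_file()]
    return sorted(files, key=lambda path: path.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in list_outputs():
        print(path)
```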

3. Retrieval of original data

All data used for this project is available under open licenses. We provide information on how to obtain the original data here, and make our processed data available alongside. Information on how to obtain the processed data can be found in the fourth step of Section 1 (Setup).
The all-sky imager data is obtained from THEMIS. A description of the data can be found here and the original data here.
The list of substorms is obtained through SuperMAG and can be downloaded here.
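
To illustrate how a list of substorm onsets relates to the prediction task (a 30-minute interval of image data, followed by a 15-minute prediction window), the sketch below labels one interval. It is an illustration only and not the implementation in our library: the function name `onset_follows` is hypothetical, and the choice to exclude the start and include the end of the prediction window is an assumption.

```python
from datetime import datetime, timedelta

INPUT_WINDOW = timedelta(minutes=30)       # interval of processed image data
PREDICTION_WINDOW = timedelta(minutes=15)  # time after the interval to check


def onset_follows(window_start, onsets,
                  input_window=INPUT_WINDOW,
                  prediction_window=PREDICTION_WINDOW):
    """Return True if any onset lies within `prediction_window` after the
    end of the input interval that starts at `window_start`.

    Assumption: the prediction window is (window_end, window_end + 15 min].
    """
    window_end = window_start + input_window
    return any(
        window_end < onset <= window_end + prediction_window
        for onset in onsets
    )


if __name__ == "__main__":
    start = datetime(2010, 1, 1, 0, 0)
    onsets = [datetime(2010, 1, 1, 0, 40)]
    print(onset_follows(start, onsets))
```

In this example, the input interval ends at 00:30, so an onset at 00:40 falls inside the 15-minute prediction window.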


If you have any questions or remarks, please send me an e-mail.


The data is archived here:
If you have not already done so, please read our publication based on this data:

If you use any part of this library or our publication, please cite us as follows:

   author = {Sado, Pascal and Clausen, Lasse Boy Novock and Miloch, Wojciech Jacek and Nickisch, Hannes},
   title = {Substorm Onset Prediction using Machine Learning Classified Auroral Images},
   journal = {Earth and Space Science Open Archive},
   pages = {15},
   year = {2022},
   DOI = {10.1002/essoar.10512391.2},
   url = {}

Acknowledgements and Copyright

The source code in this library is licensed under a BSD-2-Clause License. Unless stated otherwise, all data contained in the datasets we provide alongside this publication under the links above are licensed under a Creative Commons Attribution-NonCommercial 4.0 License (CC BY-NC 4.0). The copyright for the all-sky imager data, from which some data files are derived, remains with the original copyright holder, NASA / the THEMIS project. The copyright of the substorm data remains with their respective authors: