SATellite ImageNet

SATIN: A Multi-Task Remote Sensing Metadataset

The Satellite ImageNet (SATIN) metadataset is a comprehensive collection of resources designed to train, evaluate, and analyze vision-language (VL) models for classifying satellite and aerial imagery. SATIN consists of:

  1. A metadataset curated from 27 existing remote sensing datasets, covering a wide range of tasks, resolutions, fields of view, and geographic areas. The datasets are organized into six distinct tasks and feature over 250 distinct class labels, with imagery resolutions spanning five orders of magnitude.
  2. A zero-shot transfer classification approach, which enables the evaluation of VL models across various tasks and datasets without the need for fixed category labels. This makes it possible to test a single VL model across different tasks, addressing the challenges of image diversity, label hierarchies, and scene complexity in remote sensing classification.
  3. A streamlined benchmark that leverages platforms like Hugging Face Datasets, so that models and datasets can be hosted, downloaded, and evaluated with minimal friction.
  4. A public leaderboard for tracking the performance of VL models on the SATIN benchmark, promoting research and development in the remote sensing domain.

The format of the SATIN metadataset is designed to be model-agnostic, allowing any VL model capable of processing satellite and aerial imagery to participate. The ultimate goal of SATIN is to drive research and progress in the development of robust and accurate models for interpreting remote sensing imagery, with potential applications in land-use planning, natural resource management, food security, and environmental risk management.
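
To illustrate the zero-shot transfer idea described in item 2, here is a minimal sketch in plain Python. It assumes the VL model exposes an image embedding and a text embedding in a shared space, and picks the class whose prompt is most similar to the image. The function names, the prompt template, the temperature value, and the toy embeddings are all hypothetical stand-ins for illustration; they are not part of SATIN or of any particular model's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, class_names, embed_text,
                       template="a satellite photo of {}."):
    """Pick the class whose text prompt best matches the image embedding.

    `embed_text` is a hypothetical callable standing in for the text
    encoder of a VL model; the class list can differ per dataset, so no
    fixed label set is baked into the classifier.
    """
    prompts = [template.format(name) for name in class_names]
    text_embeddings = [embed_text(p) for p in prompts]
    sims = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    # The temperature is an assumed value chosen only for this toy example.
    probs = softmax(sims, temperature=0.01)
    best = max(range(len(class_names)), key=lambda i: probs[i])
    return class_names[best], dict(zip(class_names, probs))

# Toy embeddings (made up for illustration) in place of real encoder outputs.
TOY_TEXT_EMBEDDINGS = {
    "a satellite photo of forest.": [0.9, 0.1, 0.0],
    "a satellite photo of river.": [0.1, 0.9, 0.1],
    "a satellite photo of airport.": [0.0, 0.2, 0.9],
}

def toy_embed_text(prompt):
    return TOY_TEXT_EMBEDDINGS[prompt]

toy_image_embedding = [0.8, 0.2, 0.1]
label, probs = zero_shot_classify(
    toy_image_embedding, ["forest", "river", "airport"], toy_embed_text
)
print(label)
```

Because the class names are passed in at call time, the same function can be reused across datasets with different label sets by changing only the list of names; with a real model, the toy lookups would be replaced by that model's image and text encoders.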

SATIN was presented at the TNGCV Workshop, ICCV 2023.

Any questions?

If you have any questions or feedback about our work, please contact us.


If you have used our benchmark or have found our work useful in your research, please consider citing our paper:

  title        = {SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models},
  author       = {Jonathan Roberts and Kai Han and Samuel Albanie},
  year         = {2023},
  journal      = {arXiv preprint arXiv:2304.11619}