The commercial satellite imagery available in the SpaceNet dataset provides a wide array of opportunities for advancements in computer vision and machine learning. This article introduces the SpaceNet dataset to those who are unfamiliar with it.
What is SpaceNet?
SpaceNet is a publicly available dataset containing commercial satellite imagery. This imagery is accompanied by labelled information that can be used to train machine learning models. CosmiQ Works, Radiant Solutions and NVIDIA have partnered to release the SpaceNet data set to the public in order to foster innovation in the field of computer vision with algorithms that can automatically extract geometric features such as roads, buildings etc.
To better understand the dataset, it helps to know where the data comes from. The satellite imagery is provided by DigitalGlobe, an American vendor of space imagery and geospatial content. DigitalGlobe owns a “constellation” of satellites that capture high-quality remote sensing imagery. The satellites include QuickBird, GeoEye-1 and the WorldView series (WorldView-1, WorldView-2, WorldView-3 and WorldView-4). The data generated by each satellite varies in terms of features such as resolution and geolocational accuracy.
What locations are covered in SpaceNet?
SpaceNet contains satellite data for 5 Areas of Interest (AOIs). The table below summarizes the data in the 5 AOIs.
In round 1 of the SpaceNet Challenge (Nov 2016), an open corpus of imagery from the WorldView-2 satellite for Rio de Janeiro (the AOI_1 location) was used. This imagery was a mosaic with a Ground Sample Distance (GSD) of 50 cm and eight spectral bands. In round 1, 42 developers competed in an open challenge hosted by TopCoder to create algorithms that extract building footprints from satellite imagery.
The next phases of the SpaceNet Challenge are follow-on competitions using DigitalGlobe’s 30 cm imagery from WorldView-3 and building footprints across four new, geographically diverse cities around the globe. These datasets now come in multiple imagery formats (panchromatic, multispectral, RGB-pansharpened, multispectral-pansharpened) to allow experimentation with different types of imagery. The data is also available for download as a training/test split.
How much area is covered?
The visualization below shows the geographical area covered in the SpaceNet dataset for all AOIs.
Understanding the Dataset
For a better understanding, you can think of the SpaceNet dataset as divided into 2 categories: vector data and raster data.
In a vector representation, we describe the data in terms of geometric shapes such as points, lines and polygons. This representation is usually written in a mark-up language syntax (such as Well-Known Text) and explicitly stores the actual coordinates of each vertex.
In a raster representation (including imagery), we divide the area into equal squares and assign characteristics to these squares. For example, we may use a 2-dimensional matrix to represent an area. Other than an origin point, e.g. the bottom-left or top-left corner, no geographic coordinates are stored.
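To make the difference concrete, here is a minimal sketch in plain Python. It stores one made-up footprint as a Well-Known Text polygon (vector form) and then converts it into a small grid of 0s and 1s (raster form) by testing the center of every cell. The footprint, the grid size and all function names are assumptions chosen for illustration; they are not part of SpaceNet.

```python
# Vector form: a footprint stored as explicit vertex coordinates (WKT syntax).
# The polygon below is a made-up rectangle, not real SpaceNet data.
footprint_wkt = "POLYGON ((1 1, 4 1, 4 3, 1 3, 1 1))"


def parse_wkt_polygon(wkt):
    """Parse a single-ring WKT polygon into a list of (x, y) tuples."""
    inner = wkt[wkt.index("((") + 2 : wkt.rindex("))")]
    return [tuple(float(v) for v in pair.split()) for pair in inner.split(",")]


def point_in_polygon(x, y, ring):
    """Ray-casting test: toggle 'inside' for each edge crossed by a ray going right."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, width, height):
    """Raster form: mark each unit cell whose center falls inside the polygon."""
    # Row 0 is the top of the grid, so y decreases as the row index grows.
    return [
        [1 if point_in_polygon(col + 0.5, height - row - 0.5, ring) else 0
         for col in range(width)]
        for row in range(height)
    ]


ring = parse_wkt_polygon(footprint_wkt)
grid = rasterize(ring, width=5, height=4)
for row in grid:
    print(row)
# [0, 0, 0, 0, 0]
# [0, 1, 1, 1, 0]
# [0, 1, 1, 1, 0]
# [0, 0, 0, 0, 0]
```

Note how the vector form keeps the exact corner coordinates, while the raster form only knows which cells are covered.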
Now, let’s take a look at both kinds of data in the context of SpaceNet.
Raster data in SpaceNet
Raster data in the SpaceNet dataset is present in the form of .tif images. These GeoTIFF images are a special image format that incorporates metadata for georeferencing the image, that is, for describing how a pixel in the image maps to real-world locations and distances.
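As a rough illustration of what that georeferencing metadata does, the sketch below applies a six-coefficient affine transform in the ordering used by GDAL's "geotransform" convention. The coefficient values are hypothetical and the function names are my own; in practice you would read the coefficients from the .tif file with a geospatial library rather than type them in.

```python
def pixel_to_world(geotransform, col, row):
    """Map a pixel's top-left corner (col, row) to world coordinates (x, y)."""
    origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height = geotransform
    x = origin_x + col * pixel_width + row * row_rotation
    y = origin_y + col * col_rotation + row * pixel_height
    return x, y


def world_to_pixel(geotransform, x, y):
    """Inverse mapping, valid only for north-up images (both rotation terms are zero)."""
    origin_x, pixel_width, _, origin_y, _, pixel_height = geotransform
    return (x - origin_x) / pixel_width, (y - origin_y) / pixel_height


# Hypothetical north-up image: top-left corner at (100.0, 200.0), 0.5-unit pixels.
# pixel_height is negative because row numbers grow downward while y grows upward.
example = (100.0, 0.5, 0.0, 200.0, 0.0, -0.5)
print(pixel_to_world(example, col=10, row=20))    # (105.0, 190.0)
print(world_to_pixel(example, x=105.0, y=190.0))  # (10.0, 20.0)
```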
The Rio de Janeiro dataset (AOI_1), used in Round 1 of the SpaceNet Challenge, is formatted slightly differently and is missing pansharpened, 8-band and multispectral data. This dataset only contains RGB channels pansharpened to a resolution of 0.5 m. In all the other datasets, you will find the following directory structure for both train and test data.
Most commercial satellites capture imagery in multiple coarse-resolution multispectral bands as well as a panchromatic band with finer spatial resolution. This is done because of the tradeoff between spectral resolution (i.e. the range of wavelengths sampled by an imaging detector) and spatial resolution. The SpaceNet dataset provides images with panchromatic, RGB and 8-band channels. Let’s discuss the characteristic properties of these images.
The 8-band, multispectral images include the following bands: Coastal Blue, Blue, Green, Yellow, Red, Red Edge, Near Infrared 1 (NIR1), and Near Infrared 2 (NIR2). These images have a resolution of ~1.3 m. However, 8-band and RGB images can have their resolutions increased to their respective satellite’s panchromatic resolution through pansharpening. This pansharpening to ~0.3m resolution is already done for MUL and RGB images in their respective -PanSharpen directories.
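To give a feel for what pansharpening does, here is a minimal sketch of a Brovey-style ratio method in plain Python. This is only one simple pansharpening technique, and I am not claiming it is the method used to produce the SpaceNet -PanSharpen files. It also assumes the panchromatic grid is an exact integer multiple of the multispectral grid; the function names and toy values are made up for illustration.

```python
def upsample_nearest(band, factor):
    """Repeat every pixel `factor` times along both axes (nearest neighbour)."""
    out = []
    for row in band:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def brovey_pansharpen(bands, pan, factor):
    """Scale each upsampled band by pan / intensity at every fine-grid pixel.

    `intensity` is the mean of the upsampled bands, so the mean of the
    sharpened bands at a pixel equals the panchromatic value there.
    """
    up = [upsample_nearest(band, factor) for band in bands]
    rows, cols = len(pan), len(pan[0])
    sharpened = [[[0.0] * cols for _ in range(rows)] for _ in bands]
    for r in range(rows):
        for c in range(cols):
            intensity = sum(band[r][c] for band in up) / len(up)
            ratio = pan[r][c] / intensity if intensity else 0.0
            for i, band in enumerate(up):
                sharpened[i][r][c] = band[r][c] * ratio
    return sharpened


# Toy input: three 1x1 coarse bands and one 2x2 panchromatic band.
coarse_bands = [[[10.0]], [[20.0]], [[30.0]]]
pan_band = [[10.0, 20.0], [30.0, 40.0]]
result = brovey_pansharpen(coarse_bands, pan_band, factor=2)
print(result[0])  # [[5.0, 10.0], [15.0, 20.0]]
```

Each output band keeps the colour proportions of the coarse pixel, while the spatial detail comes from the panchromatic band.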
Panchromatic images are single-band images for which the sensor is sensitive to all wavelengths of visible light, hence providing the most realistic reproduction of a scene. This imagery is extremely useful, as it is generally of a much higher (spatial) resolution than the multispectral imagery from the same satellite. Panchromatic images are generally displayed as shades of gray and are also used to perform pansharpening for multispectral bands.
But what is the advantage of having different types of images?
Images captured by the WorldView-3 satellite vary in terms of wavelengths as shown below:
As you can see in the above image, the RGB bands in 8-band images do not include the Yellow band. Separating Yellow and also including the Coastal Blue band allows for novel feature extraction, such as vegetative analysis, by providing “yellower” and “bluer” information about objects. The Red Edge band (705–745 nm) covers wavelengths of light just beyond the Red wavelengths. The significance of this channel is that chlorophyll is transparent to wavelengths > 700 nm, which makes this band well suited to vegetative analysis. Near Infrared (NIR) bands 1 & 2 can be used to perform NIR spectroscopy, whose applications include investigating plant health, soil conditions and atmospheric analysis.
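As a small example of how these extra bands are used for vegetation, the sketch below computes two widely used normalized-difference indices for a single 8-band pixel: NDVI (NIR vs. Red) and a red-edge variant often called NDRE (NIR vs. Red Edge). These indices are standard remote sensing tools, not something specific to SpaceNet. I am assuming the bands are stored in the order listed earlier in this article, and the pixel values are made up.

```python
# Assumed band order, following the list given earlier in the article.
BAND_NAMES = ["Coastal Blue", "Blue", "Green", "Yellow",
              "Red", "Red Edge", "NIR1", "NIR2"]


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both values are zero."""
    total = a + b
    return (a - b) / total if total else 0.0


def ndvi(pixel):
    """Normalized Difference Vegetation Index from the NIR1 and Red bands."""
    return normalized_difference(pixel[BAND_NAMES.index("NIR1")],
                                 pixel[BAND_NAMES.index("Red")])


def ndre(pixel):
    """Red-edge variant of the index, from the NIR1 and Red Edge bands."""
    return normalized_difference(pixel[BAND_NAMES.index("NIR1")],
                                 pixel[BAND_NAMES.index("Red Edge")])


# Hypothetical pixel values: vegetation reflects strongly in NIR, weakly in Red.
vegetation_pixel = [30, 40, 60, 55, 50, 90, 150, 140]
pavement_pixel = [90, 95, 100, 100, 100, 100, 100, 100]

print(ndvi(vegetation_pixel))  # 0.5
print(ndre(vegetation_pixel))  # 0.25
print(ndvi(pavement_pixel))    # 0.0
```

Values close to 1 suggest dense, healthy vegetation, while values near 0 or below are typical of bare surfaces and water.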
Vector data in SpaceNet
Vector data, such as building footprints, is provided in the SpaceNet dataset in both CSV and GeoJSON formats. The GeoJSON provides the coordinates of the vertices of the building footprints (polygons) as latitude and longitude in the same map projection as the images, and the CSV provides this same information in pixel coordinates relative to the image.
CSV files are present in the “summaryData” directory, and GeoJSON files are present in “geojson/buildings” for building labels or in “geojson/spacenetroads” for road labels. Note that the vector data is, understandably, available only for the training datasets.
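If you want to peek inside a GeoJSON label file without any GIS software, Python's built-in json module is enough. The sketch below parses a tiny hand-written FeatureCollection and prints the bounding box of each polygon. The sample feature, its property name and its coordinate values are made up for illustration; real SpaceNet files will have their own properties.

```python
import json

# Hand-written sample, NOT real SpaceNet data. The property name is hypothetical.
sample = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"building_id": 1},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-115.30, 36.20], [-115.29, 36.20],
                         [-115.29, 36.21], [-115.30, 36.21],
                         [-115.30, 36.20]]]
      }
    }
  ]
}
"""


def exterior_rings(collection):
    """Yield (properties, exterior ring) for every Polygon feature."""
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            # coordinates[0] is the exterior ring; any later rings are holes.
            yield feature["properties"], geometry["coordinates"][0]


def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y); extra values such as z are ignored."""
    xs = [point[0] for point in ring]
    ys = [point[1] for point in ring]
    return min(xs), min(ys), max(xs), max(ys)


collection = json.loads(sample)
for properties, ring in exterior_rings(collection):
    print(properties, bounding_box(ring))
```

To read a real file instead of the sample string, open it and pass the file object to json.load.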
If you are new to the GeoJSON standard or want a refresher on it, I recommend reading my article “A primer on GeoJSON standard and visualization tools”, which explains the GeoJSON standard and the tools that you might use to visualize GeoJSON data.
Download the Datasets
The SpaceNet dataset is available on AWS in “Requester Pays” S3 buckets, meaning that the requester, instead of the bucket owner, pays the cost of the request and of the data download from the bucket. You can use aws-cli to download the datasets by following these instructions.
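For reference, here is a small sketch of what a Requester Pays download looks like with aws-cli, wrapped in Python so the command can be built and inspected before running it. The --request-payer requester option is what tells S3 that you accept the charges. The bucket name below is my assumption and the object key is a hypothetical placeholder, so take the real paths from the official instructions.

```python
import shlex


def requester_pays_download_command(bucket, key, destination):
    """Build an aws-cli command that copies one object from a Requester Pays bucket."""
    return [
        "aws", "s3", "cp",
        f"s3://{bucket}/{key}",
        destination,
        "--request-payer", "requester",
    ]


command = requester_pays_download_command(
    bucket="spacenet-dataset",       # assumed bucket name, check the instructions
    key="path/to/archive.tar.gz",    # hypothetical key, not a real SpaceNet path
    destination="./archive.tar.gz",
)
print(shlex.join(command))

# To actually run the download (this incurs AWS charges on your account):
# import subprocess
# subprocess.run(command, check=True)
```

You need configured AWS credentials for this to work, because the charges are billed to your account.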
Visualize the Data
Visualizing data is a good way to familiarize yourself with the nature and quality of your dataset. As an example, I am going to visualize the sample data available in the SpaceNet dataset, using QGIS to view both its vector and raster data. QGIS is a free, open-source, cross-platform application for viewing, editing, and analyzing geospatial data.
Similar to what ArcCatalog is for ArcGIS, QGIS Browser is the standalone visualization application in QGIS that comes bundled with the default QGIS installation. You can use QGIS Browser to have a quick look at your geospatial data.
- Open “QGIS Browser” on your computer and use the left pane to browse to the directory that contains the SpaceNet data. The “Preview” tab in the right pane will show you a quick preview of the data after you click on a .geojson, .tif or .csv file.
- Switch to the “Attributes” tab to see the associated attribute values for the vector data.
QGIS Desktop is the “batteries included” standalone application in the QGIS package. We will use it to overlay vector data on raster imagery.
- First open the .tif file in QGIS Desktop, using the menu option Layer -> Add Layer -> Add Raster Layer.
- You should now be able to see the raster preview, and the raster image should be added in the “Layers” pane, as shown below.
- Now open the vector (geojson) file using the Layer -> Add Layer -> Add Vector Layer -> Browse menu options.
- You should be able to see the vector data layer overlaid on the raster layer. If not, make sure your vector layer is above the raster layer in the “Layers” pane. You can change this by simply dragging and dropping a layer above or below another. You can also customize the line width and color of the drawn vector by double-clicking on the corresponding layer and choosing the respective option.
Note: The SpaceNet data has both vectors and rasters numbered according to the index of the grid cell they represent. To see the overlay, use vector and raster data with the same index. For example, we used RGB-PanSharpen_AOI_2_Vegas_img517.tif as raster and spacenetroads_AOI_2_Vegas_img517.geojson as vector data.
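If you have many files, matching rasters and vectors by hand gets tedious. The sketch below pairs file names that share the same AOI and image index. The regular expression is my own guess at the naming pattern, generalized from the two file names in the note above, so adjust it if your files are named differently.

```python
import re
from collections import defaultdict

# Assumed naming pattern, e.g. "..._AOI_2_Vegas_img517.tif" or ".geojson".
TILE_KEY = re.compile(r"(AOI_\d+_[A-Za-z]+_img\d+)\.(tif|geojson)$")


def pair_by_tile(filenames):
    """Group file names by their shared AOI/image key and keep complete pairs."""
    groups = defaultdict(dict)
    for name in filenames:
        match = TILE_KEY.search(name)
        if match:
            key, extension = match.groups()
            kind = "raster" if extension == "tif" else "vector"
            # If several files of one kind share a key, the last one wins.
            groups[key][kind] = name
    return {key: files for key, files in groups.items()
            if "raster" in files and "vector" in files}


files = [
    "RGB-PanSharpen_AOI_2_Vegas_img517.tif",
    "spacenetroads_AOI_2_Vegas_img517.geojson",
    "RGB-PanSharpen_AOI_2_Vegas_img518.tif",  # no matching vector, so it is dropped
]
pairs = pair_by_tile(files)
for key, pair in pairs.items():
    print(key, pair["raster"], pair["vector"])
```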
You can also open multiple raster and vector files to visualize more grids together.
Hopefully this article was helpful for you to start working with SpaceNet data. In the next article, I will explain how to use Python to work with SpaceNet data. Thank you for your time and I will see you in the next one! :)