Working with high-resolution satellite imagery, before the recent development of Cloud-Optimized GeoTIFFs (COGs), required downloading massive datasets that mostly contained data you didn't need. More often than not, a user wants to access only a few bands over a small area, instead of a full multispectral scene.

With Cloud-Optimized GeoTIFFs, we can now query a web server just for the information we require. This has made working with high-resolution imagery substantially easier (and cheaper, as well!).
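Under the hood, a COG reader issues HTTP range requests for only the bytes it needs, based on the file's internal tile index. As a minimal sketch of that mechanism (the byte offsets below are made up for illustration), this is roughly the header a client sends:

```python
def range_header(first_byte, last_byte):
    """Build the HTTP Range header a COG reader sends to fetch one slice of the file."""
    return {"Range": f"bytes={first_byte}-{last_byte}"}

# A reader that knows (from the COG's header) that a tile lives at
# bytes 16384..81919 requests exactly that slice and nothing else:
print(range_header(16384, 81919))  # {'Range': 'bytes=16384-81919'}
```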

## Cost of satellite imagery on AWS

Public datasets on AWS like Landsat and Sentinel are stored in requester-pays buckets. This means you pay for data transfer and processing above the free-tier quotas, which should suffice for prototyping and many other use cases, making access to this data virtually free in practice.

In particular, the AWS always-free tier was recently expanded to 100GB per month of network egress.

What’s important about this is that code can be prototyped for free, downloading and processing data locally, and then deployed into production using AWS services like AWS Lambda, which also has a convenient free tier.

Other reasons why AWS can be a good fit for many geospatial use cases are that:

• It leaves you free to choose the programming languages and libraries you use.
• It allows you to rapidly transition from prototyping into production.
• It is based on open-source tools, which means less vendor lock-in.

At least at the time of writing, none of this seems to be possible with GEE. As I’ve discussed in this post, GEE imposes many restrictions in return for a totally free service, which is understandable.

Ok, now that we have covered why COGs on AWS are convenient, let’s see how to actually work with them.

Beware that cloud-native geospatial development is still a rapidly changing field. Not surprisingly, documentation often lags behind: I had to go through lots of outdated tutorials before I could begin working with this new technology.

### Step 1: Create and configure an AWS account and the AWS CLI

The first thing you need is an AWS account. Remember that Amazon wants everybody’s credit card information, including users of the free-tier services.

This step has been covered over and over in other places, like here on the AWS site.

You will also need to set up the AWS Command Line Interface. To do so, you can follow these instructions.
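Once the CLI is installed, `aws configure` prompts for your credentials and writes them to two files in `~/.aws/`. The values below are placeholders; use the access key pair from your own account (Landsat and Sentinel COGs live in the us-west-2 region, so that is a sensible default):

```shell
aws configure

# This produces, roughly:
#
# ~/.aws/credentials
# [default]
# aws_access_key_id = YOUR_ACCESS_KEY_ID
# aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
#
# ~/.aws/config
# [default]
# region = us-west-2
# output = json
```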

### Step 2: Find a STAC server and explore collections

Now we need to find which Landsat or Sentinel scenes we are interested in. Here we can rely on another recent development: the STAC standard for SpatioTemporal Asset Catalogs.

As positive as this recent development is, regrettably the standardization process seems to be taking some time, with some servers implementing only a portion of the standard, and certain client/server incompatibilities still existing.

The first important thing is to obtain the URLs of actual, functioning, STAC servers. So far, I’ve found the following:

• For Sentinel-2, the team of Element84 maintains the earth-search catalog on AWS https://earth-search.aws.element84.com/v0.
• For Landsat, we can use the official USGS STAC server https://landsatlook.usgs.gov/stac-server, which includes s3 file locations in their metadata.
• For many other collections, the Microsoft Planetary Computer has a STAC server at https://planetarycomputer-staging.microsoft.com/api/stac/v1
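All of these servers speak the same core API: a POST to their `/search` endpoint with a JSON body describing what you want. As a sketch of that request shape (the point coordinates and limit below are made up for illustration), the body looks roughly like this:

```python
import json

# Hypothetical STAC /search request body: scenes from one collection,
# intersecting a point, within a date range.
search_body = {
    "collections": ["landsat-c2l2-sr"],
    "intersects": {"type": "Point", "coordinates": [-58.4, -34.6]},
    "datetime": "2019-06-01/2021-06-01",
    "limit": 10,
}
print(json.dumps(search_body, indent=2))
```

The client libraries below build and send exactly this kind of body for us.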

To query a STAC server, we are going to use two Python packages: pystac-client and sat-search. To install them:

pip install pystac-client sat-search


The reason to use two clients is that, at the time of writing, the earth-search API endpoint we need for the Sentinel data doesn’t seem to be running a server version compatible with the latest STAC standard, which the more actively developed pystac-client requires. The sat-search package, on the other hand, still works with the older server version.

Using two different API clients is far from optimal, particularly as sat-search no longer seems to be actively developed. However, it seems like the reasonable thing to do while we wait for an update of the earth-search server.

Let’s start working with pystac-client. The first step is to connect to a STAC server and query the available collections, for which we do:

from pystac_client import Client

LandsatSTAC = Client.open("https://landsatlook.usgs.gov/stac-server")

for collection in LandsatSTAC.get_collections():
    print(collection)


Choose the collection of your interest. In my case, I selected landsat-c2l2-sr, as that data is atmospherically-corrected.

As for satsearch, we can check that the connection to the server is working with the following simple command:

import satsearch

SentinelSTAC = satsearch.Search.search( url = "https://earth-search.aws.element84.com/v0" )
print("Found " + str(SentinelSTAC.found()) + " items")


### Step 3: Look for scenes which intersect an area of interest

Now that we know which collection we are interested in, the next thing is to look for scenes which intersect a given area and a time range.

Importantly, at least at the time of writing, not all STAC servers support search functionality; some are static-only catalogs. For those, we would have to crawl the entire catalog and search for the scenes of interest locally. Fortunately, the STAC server at USGS does support dynamic search.
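For a static-only catalog, the filtering we would run locally is simple once the item metadata is in hand. A sketch over hypothetical item dictionaries (the property values are made up, but the `properties` layout matches what STAC items use):

```python
def filter_items(items, max_cloud_cover, time_range):
    """Keep items inside a date range and below a cloud-cover threshold."""
    start, end = time_range.split("/")
    return [
        item for item in items
        if item["properties"]["eo:cloud_cover"] <= max_cloud_cover
        and start <= item["properties"]["datetime"][:10] <= end
    ]

items = [
    {"properties": {"eo:cloud_cover": 12.5, "datetime": "2020-07-01T10:00:00Z"}},
    {"properties": {"eo:cloud_cover": 98.0, "datetime": "2020-07-09T10:00:00Z"}},
    {"properties": {"eo:cloud_cover": 3.0, "datetime": "2018-01-01T10:00:00Z"}},
]
# Only the first item is both recent enough and clear enough:
print(len(filter_items(items, 95, "2019-06-01/2021-06-01")))  # 1
```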

We first define our area of interest with a geojson file (which can be produced interactively at geojson.io):

from json import load

file_path = "my-area.geojson"
with open(file_path, "r") as fp:
    file_content = load(fp)
geometry = file_content["features"][0]["geometry"]

timeRange = '2019-06-01/2021-06-01'
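If you don’t have a file at hand, a minimal FeatureCollection with a single polygon looks like this, and parsing it the same way recovers the geometry (the coordinates below are made up):

```python
import json

minimal_geojson = """{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "properties": {},
    "geometry": {
      "type": "Polygon",
      "coordinates": [[[-58.50, -34.65], [-58.35, -34.65],
                       [-58.35, -34.55], [-58.50, -34.55],
                       [-58.50, -34.65]]]
    }
  }]
}"""

file_content = json.loads(minimal_geojson)
geometry = file_content["features"][0]["geometry"]
print(geometry["type"])  # Polygon
```

Note that the polygon's outer ring closes on itself: the last coordinate repeats the first.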


Next, we prepare the parameters, and we query the STAC server:

LandsatSearch = LandsatSTAC.search(
    intersects = geometry,
    datetime = timeRange,
    query = ['eo:cloud_cover<95'],
    collections = ["landsat-c2l2-sr"] )

Landsat_items = [i.to_dict() for i in LandsatSearch.get_items()]
print(f"{len(Landsat_items)} Landsat scenes fetched")


Finally, we can now process the metadata to extract the location of the images.

This code will print the URLs of the red band of the Landsat collection we just found:

for item in Landsat_items:
    red_href = item['assets']['red']['href']
    red_s3 = item['assets']['red']['alternate']['s3']['href']
    print(red_href)
    print(red_s3)


Notice that, for Landsat data from the USGS STAC server, the default URL points to the USGS site, so you could actually get the data directly from them without needing an AWS account at all.

However, if you later want to do cloud-based data processing on AWS (like we will do in subsequent posts!), it will be convenient to rely on AWS data. So, for the remainder of this post, we’ll just stick with the s3 location.
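A small helper (hypothetical, not part of any library) can encode that preference: use the s3 alternate href when present, and fall back to the default https one otherwise. The asset dictionary below mimics the structure we just saw, with made-up paths:

```python
def pick_href(asset):
    """Prefer the s3 alternate location; fall back to the default href."""
    alternate = asset.get("alternate", {})
    if "s3" in alternate:
        return alternate["s3"]["href"]
    return asset["href"]

asset = {
    "href": "https://landsatlook.usgs.gov/data/example/red.tif",
    "alternate": {"s3": {"href": "s3://usgs-landsat/example/red.tif"}},
}
print(pick_href(asset))  # s3://usgs-landsat/example/red.tif
```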

For sat-search, the syntax is slightly different.


SentinelSearch = satsearch.Search.search(
    url = "https://earth-search.aws.element84.com/v0",
    intersects = geometry,
    datetime = timeRange,
    collections = ['sentinel-s2-l2a'] )

Sentinel_items = SentinelSearch.items()
print(Sentinel_items.summary())

for item in Sentinel_items:
    red_s3 = item.assets['B04']['href']
    print(red_s3)


### Step 4: Open the desired GeoTIFF with RasterIO

The next step is to open just the area of interest from the Cloud-Optimized GeoTIFF. We will resort to RasterIO, which will need to access an s3 resource on AWS. It is necessary to install the optional s3 component of RasterIO (see docs) and, on some systems, configure SSL certificates.

pip install rasterio[s3]

import os
import boto3
import rasterio as rio

os.environ['CURL_CA_BUNDLE'] = '/etc/ssl/certs/ca-certificates.crt'

print("Creating AWS Session")
aws_session = rio.session.AWSSession(boto3.Session(), requester_pays=True)


An additional thing to solve is the coordinates-to-pixels transformation for Landsat and Sentinel imagery. To do this, we can resort to the pyproj package which handles geographical coordinates and transformations.

We summarize what we need in the following function:

import rasterio as rio
from pyproj import Transformer

def getSubset(geotiff_file, bbox):
    with rio.Env(aws_session):
        with rio.open(geotiff_file) as geo_fp:
            # Calculate pixels with PyProj
            Transf = Transformer.from_crs("epsg:4326", geo_fp.crs)
            lat_north, lon_west = Transf.transform(bbox[3], bbox[0])
            lat_south, lon_east = Transf.transform(bbox[1], bbox[2])
            x_top, y_top = geo_fp.index( lat_north, lon_west )
            x_bottom, y_bottom = geo_fp.index( lat_south, lon_east )
            # Define window in RasterIO
            window = rio.windows.Window.from_slices( ( x_top, x_bottom ), ( y_top, y_bottom ) )
            # Actual HTTP range request
            subset = geo_fp.read(1, window=window)
    return subset
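The row/column arithmetic that `geo_fp.index` performs can be sketched in plain Python for the common north-up case, where the raster's transform reduces to a top-left origin plus a pixel size (the grid numbers below are made up):

```python
def to_pixel(x, y, x_origin, y_origin, pixel_size):
    """Map projected coordinates to (row, col) for a north-up raster."""
    col = int((x - x_origin) / pixel_size)
    row = int((y_origin - y) / pixel_size)  # rows grow downward from the top edge
    return row, col

# A 30 m Landsat-style grid with its top-left corner at (300000, 6200000):
# 900 m east of the origin -> column 30; 300 m south of it -> row 10.
print(to_pixel(300900, 6199700, 300000, 6200000, 30))  # (10, 30)
```

In practice you should let RasterIO do this, since it reads the actual affine transform (including any rotation) from the file.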


### Step 5: Process Your Data Locally

As a final step, we will just go through the image catalog we found, download our area of interest, and perform some local processing, which in this example will be just computing the NDVI and saving an image.

import matplotlib.pyplot as plt

def plotNDVI(nir, red, filename):
    # Cast to float to avoid unsigned-integer overflow in the subtraction
    ndvi = (nir.astype(float) - red) / (nir + red)
    ndvi[ndvi > 1] = 1
    plt.imshow(ndvi)
    plt.savefig(filename)
    plt.close()
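As a quick sanity check of the formula, NDVI = (NIR - Red) / (NIR + Red) always falls between -1 and 1, with dense vegetation near the high end. A pure-Python example with made-up reflectance values:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for scalar reflectances."""
    return (nir - red) / (nir + red)

print(round(ndvi(0.45, 0.05), 2))  # healthy vegetation, high NIR: 0.8
print(round(ndvi(0.20, 0.18), 2))  # bare soil, NIR close to red: 0.05
```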


Of course, in a real application we would be using much more advanced techniques, like image segmentation or machine learning. What I want to stress in the present guide, is that we can prototype those algorithms locally and later deploy them to the cloud (a topic we will cover in a subsequent post).

Now the only thing left is to loop over the catalog and process all the files, which we do as follows:

from rasterio.features import bounds
import matplotlib.pyplot as plt

bbox = bounds(geometry)

for i, item in enumerate(Sentinel_items):
    red_s3 = item.assets['B04']['href']
    nir_s3 = item.assets['B08']['href']
    date = item.properties['datetime'][0:10]
    print("Sentinel item number " + str(i) + "/" + str(len(Sentinel_items)) + " " + date)
    red = getSubset(red_s3, bbox)
    nir = getSubset(nir_s3, bbox)
    plotNDVI(nir, red, "sentinel/" + date + "_ndvi.png")

for i, item in enumerate(Landsat_items):
    red_s3 = item['assets']['red']['alternate']['s3']['href']
    nir_s3 = item['assets']['nir08']['alternate']['s3']['href']
    date = item['properties']['datetime'][0:10]
    print("Landsat item number " + str(i) + "/" + str(len(Landsat_items)) + " " + date)
    red = getSubset(red_s3, bbox)
    nir = getSubset(nir_s3, bbox)
    plotNDVI(nir, red, "landsat/" + date + "_ndvi.png")



## Conclusion

You can get the complete code for this tutorial here.

Hopefully this tutorial has cleared up some basic questions and helped you get started with Cloud-Optimized GeoTIFFs on AWS S3, using Python.

I plan to write more tutorials on Cloud-Native Geospatial Development, covering applications and ways to deploy our code on the cloud.

One way to get a notification when this new content comes out is to subscribe to my mailing list (I only send one e-mail per month, at most).