The use of Google Earth Engine (GEE) has been rising rapidly among researchers. The reason is simple: GEE allows processing massive amounts of remote sensing data directly on Google’s servers, enabling planetary-scale data analysis, free of cost.
Now, with Google’s recent announcement that a commercial version of GEE will be available for governments and businesses, many are wondering what sort of trade-off GEE represents.
Indeed, as appealing as it is in certain cases, Google Earth Engine is not a proper fit for many projects. Importantly, it’s a closed-source platform, which is only free for non-commercial, non-production use.
In a few words, Google Earth Engine should be considered a rapid-prototyping tool for geospatial applications. However, as it is a closed-source product, a developer who later wants to transition into production must stick with GEE.
GEE is a closed-source platform
The software developed by Google to power its infrastructure is not open source.
This is perfectly fine, but the policy contrasts with the software produced by the Pangeo Project, which is a large-scale open-source effort for Big Data geoscience.
Software based on open-source components can be run on any cloud provider (such as Microsoft’s Planetary Computer or Amazon Web Services), or on on-prem facilities. By contrast, software developed with GEE can only be run on Google infrastructure.
GEE imposes a restricted programming framework
GEE provides, on the one hand, a set of objects that are handled exclusively by the server (e.g. image collections), and on the other hand, client-side variables, which are handled only by the client (the browser, when working in the Code Editor).
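The split between the two kinds of objects can be illustrated with a toy model in plain Python. The sketch below does not use the Earth Engine API: `ServerNumber` and `evaluate` are hypothetical names, invented here only to mimic how a server-side object records the operations requested of it, while the actual value is computed later and elsewhere.

```python
class ServerNumber:
    """Toy stand-in for a server-side object: it stores a recipe, not a value."""

    def __init__(self, recipe):
        # The recipe is a nested tuple describing the computation to perform.
        self.recipe = recipe

    def add(self, other):
        # Calling a method only extends the recipe; nothing is computed here.
        return ServerNumber(("add", self.recipe, _to_recipe(other)))

    def multiply(self, other):
        return ServerNumber(("multiply", self.recipe, _to_recipe(other)))


def _to_recipe(value):
    """Wrap a plain client-side number, or unwrap a ServerNumber's recipe."""
    if isinstance(value, ServerNumber):
        return value.recipe
    return ("constant", value)


def evaluate(recipe):
    """Plays the role of the server: walks the recipe and computes the value."""
    kind = recipe[0]
    if kind == "constant":
        return recipe[1]
    left, right = evaluate(recipe[1]), evaluate(recipe[2])
    return left + right if kind == "add" else left * right


# Client side: build a description of (2 + 3) * 4 without computing anything.
expression = ServerNumber(("constant", 2)).add(3).multiply(4)

# "Server" side: only now is the value actually produced.
result = evaluate(expression.recipe)
```

In this toy model, a client-side `if` or `for` that inspects `expression` sees only the recipe, never the computed number, which is roughly why mixing the two worlds is easy to get wrong.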
The parallel programming framework chosen by Google is based on Map and Reduce operations. A map operation applies a function to each image in the collection independently, while a reduce operation aggregates across images; the two can be roughly interpreted as ‘transforming’ and ‘aggregating’. For example, to clip every image to a certain area or to compute a per-image index we would apply a ‘map’ operation, and to summarize the selected data according to various statistics we would apply a ‘reduce’ operation.
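As a rough illustration of the map and reduce idea, here is a minimal sketch in plain Python. It does not call Earth Engine: the `collection` list, the `scale` function, and the `pixelwise_sum` function are all made up for this example, and each "image" is just a dict with three pixel values.

```python
from functools import reduce

# Hypothetical stand-in for an image collection: each "image" is a dict
# holding an acquisition date and a tiny list of pixel values.
collection = [
    {"date": "2021-01-10", "pixels": [2, 4, 6]},
    {"date": "2021-02-10", "pixels": [1, 3, 5]},
    {"date": "2021-03-10", "pixels": [3, 5, 7]},
]


def scale(image):
    """Per-image operation: returns a new image, leaving the input untouched."""
    return {"date": image["date"], "pixels": [p * 10 for p in image["pixels"]]}


def pixelwise_sum(totals, image):
    """Aggregation step: adds one image's pixels to the running totals."""
    return [t + p for t, p in zip(totals, image["pixels"])]


# Map: the same function is applied to every image independently,
# so the work could be spread across many machines.
scaled = list(map(scale, collection))

# Reduce: the mapped images are collapsed into a single per-pixel mean.
totals = reduce(pixelwise_sum, scaled, [0, 0, 0])
mean_image = [t / len(scaled) for t in totals]
```

Because `scale` touches one image at a time and shares no state, the map step parallelizes trivially; only the reduce step needs to see results from more than one image.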
While this model enables massive parallelism on distributed commodity servers, it does so at the expense of introducing a complex coding style. As noted in this conference paper, the combination of server-side and client-side programming tends to be confusing. For example, simple index iteration is not recommended when working with server-side objects, because the index itself is a client-side variable. As reported in the documentation, to iterate over an image collection we must define a function that is applied to one image at a time and receives the result of the previous step; this function cannot modify values outside of its own scope, among other limitations.
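The difference between the two iteration styles can be sketched in plain Python, again without touching the Earth Engine API. The `collection` values and the `accumulate` function are hypothetical, and the argument order follows Python's `functools.reduce` rather than trying to mirror Earth Engine's own signature.

```python
from functools import reduce

# Hypothetical stand-in for an image collection: one mean value per image.
collection = [4, 8, 6]


def accumulate(previous, current):
    """Builds the new running-total list from the previous one.

    Nothing outside the function is modified: the updated state is
    returned and handed to the next call.
    """
    last_total = previous[-1] if previous else 0
    return previous + [last_total + current]


# Client-side habit: an index loop that mutates an outer variable.
running_loop = []
for i in range(len(collection)):
    running_loop.append(sum(collection[: i + 1]))

# Server-style pattern: the state travels through the function's return value.
running_iterate = reduce(accumulate, collection, [])
```

Both variants produce the same running totals here, but only the second one avoids relying on an index or on a variable living outside the function, which is the constraint the server-side model imposes.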
With this in mind, it is clear that porting sophisticated applications into the GEE framework can be quite challenging.
GEE is only free for non-commercial use
To use Google Earth Engine, compliance with the license agreement is required. The license agreement states explicitly:
Earth Engine’s terms allow for use in development, research, and education environments. It may also be used for evaluation in a commercial or operational environment, but sustained production use is not allowed. Additionally, data products generated by Earth Engine may not be sold.
This is in contrast with the underlying data, which is often in the public domain. For example, NASA is a US federal agency and, as such, cannot claim copyright on the material it makes available to the public (see also NASA’s policy page).
Google also claims to offer commercial licensing options for its platform, but the usage conditions and pricing are not transparent (i.e. “write to us if you are interested”).
In sum, Google Earth Engine is targeted at non-profit institutions. And even though a good deal of research is carried out in a not-for-profit spirit, many researchers could be interested in potential spin-offs of their research, especially as geospatial data analysis is a thriving business right now.
It may well be the case that Google will eventually start promoting a paid service around Google Earth Engine for commercial applications, in which case this disadvantage might become just a matter of price.
The free version is not suited for ‘production’ workloads
GEE’s FAQ also states that sustained production use is not allowed.
So even if we are working in a non-profit setting, we can’t rely on GEE for goals such as continued, real-time environmental monitoring.
Sustained production use is not only in violation of GEE’s terms of service, but also impossible in practice, as workloads with estimated times to completion longer than 5 minutes will be executed in batch mode, at indeterminate times. Presumably, if a user is in violation of the terms of service, GEE will stop executing those batch jobs.
The free version has some processing and storage limits
While GEE is generally free, there are a few caveats concerning processing time and storage limits.
As for processing, there are two modes available within GEE: interactive and batch. The interactive mode is extremely fast but limited to jobs that can be served in under 5 minutes of processing time. The batch mode, on the other hand, is considerably slower and requires exporting data to one of Google’s services, such as Cloud Storage or Google Drive, where users can incur some costs, albeit probably small ones.
GEE is inconvenient for smaller datasets
While not exactly a limitation of GEE, something else to take into consideration is that GEE might not be necessary at all in many cases, as many important remote sensing datasets can be handled perfectly well by desktop computers. A system designed for massive parallelism and petabyte-scale datasets isn’t the optimal choice for dealing with smaller datasets.
A good number of remote sensing applications require fast access to data that fits comfortably in a few terabytes, or even a few hundred gigabytes. SSDs currently sell for under $40 per TB and even an inexpensive laptop is extremely powerful nowadays.
Additionally, open-source software libraries, which in some cases are provided by the space agencies themselves and, importantly, under permissive licensing conditions, are making great progress in easing the very first steps of the remote sensing pipeline (downloading, opening, and creating time series of data).
The power and freedom associated with owning a modern workstation should not be underestimated, and the cloud providers’ marketing of their infrastructure as a one-size-fits-all solution can be disregarded in some cases.
As appealing as it is in certain cases, Google Earth Engine is not a proper fit for many projects. Importantly, it’s free only for non-commercial, non-production use.
In sum, GEE should be considered a rapid-prototyping tool for geospatial applications. However, developers should note that, after successful prototyping, production deployment will only be possible with Google, as the combination of a closed-source platform and a restrictive programming framework leads to a fabulous vendor lock-in effect.
What GEE solves in many use cases (those which are not global-scale, high-resolution applications) is more of a software engineering problem than an infrastructure problem.
Fortunately, doing your own software engineering for remote sensing applications is not that difficult anymore. In fact, there will be more posts on this blog about how to set up a Python workflow to get remote sensing work done in a truly open-source fashion (i.e. without relying on any closed-source proprietary platform).
In cases where there is a possibility of getting involved in commercial applications or sustained production use of remote sensing, taking the time to acquire the skills to work independently of Google Earth Engine can be worthwhile, especially as the cost of powerful desktop workstations and hard drives is going down, while the quality of open-source remote sensing software licensed under permissive conditions is going up.
Finally, don’t forget to check out this site for updates, as forthcoming posts will cover the required skills to work directly with space agencies’ data, relying on open-source software libraries.