View on GitHub

gis-lessons-learned

blog post recalling past GIS woes and new resources discovered

GIS modeling in 2022, where to begin

January 28th, 2022
Kendall Frimodig



A brief history of the past 6 years


Around 6 years ago I began my journey in data analytics, as I enrolled in a two year masters program in Epidemiology. In retrospect this was rather poor timing; as I began my study in 2016 the field at large began to take off in a revolutionary way, and I was quite oblivious to these developments as I found my field of practice either insulated or resistant to such an overhaul.

Today I will focus primarily on providing cliff notes for someone undertaking their first geostatistical project today - but I must first defend my caveman like methods employed just 4 years ago, when I completed my first geospatial predictive model Here are a few illustrations of the developments taking place from 2016-2018.












With the pace we're seeing here, it will likely take some time for such changes to be reflected in textbooks. If you're working on a more advanced spatial regression problem, you might still want to crack open the textbook as there's many methods and associated parameters worth tuning. If you're looking to calculate a vegetative index, or classify land use characteristics, I have great news.

GIS in the cloud


In short, sophisticated data analysis including geospatial variants are simply more feasible in 2022. The bulk of my time spent on my first project involved the acquisition of orthoimagery, stitching of raster datasets, sampling and projecting to grids, and running local cluster analysis on such data. CPU time was certainly a limiting factor, with some functions running for up to 24 hours in ArcMap. As an example, if you wanted the best public imagery for a analysis of texas, this would amount to roughly 4 TB of data.


All of that pre-processing, and even modeling can now be done in the cloud! Sentinel-2 is currently the highest resolution, public satellite imagery source available (if time of image not as important, NAIP imagery is best as its taken from an airplane and 1m or lower resolution). Google earth engine is a repository for public and proprietary imagery, and an interactive application for suitable for most analysis.


Both have API's which can be called in a local python instance, and with a few days of study both can be queried to obtain model-ready raster data, without ever having to download raw imagery. For earth engine, you can actually apply simple calculations such as a NDVI or land type classification, with the javascript console or from python if you intend the results to be stored to be stored to google cloud storage.


Start your new project here, with the technical walk-throughs provided, and you will have far more time to invest in the creative and intellectually stimulating side of this work: solving the problem, visualizing the data, posing future questions for investigation.


Final Thoughts


The modern GIS workflow is highlighted by outsourcing storage and computational resources, applying neural networks for object classification, and leveraging imagery with higher spatial and temporal (detail, frequency of snapshots) resolution to tackle problems not imagined previously. Particularly, the reduced cost and software now available for DIY style drone based data collection has shown great promise - resolution of < 10 cm often.


With the marked increase in data-science libraries, enhancement of data collecting instruments, and ability to leverage cloud computing resources, one can breeze past the grueling pre-processing steps which were in the very recent past a standard.



Here are a few helpful tutorials for understanding earth observation (EO) data, sentinel hub, and google earth engine.


Introductions


Earth engine


Sentinel Hub