Azavea is a B Corporation whose mission is to advance the state-of-the-art in geospatial technology and apply it for civic, social, and environmental impact.
Their CEO, Robert Cheetham, recently sat down with us to share how pandas, Jupyter Notebooks, Matplotlib, and NumPy are empowering their work.
Can you give us some background on your company?
Since 2001, we have been stretching the possibilities of geospatial technology to enable our clients to answer complex questions in a wide range of domains: urban ecosystems, water, infrastructure planning, economic development, public transit, elections, public safety, energy, and cultural resources management.
We have worked on projects that use machine learning to detect heavy machinery, buildings, algal blooms, oil spills, water, and trees in satellite and aerial imagery.
What are some of the NumFOCUS projects that have enabled the work you do?
We often use Jupyter notebooks, Matplotlib, and pandas to conduct data preprocessing and to help analyze experimental results.
Azavea also uses Rasterio, a library for reading raster data stored as GeoTIFF files, which returns imagery as NumPy arrays. In addition, we use Raster Vision, a Python library for working with geospatial imagery. It is an open source framework for Python developers building computer vision models on satellite, aerial, and other large imagery sets, with built-in support for chip classification, object detection, and semantic segmentation using PyTorch.
Raster Vision uses NumPy to store and manipulate images and labels. These are eventually converted to PyTorch tensors, but working in NumPy allows us to decouple parts of the codebase from PyTorch, which is only one of many deep learning libraries.
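As a rough illustration of that workflow (the file path and preprocessing below are assumptions for the example, not Azavea's actual pipeline), Rasterio reads a GeoTIFF into a NumPy array, which can stay framework-agnostic until it is handed to PyTorch:

```python
import numpy as np
import rasterio
import torch

# Hypothetical GeoTIFF path; Rasterio returns the bands as a NumPy array.
with rasterio.open("scene.tif") as src:
    chip = src.read()  # shape: (bands, height, width), dtype taken from the file
    chip = chip.astype(np.float32)

# The array only becomes a PyTorch tensor at training time,
# keeping the earlier steps independent of the deep learning library.
tensor = torch.from_numpy(chip)
print(tensor.shape)
```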
Matplotlib is also essential for visualizing training “chips”, which are small windows of the imagery, alongside their training labels.
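A minimal sketch of that kind of visualization, using randomly generated stand-ins for a chip and its label mask (the shapes and variable names are illustrative only):

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative stand-ins for a 256x256 RGB chip and a binary label mask.
chip = np.random.rand(256, 256, 3)
labels = (np.random.rand(256, 256) > 0.5).astype(int)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.imshow(chip)
ax1.set_title("Training chip")
ax2.imshow(labels, cmap="gray")
ax2.set_title("Training labels")
plt.show()
```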
NIH Award
Another important use case I’d like to mention is a research project for which Azavea received an NIH award. In this research, deep learning was used to predict the severity of a disease from large whole-slide images of tissue samples.
pandas was integral to creating data frames that tracked all the models we trained, which we could then query and plot in various ways.
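For example (the column names and values below are made up for illustration, not taken from the project), a data frame of training runs can be filtered and plotted directly:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical experiment log: one row per trained model.
runs = pd.DataFrame({
    "model": ["resnet18", "resnet50", "resnet18", "resnet50"],
    "learning_rate": [1e-3, 1e-3, 1e-4, 1e-4],
    "val_accuracy": [0.81, 0.84, 0.79, 0.86],
})

# Query a subset of runs, then plot the results.
best = runs[runs["val_accuracy"] > 0.80]
print(best)
runs.plot.scatter(x="learning_rate", y="val_accuracy")
plt.show()
```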
Matplotlib was used to create a heatmap showing how much the model attends to different parts of the image, which allowed us to compare different model types. It was also used to visualize the output of a figure/ground segmentation process that separates tissue samples from the slide background, which served as a sanity check.
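A hedged sketch of such an overlay with Matplotlib, assuming a slide thumbnail and a 2D array of attention scores are already available (both are randomly generated placeholders here):

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative stand-ins: a grayscale slide thumbnail and per-region attention scores.
slide = np.random.rand(128, 128)
attention = np.random.rand(128, 128)

fig, ax = plt.subplots(figsize=(5, 5))
ax.imshow(slide, cmap="gray")
# Overlay the attention scores as a semi-transparent heatmap.
im = ax.imshow(attention, cmap="inferno", alpha=0.5)
fig.colorbar(im, ax=ax, label="attention")
ax.set_title("Model attention over the slide")
plt.show()
```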
To learn more about the project you can view Azavea’s proposal here: https://www.sbir.gov/sbirsearch/detail/1914999
In conclusion, we’d like to thank Azavea for their willingness to share this information with our readers, as well as for their active involvement in the global open source community!