Zero-Shot AI for Remote Sensing: A New Pipeline for Automated Image Segmentation

Published 26 February, 2025

The amount of aerial and satellite imagery captured worldwide continues to grow. Yet efficiently identifying and labeling features in these images, such as roofs, cars, or trees, remains challenging. To address this, researchers at Politecnico di Milano and the National Technical University of Athens developed a new pipeline that combines advanced AI models with smart data-handling strategies.

The study was published in the KeAi journal Artificial Intelligence in Geosciences.

“General-purpose AI models are powerful, but they often struggle when asked to locate unfamiliar objects without explicit training,” says corresponding author Professor Maria Antonia Brovelli from Politecnico di Milano. “By using a sliding window hyper inference approach to cut large images into smaller, more manageable patches, and by applying an outlier-rejection step to remove erroneous detections, we greatly reduce the computational burden of the models and improve the accuracy in identifying specific features.”

The new pipeline leverages open-source foundation models such as the Segment Anything Model (SAM) and Grounding DINO in a strategic two-step process. First, it intentionally over-detects objects to ensure even the smallest details are captured. This is achieved through a sliding window approach, which applies the detection model to smaller image patches. This method not only reduces the computational burden, which is critical for large-scale remote sensing imagery, but also enhances detection accuracy.
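The sliding window idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 512-pixel window, and the overlap fraction are all assumptions chosen for illustration. It only shows how a large image could be covered with overlapping patches, and how a box detected inside one patch could be shifted back into full-image coordinates.

```python
from typing import Iterator, List, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[int, int, int, int]


def _starts(length: int, window: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the windows cover the full length."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        # Add one extra window flush with the far edge so nothing is missed.
        starts.append(length - window)
    return starts


def sliding_windows(width: int, height: int, window: int = 512,
                    overlap: float = 0.2) -> Iterator[Box]:
    """Yield overlapping square patches covering a width x height image.

    The window size and overlap are hypothetical defaults, not values
    taken from the study.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, int(window * (1.0 - overlap)))
    for y in _starts(height, window, stride):
        for x in _starts(width, window, stride):
            yield (x, y, min(x + window, width), min(y + window, height))


def to_image_coords(box: Box, patch: Box) -> Box:
    """Shift a box detected inside a patch back into full-image coordinates."""
    dx, dy = patch[0], patch[1]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


# Example: patches for a 1000 x 600 pixel image.
patches = list(sliding_windows(1000, 600, window=512, overlap=0.25))
```

In a real pipeline each patch would be cropped from the image and passed to the detector; overlapping patches make it less likely that an object cut by a patch border is missed entirely, at the cost of some duplicate detections.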

Next, the system refines the results by filtering out irrelevant bounding boxes, such as those that are excessively large or poorly positioned, using statistical and data-driven techniques. The remaining high-quality bounding boxes are then passed to SAM, which generates precise segmentation masks.
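The article does not specify which statistical techniques are used for this filtering step. As one plausible illustration only, the sketch below rejects boxes whose area lies far from the median area, using the median absolute deviation (MAD). The function names and the threshold `k` are assumptions, and the authors' actual method may differ.

```python
from statistics import median
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def reject_area_outliers(boxes: Sequence[Box], k: float = 3.5) -> List[Box]:
    """Keep boxes whose area is within k robust standard deviations of the median.

    Illustrative stand-in for the outlier-rejection step: k and the MAD
    criterion are assumptions, not values from the study.
    """
    if len(boxes) < 3:
        return list(boxes)  # too few boxes to estimate a spread
    areas = [box_area(b) for b in boxes]
    med = median(areas)
    mad = median(abs(a - med) for a in areas)
    if mad == 0:
        return list(boxes)  # spread cannot be estimated; keep everything
    # 1.4826 scales the MAD to match a standard deviation for normal data.
    limit = k * 1.4826 * mad
    return [b for b, a in zip(boxes, areas) if abs(a - med) <= limit]


# Example: five car-sized boxes and one box covering most of the patch.
boxes = [(0, 0, 10, 10), (20, 20, 31, 31), (40, 5, 49, 15),
         (60, 60, 70, 71), (80, 10, 90, 20), (0, 0, 500, 500)]
kept = reject_area_outliers(boxes)
```

Here the oversized box is dropped while the five similar boxes are kept. A median-based criterion is used because a single huge box would distort a mean and standard deviation computed from only a handful of detections.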

The pipeline operates in a zero-shot manner, meaning the models were used in an off-the-shelf fashion, retaining their original training parameters without any additional fine-tuning or retraining on external data. In aerial images with a spatial resolution of less than one meter, the developed pipeline achieved outstanding segmentation results, reaching up to 99% accuracy.

“Essentially, we’re taking advantage of the versatility of off-the-shelf, large-scale AI models, by building a robust processing pipeline developed by Mohanad Diab to achieve the best results,” co-author Polychronis Kolokoussis adds. “We hope this pipeline will make automated remote sensing imagery analysis more accessible, speeding up everything from environmental surveys to urban planning.”

Figure: Overview of the proposed segmentation pipeline results using LangRS. Image created by M. Diab, P. Kolokoussis, and M.A. Brovelli, Politecnico di Milano and NTUA.

Contact author: Mohanad Diab, Politecnico di Milano, mohanadyousef.diab@mail.polimi.it

Conflict of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

The author is an Editorial Board Member/Editor-in-Chief/Associate Editor/Guest Editor for [ISPRS - International Journal of Geoinformation] and [Taylor & Francis - International Journal of Digital Earth] and was not involved in the editorial review or the decision to publish this article.

See the article: Mohanad Diab, Polychronis Kolokoussis, Maria Antonia Brovelli, Optimizing zero-shot text-based segmentation of remote sensing imagery using SAM and Grounding DINO, Artificial Intelligence in Geosciences, 2025, 100105, ISSN 2666-5441, https://doi.org/10.1016/j.aiig.2025.100105.
