Technical approach to coordinate extraction and object detection
This section presents the methodological framework employed by InferenceVision for transforming object detections obtained from raster imagery into precise geographic coordinates. The methodology integrates deep learning–based object detection with geospatial reference system handling and spatial transformations, enabling reliable conversion from image space to real-world geographic coordinates.
The overall workflow is designed to be modular, reproducible, and scalable, allowing it to operate on very high-resolution (VHR) satellite or aerial imagery. The pipeline consists of three primary stages: coordinate reference system normalization, object centroid computation, and geographic coordinate derivation.
InferenceVision requires all spatial data to be represented in a consistent geographic coordinate reference system. The target CRS is WGS 84 (EPSG:4326), which expresses locations using latitude and longitude and is widely adopted in geospatial analysis and web mapping applications.
Input raster datasets may originate from various projected or geographic coordinate systems. These datasets are reprojected into EPSG:4326 using affine transformations and spatial metadata extracted from the raster. Nearest-neighbor resampling is applied during reprojection to preserve discrete pixel values, which is particularly important for rasters carrying categorical data such as object detection outputs.
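As a concrete illustration, the reprojection step can be expressed with the rasterio library. This is a minimal sketch rather than InferenceVision's exact implementation; the file names are placeholders:

```python
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling

dst_crs = "EPSG:4326"  # WGS 84 target CRS

with rasterio.open("input_scene.tif") as src:
    # Compute the output transform and dimensions for the target CRS.
    transform, width, height = calculate_default_transform(
        src.crs, dst_crs, src.width, src.height, *src.bounds
    )
    profile = src.profile.copy()
    profile.update(crs=dst_crs, transform=transform, width=width, height=height)

    with rasterio.open("scene_wgs84.tif", "w", **profile) as dst:
        for band in range(1, src.count + 1):
            reproject(
                source=rasterio.band(src, band),
                destination=rasterio.band(dst, band),
                src_transform=src.transform,
                src_crs=src.crs,
                dst_transform=transform,
                dst_crs=dst_crs,
                # Nearest neighbor preserves discrete pixel values.
                resampling=Resampling.nearest,
            )
```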
After reprojection, the geographic extent of the raster is extracted as a bounding polygon. The top-left (TL) and bottom-right (BR) corner coordinates of this polygon serve as spatial reference points for subsequent coordinate calculations.
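Once the raster is in EPSG:4326, its extent can be read directly from the dataset bounds. A short sketch, again using rasterio with a placeholder file name:

```python
import rasterio

with rasterio.open("scene_wgs84.tif") as src:
    # In EPSG:4326 the bounds are in degrees: (west, south, east, north).
    left, bottom, right, top = src.bounds
    top_left = (top, left)          # (lat_TL, lon_TL)
    bottom_right = (bottom, right)  # (lat_BR, lon_BR)
```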
Object detection models produce bounding boxes defined in image space using pixel coordinates. Each bounding box is represented by its minimum and maximum extents along the horizontal and vertical axes: $(x_{\min}, y_{\min})$ and $(x_{\max}, y_{\max})$.
To obtain a single representative point for each detected object, the centroid of the bounding box is calculated. This centroid provides a stable spatial reference that minimizes sensitivity to object shape and detection variance.
The centroid of a bounding box is computed as:

$$x_c = \frac{x_{\min} + x_{\max}}{2}, \qquad y_c = \frac{y_{\min} + y_{\max}}{2}$$
Since raster dimensions may vary across datasets, centroid coordinates are normalized relative to the total image width (W) and height (H). This normalization ensures scale invariance and enables consistent mapping between image space and geographic space:

$$\hat{x} = \frac{x_c}{W}, \qquad \hat{y} = \frac{y_c}{H}$$

Where:

- $x_c$, $y_c$ are the centroid coordinates in pixels,
- $W$, $H$ are the raster width and height in pixels,
- $\hat{x}$, $\hat{y} \in [0, 1]$ are the normalized centroid coordinates.
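Expressed in code, the centroid and normalization steps reduce to a few lines of arithmetic; the function name below is illustrative:

```python
def normalized_centroid(xmin, ymin, xmax, ymax, width, height):
    """Return the bounding-box centroid normalized to [0, 1] by image size."""
    x_c = (xmin + xmax) / 2.0
    y_c = (ymin + ymax) / 2.0
    return x_c / width, y_c / height

# Example: a box centered in a 1000x800-pixel image.
x_hat, y_hat = normalized_centroid(450, 375, 550, 425, 1000, 800)
assert (x_hat, y_hat) == (0.5, 0.5)
```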
In the final stage, normalized centroid coordinates are mapped to real-world geographic coordinates using the spatial extent of the raster. This mapping establishes a linear relationship between normalized image space and the geographic coordinate system.
Using the top-left (TL) and bottom-right (BR) corner coordinates of the raster's geographic bounding polygon, latitude and longitude values for each detected object are computed as follows:

$$\text{lat} = \text{lat}_{TL} + \hat{y} \cdot (\text{lat}_{BR} - \text{lat}_{TL})$$

$$\text{lon} = \text{lon}_{TL} + \hat{x} \cdot (\text{lon}_{BR} - \text{lon}_{TL})$$

Where:

- $\text{lat}_{TL}$, $\text{lon}_{TL}$ are the latitude and longitude of the top-left corner,
- $\text{lat}_{BR}$, $\text{lon}_{BR}$ are the latitude and longitude of the bottom-right corner,
- $\hat{x}$, $\hat{y}$ are the normalized centroid coordinates.

Because latitude decreases from the top of the image to the bottom ($\text{lat}_{BR} < \text{lat}_{TL}$), the same linear interpolation handles both axes without a sign change.
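A sketch of this mapping follows; `to_geographic` is an illustrative helper (not part of the InferenceVision API), and the corner coordinates are made up for the example:

```python
def to_geographic(x_hat, y_hat, top_left, bottom_right):
    """Map normalized centroid coordinates to (lat, lon) in EPSG:4326.

    top_left and bottom_right are (lat, lon) pairs of the raster corners.
    """
    lat_tl, lon_tl = top_left
    lat_br, lon_br = bottom_right
    lat = lat_tl + y_hat * (lat_br - lat_tl)  # latitude decreases downward
    lon = lon_tl + x_hat * (lon_br - lon_tl)
    return lat, lon

# The normalized midpoint maps to the center of the geographic extent:
lat, lon = to_geographic(0.5, 0.5, top_left=(41.10, 28.90), bottom_right=(41.00, 29.10))
# lat == 41.05, lon == 29.00
```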
This approach ensures spatial consistency and allows detected objects to be accurately referenced within GIS software, spatial databases, and interactive mapping platforms.
Object detection within InferenceVision is implemented using models provided by the Ultralytics framework, including YOLO-based architectures optimized for high-speed inference and high-resolution imagery. These models are well-suited for geospatial applications due to their balance between accuracy and computational efficiency.
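A minimal detection sketch using the Ultralytics API is shown below; the weights file and image path are placeholders, and InferenceVision's own model wrapper may differ:

```python
from ultralytics import YOLO

# Load a pretrained model and run inference on the reprojected raster.
model = YOLO("yolov8n.pt")
results = model("scene_wgs84.tif")

for box in results[0].boxes.xyxy:
    # Each row is [xmin, ymin, xmax, ymax] in pixel coordinates,
    # ready for the centroid computation described above.
    xmin, ymin, xmax, ymax = box.tolist()
```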
Important: Input raster images must contain valid spatial reference metadata and an explicitly defined CRS to ensure correct geographic coordinate computation.
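One way to enforce this requirement, sketched here with rasterio, is to fail fast when the CRS is missing:

```python
import rasterio

with rasterio.open("input_scene.tif") as src:
    if src.crs is None:
        raise ValueError(
            "Raster lacks a defined CRS; geographic coordinates cannot be computed."
        )
```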
A practical, step-by-step example demonstrating the full InferenceVision pipeline, from object detection to geographic coordinate extraction, is provided in the Usage Guide.