
Maxar’s Analysis-Ready Data (ARD) handles the complex steps required to prepare satellite imagery for analysis, so customers receive imagery that can be used immediately in machine learning (ML) model development. ARD imagery saves data scientists like me valuable time that would otherwise be spent preparing satellite imagery for model development.

This is achieved thanks to the consistency of Maxar’s unique pre-processing steps and outputs, including coregistration within a stack of imagery. Coregistration improves the quality of a model’s inferencing by removing the false positives seen in misaligned data. ARD’s consistent pixel resolution throughout a stack of images also saves developers time: the physical size of a feature can be determined from its pixel count alone. Another advantage of ARD for ML is the vibrant color balancing, which highlights features of interest that may otherwise be hard to distinguish.
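As a quick illustration of that convenience, with a fixed ground sample distance (GSD) an object’s physical size falls straight out of its pixel extent. The 0.3 m figure below is illustrative of WorldView-class resolution, not a product specification; check your order’s metadata for the actual GSD.

```python
# Illustrative only: assumes a fixed 0.3 m ground sample distance (GSD);
# the actual GSD comes from the ARD product metadata.
GSD_METERS = 0.3

def feature_length_m(pixel_extent: int) -> float:
    """Physical length of a feature, given its extent in pixels."""
    return pixel_extent * GSD_METERS

# e.g., a car spanning ~15 pixels is roughly 4.5 m long
print(feature_length_m(15))
```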

The WorldView-3 image on the left shows Algiers, Algeria, without ARD processing. The image on the right is the same image ordered through the ARD Order API.

To quantify the benefit of ARD for ML, I trained two ML models side by side, one on image strips and one on ARD imagery, to detect passenger vehicles. Vehicles can be difficult for both humans and machines to detect in certain image conditions, including low sun elevation angles, high off-nadir angles and extensive cloud cover. The study below explains how an object detection model trained on Maxar ARD imagery more accurately and consistently located vehicles in challenging image conditions, achieving an average precision (AP) score that was 12% better than a model trained on image strips.

Preparing and running the model

For this study, I selected 60 WorldView-3 images of dozens of cities around the globe. These images represent a mix of standard image conditions (low off-nadir angle, low cloud cover) and challenging image conditions (off-nadir of 25-40 degrees, 25% or more cloud cover). I ordered these images as Maxar ARD and as image strips.

Next, I used the imagery to train two car detection models—one on the 60 ARD images and one on the 60 image strips—using instance segmentation, an ML task that is used to locate specific features within an image by identifying object boundaries.
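The post doesn’t name a training framework, but for readers who want a concrete starting point, here is a minimal sketch of an instance segmentation training run using Detectron2’s Mask R-CNN, one common choice for this task. The dataset names are hypothetical placeholders, and the iteration and evaluation settings mirror the numbers reported later in this post.

```python
# A sketch, not the study's actual configuration.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("ard_cars_train",)  # hypothetical registered dataset
cfg.DATASETS.TEST = ("ard_cars_val",)     # hypothetical registered dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1       # single class: passenger vehicle
cfg.SOLVER.MAX_ITER = 6000                # 6,000 total training iterations
cfg.TEST.EVAL_PERIOD = 100                # evaluate every 100 iterations

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```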

Traditional object detection models produce bounding boxes around objects. Bounding boxes are simpler to represent than segmentations because they consist of only four points; however, they do not represent object size and orientation as well. Therefore, I fit rotated bounding boxes to the object boundaries and used these instead.

Traditional object detections are shown on the left, and the rotated bounding boxes created from instance segmentations are on the right.
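One common way to perform this fit, sketched below with OpenCV’s minimum-area rectangle (an assumption on my part, not necessarily the method used in the study), is to take the largest contour of each instance mask and wrap it in a rotated rectangle:

```python
import cv2
import numpy as np

def rotated_box_from_mask(mask: np.ndarray) -> np.ndarray:
    """Fit a rotated bounding box to a binary instance mask and
    return the 4 corners of the minimum-area rotated rectangle."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)  # one instance per mask
    rect = cv2.minAreaRect(largest)               # (center, (w, h), angle)
    return cv2.boxPoints(rect)                    # 4 x 2 array of corners
```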

Whether you are doing instance segmentation or another ML task, the ML process can be split into three phases:

  1. Training: the process of teaching the model how to recognize and detect the objects in question
  2. Validation: the process of gauging how well the model is learning during training
  3. Testing: the process of evaluating how good the final model is at detecting objects it has not seen before

Training, validation and testing each require their own unique set of images and object labels. These object labels are called ground truth, and the images and labels used in each phase are called the training, validation and test sets.
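For concreteness, here is a minimal way to partition image IDs into these three disjoint sets; the split fractions are arbitrary placeholders, not the ones used in this study:

```python
import random

def split_dataset(image_ids, seed=0, val_frac=0.15, test_frac=0.15):
    """Shuffle image IDs and partition them into disjoint
    training, validation and test sets."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for reproducibility
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    return {
        "test": ids[:n_test],
        "val": ids[n_test:n_test + n_val],
        "train": ids[n_test + n_val:],
    }
```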

I leveraged an existing dataset of rotated bounding boxes drawn on cars in the ARD imagery to create the training, validation and test sets for the ARD car detection model. I then matched those bounding boxes to the corresponding image strips to create a second set of training, validation and test sets from the same labels. Finally, I trained both the ARD model and the image strip model for a total of 6,000 iterations.
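The post doesn’t describe how the boxes were matched across products, but because both the ARD tiles and the image strips are georeferenced, one plausible approach is to map pixel coordinates through their shared map coordinates. A sketch using rasterio, purely illustrative:

```python
import rasterio
from rasterio.warp import transform as warp_transform

def transfer_pixel(src_path, dst_path, row, col):
    """Map a pixel (row, col) in one georeferenced image to the
    corresponding pixel in another via shared map coordinates."""
    with rasterio.open(src_path) as src, rasterio.open(dst_path) as dst:
        x, y = src.xy(row, col)                     # pixel -> map coords
        xs, ys = warp_transform(src.crs, dst.crs, [x], [y])
        return dst.index(xs[0], ys[0])              # map coords -> pixel
```

Applying this to each corner of a rotated box would carry a label from an ARD tile onto the matching strip.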

Selecting the best model

To evaluate my experiment, I selected the iteration of each model that performed best on the validation set. The “best” model performance is determined by the average precision (AP) metric. AP is a single score that measures a detector’s ability to locate the ground truth objects in an image while avoiding false positives; we described it in more detail in this blog post.

For this study, I used AP50, an AP score that counts a detection as correct when it has an intersection-over-union (IOU) of 50% or more with a ground truth label. IOU measures the similarity between a model detection and the corresponding ground truth label. Object detection models are evaluated by marking as correct only the detections that meet a given IOU threshold; detections whose overlap with the ground truth falls below the threshold are counted as false positives.
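Rasterizing the rotated boxes to masks makes the IOU computation straightforward and sidesteps polygon-intersection geometry; a minimal sketch:

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

# for AP50, a detection counts as correct when mask_iou(det, gt) >= 0.5
```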

The figure below compares the AP50 scores achieved by each model on the validation set throughout model training. Performance on the validation set was measured only every 100 iterations; these iterations are called model checkpoints.

The best checkpoints for each model are labeled and include 1% error bars. The dashed lines correspond to the lower bounds of these error bars, and the colored horizontal bands represent the regions where the model checkpoints are within 1% of the best model checkpoint.

The first observation from the above graph is how well each model learned. The AP50 for the ARD model was significantly higher than the image strip model throughout training, indicating that the ARD model learned from the training set better than the image strip model did.

The second point touches on how quickly each model learned. The ARD model achieved an AP50 within 1% of its best checkpoint at iteration 2,500, less than halfway through its training cycle. In comparison, the image strip model’s AP50 didn’t come within 1% of its best checkpoint until just before that checkpoint, at iteration 4,600. Because a 1% difference in AP50 will not have a significant impact on model performance, the ARD model at iteration 2,500 was effectively as good as its best checkpoint at iteration 5,100. We can conclude that training on ARD lets us train the model for fewer iterations, saving time and money.
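As a rough illustration of this selection rule, the helper below (my own sketch; the tolerance is treated as absolute AP points) picks the earliest checkpoint whose AP50 is within 1% of the best:

```python
def earliest_near_best(steps, ap50s, tol=0.01):
    """Return the first checkpoint whose AP50 comes within `tol`
    of the best AP50 observed across all checkpoints."""
    best = max(ap50s)
    for step, ap in zip(steps, ap50s):
        if ap >= best - tol:
            return step, ap

# e.g., earliest_near_best(range(100, 6001, 100), ap50_history)
```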

Evaluating the model

After selecting the final models using the AP50 curves above, I visually reviewed the outputs of each model for the test set. The ARD model correctly detected more vehicles with fewer false detections than the image strip model.

Left to Right: Ground truth, Model predictions: image strip, Model predictions: ARD

In the above example, the ARD model (right) did a much better job than the image strip model (center) at finding all of the cars in this parking lot. You’ll notice it detected the partial cars along the top edge of the image better too.

The ARD model also better detected vehicles in imagery with more challenging conditions, such as high off-nadir angles and low sun elevations, which can produce shadows. The ARD model detected vehicles in shadows, partially obscured by trees and on the edge of an image.

Left to Right: Ground truth, Model predictions: image strip, Model predictions: ARD

While both models struggled to detect all the cars on shadowy streets, the ARD model (right) found four cars in the ground truth (left) that the image strip model (center) missed, including one along the left edge. The color balancing performed by ARD makes cars that the image strip model missed easier to detect, for the model and the human eye alike.

To quantify the ARD model performance versus the image strip model, I compared the AP values calculated for the test set at a range of IOU thresholds between 25% and 75%.
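For readers who want to reproduce that kind of sweep, here is a simplified sketch. It assumes each detection has already been matched to its best-overlapping ground truth box (full COCO-style evaluation re-matches detections at every threshold, so treat this as an approximation):

```python
import numpy as np

def ap_at_threshold(scores, max_ious, n_gt, thr):
    """Average precision at one IOU threshold, using all-points
    interpolation of the precision-recall curve."""
    order = np.argsort(-np.asarray(scores))        # rank by confidence
    tp = (np.asarray(max_ious)[order] >= thr).astype(float)
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(tp.size) + 1)
    recall = cum_tp / n_gt
    # enforce a monotone precision envelope, then sum recall steps
    for i in range(precision.size - 1, 0, -1):
        precision[i - 1] = max(precision[i - 1], precision[i])
    steps = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(precision * steps))

# sweep the 25%-75% IOU range used in the study
thresholds = np.linspace(0.25, 0.75, 11)
# ap_curve = [ap_at_threshold(scores, max_ious, n_gt, t) for t in thresholds]
```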

Our analysis showed that the ARD model performed better than the image strip model at every threshold, with the gap widening as the IOU threshold increased. At an IOU of 50%, the ARD model outperformed the image strip model by more than 12%, achieving an AP50 of 0.78 compared to the image strip model’s 0.69. The widening gap at stricter thresholds indicates that the ARD model not only did a better job locating vehicles but also determined their boundaries more accurately.

Conclusion

This study demonstrated that Maxar ARD increases the accuracy of an object detection model and enables users to speed up their pixel-to-answer workflows by requiring fewer iterations to train their models. The model trained on ARD achieved an AP50 score on the test set that was more than 12% higher than the model trained on image strips. The ARD model also effectively finished training in half as many iterations. With its superior color balancing and the preprocessing steps already completed, ARD makes it faster and easier for customers to derive meaningful insights from Maxar's satellite imagery.

Maxar ARD with WorldView-3 and WorldView-4 imagery is now available. Learn more about Maxar ARD and download an ARD sample.
