Extracting accurate commodities export data

See how sieve addressed the data quality issues that previously caused production breakages and stoppages for a group of commodities-focused investors.

Context

The Data Engineer and Portfolio Manager duo we worked with focused on trading commodities. They wanted up-to-date export information that wasn't available on the market. The delay on vendors' feeds was too long for the team to tolerate, so they planned to collect the data themselves.

Issues

Real-time data collection was a pain. Their scripts automatically trawled the web for new export data published by various countries. Inconsistencies in formatting and content often broke the team's parser. A breakage in production resulted in a 5:30 AM PagerDuty notification to the Data Engineer, Tom, who would implement a quick fix. These 5:30 AM quick fixes always came down to Tom finding the source documents, translating and reading them himself, and manually backfilling the correct values into the database. Actual fixes to the parser took much longer and couldn't be done in real time.

sieve solution

We built an API endpoint for Tom to fetch human-validated data when his automated parsing failed. The API endpoint takes information about the relevant source and the desired data points, and can be called in the very same code paths that would otherwise escalate errors for human review.
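
As a rough illustration of that fallback pattern, here is a minimal Python sketch. It is not the actual integration: the names `ParseError`, `SourceInfo`, `extract_export_data`, `parse`, and `fetch_validated` are all hypothetical, and the real endpoint's request and response shapes are not described here. The validated-data call is passed in as a plain function so the sketch stays independent of any particular HTTP client.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


class ParseError(Exception):
    """Raised by the automated parser when a document breaks its assumptions."""


@dataclass(frozen=True)
class SourceInfo:
    # Hypothetical description of the source document; field names are assumptions.
    url: str
    country: str


Values = Mapping[str, float]


def extract_export_data(
    source: SourceInfo,
    raw_document: str,
    fields: Sequence[str],
    parse: Callable[[str, Sequence[str]], Values],
    fetch_validated: Callable[[SourceInfo, Sequence[str]], Values],
) -> tuple[Values, str]:
    """Return (values, origin), where origin is "parser" or "validated"."""
    try:
        # Normal path: the team's own automated parser.
        return parse(raw_document, fields), "parser"
    except ParseError:
        # Same code path that would otherwise page a human:
        # ask for human-validated values for this source and these fields.
        return fetch_validated(source, fields), "validated"
```

In a real pipeline, `fetch_validated` would wrap the request to the API endpoint, and the returned origin label could be stored alongside the values so that fallback-sourced rows remain easy to audit later.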

2024 Sieve Data Inc. All Rights Reserved.