Data Engineering & QA
Our pipelines, validations, and governance for PHMSA data.
Pipelines and sources
- Incidents: gas distribution, gas transmission & gathering, hazardous liquids & CO₂
- Annual reports: exposure denominators (miles, services), integrity coverage
- Enforcement actions: cases, penalties, lifecycle dates
- Reference: geography (FIPS), population density, climate regions, CPI
QA & validation
- Schema checks and required field presence
- Row count and range validations by program/year
- Key integrity: operator IDs/names, dates, cause categories
- Versioned snapshots with manifest and as-of dates
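
As an illustration of the schema, required-field, range, and row-count checks listed above, here is a minimal sketch in Python. The field names (`operator_id`, `incident_date`, `cause_category`, `miles_affected`), the year bounds, and the function names are hypothetical and chosen only for the example; they are not the actual pipeline schema.

```python
from datetime import date

# Hypothetical schema: field name -> (required?, expected Python type).
INCIDENT_SCHEMA = {
    "operator_id": (True, int),
    "incident_date": (True, date),
    "cause_category": (True, str),
    "miles_affected": (False, float),
}


def validate_record(record, schema=INCIDENT_SCHEMA, min_year=1970, max_year=None):
    """Return a list of problems found in one record; an empty list means it passed."""
    max_year = max_year or date.today().year
    problems = []

    # Schema check: required fields must be present, present fields must have the right type.
    for field, (required, expected_type) in schema.items():
        value = record.get(field)
        if value is None:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(value, expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {type(value).__name__}"
            )

    # Range check: the incident year must fall inside the assumed bounds.
    incident_date = record.get("incident_date")
    if isinstance(incident_date, date) and not (min_year <= incident_date.year <= max_year):
        problems.append(f"incident_date year out of range: {incident_date.year}")

    return problems


def check_row_count(rows, expected_min, expected_max):
    """Row-count check for one program/year slice (bounds are supplied by the caller)."""
    return expected_min <= len(rows) <= expected_max
```

Returning a list of problems rather than raising on the first failure lets a single QA run report every issue in a record at once.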
Refresh cadence & ownership
- Incidents/enforcement: refreshed monthly; SLA: within 48 hours of a source update
- Annual reports: yearly update upon PHMSA publication
- Data Engineering owns ingestion and QA; subject-matter experts (SMEs) review before rollout
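
The 48-hour SLA for incidents and enforcement data could be checked with something like the sketch below. It assumes both timestamps are timezone-aware and that the window is measured from the moment the source was updated; the names `REFRESH_SLA`, `sla_deadline`, and `is_within_sla` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed SLA window, taken from the "48h from source update" figure above.
REFRESH_SLA = timedelta(hours=48)


def sla_deadline(source_updated_at):
    """Latest time by which the refresh should have completed."""
    return source_updated_at + REFRESH_SLA


def is_within_sla(source_updated_at, refreshed_at):
    """True if the refresh happened after the source update and before the deadline."""
    return source_updated_at <= refreshed_at <= sla_deadline(source_updated_at)


# Usage: a refresh 47 hours after the source update meets the SLA.
source_time = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
print(is_within_sla(source_time, source_time + timedelta(hours=47)))
```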
Reproducibility & governance
- Scripted ingests with `pipenv run` commands
- Manifests with SHA-256 checksums, content types, and file sizes
- Data dictionary and export labeling (public vs derived)
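
A manifest carrying a SHA-256 checksum, content type, file size, and as-of date might be built roughly as follows. This is a sketch using only the Python standard library; the JSON-style layout and the key names (`as_of`, `files`, `sha256`, `content_type`, `size_bytes`) are assumptions for illustration, not the actual manifest format.

```python
import hashlib
import mimetypes
from datetime import date
from pathlib import Path


def manifest_entry(path):
    """Describe one snapshot file: name, SHA-256 digest, guessed content type, size."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    content_type, _ = mimetypes.guess_type(path.name)
    return {
        "file": path.name,
        "sha256": digest.hexdigest(),
        # Fall back to a generic type when the extension is not recognized.
        "content_type": content_type or "application/octet-stream",
        "size_bytes": path.stat().st_size,
    }


def build_manifest(paths, as_of):
    """Assemble a manifest for a snapshot, stamped with its as-of date."""
    return {
        "as_of": as_of.isoformat(),
        "files": [manifest_entry(p) for p in sorted(paths)],
    }
```

Recomputing the digests later and comparing them with the stored manifest is one way to confirm that a snapshot has not changed since it was published.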
All rights reserved. Copyright © by ClearPHMSA