
Using open data to derive parsimonious data-driven models for uncovering the influence of local traffic and meteorology on air quality: The case of Madrid

  • Identifying data

    Identifier:  imarina:9463523
    Authors:  Kazemi, K; Vernet, A; Fabregat, A
  • Other:

    Author according to the article: Kazemi, K; Vernet, A; Fabregat, A
    Department: Enginyeria Mecànica (Mechanical Engineering)
    URV author(s): Fabregat Tomàs, Alexandre / Vernet Peña, Antonio
    Keywords: Air pollution; Europ; Machine learning; Mortality; Pollution; Road traffic emissions; Statistical-models; Time-series; Urban air quality
    Abstract: Air pollution remains a critical public health and environmental challenge, particularly in urban areas where traffic emissions and meteorological conditions strongly influence air quality. While Machine Learning (ML) techniques have been increasingly used to model pollutant concentrations, many existing studies rely on complex architectures that often integrate multiple heterogeneous data sources. In contrast, this study presents a parsimonious, data-driven ML model that predicts local hourly concentrations of key pollutants (NO2, O3, PM2.5, and PM10) in Madrid using only open data sources. A key element of our approach is the use, as a predictor, of hourly road traffic data collected in the immediate vicinity of each pollutant monitoring station. This localized traffic information, combined with local meteorological data, allows our model to outperform other existing solutions that often depend on historical and/or proprietary data. Our results clearly demonstrate that better data might surpass the benefits of more complex ML architectures. The model achieves strong predictive accuracy, with test R² scores ranging from 0.77 to 0.86 for NO2, 0.80 to 0.85 for O3, 0.63 to 0.82 for PM2.5, and 0.68 to 0.95 for PM10. This remarkable performance underscores the utility of dense networks of vehicle count sensors providing high-resolution spatiotemporal traffic data as a critical input for accurate urban air quality modeling. Additionally, we conducted a sensitivity analysis to assess the impact of reducing vehicle emissions on local NO2 levels, offering actionable insights for policymakers. The findings highlight the potential of open-data-driven models in urban air quality management, providing a scalable, cost-effective, and interpretable tool to support evidence-based decision-making and environmental policy design.
    Subject areas: Arquitetura e urbanismo; Biodiversidade; Biotecnología; Ciência de alimentos; Ciências agrárias i; Ciências ambientais; Ciências biológicas i; Ciências biológicas ii; Ciências biológicas iii; Engenharias i; Engenharias ii; Engenharias iii; Ensino; Environmental sciences; Farmacia; General medicine; Geociências; Geografía; Health, toxicology and mutagenesis; Interdisciplinar; Matemática / probabilidade e estatística; Medicina i; Medicina ii; Medicine (miscellaneous); Nutrição; Pollution; Química; Saúde coletiva; Toxicology; Zootecnia / recursos pesqueiros
    License of use: https://creativecommons.org/licenses/by/3.0/es/
    Author's e-mail address: anton.vernet@urv.cat; alexandre.fabregat@urv.cat
    Record creation date: 2026-02-13
    Version of the deposited article: info:eu-repo/semantics/publishedVersion
    Link to original source: https://www.sciencedirect.com/science/article/pii/S0269749125010644?via%3Dihub
    Article reference according to the original source: Environmental Pollution, 383, 126691
    Item reference in APA style: Kazemi, K., Vernet, A., & Fabregat, A. (2025). Using open data to derive parsimonious data-driven models for uncovering the influence of local traffic and meteorology on air quality: The case of Madrid. Environmental Pollution, 383, 126691. DOI: 10.1016/j.envpol.2025.126691
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article DOI: 10.1016/j.envpol.2025.126691
    Institution: Universitat Rovira i Virgili
    Journal publication date: 2025-10-15
    Publication type: Journal Publications