Amihan
Introduction
AMIHAN is an air quality monitoring project we built a year ago for our second-year course, CS158-1L: Artificial Intelligence Laboratory, where we created, academically, our ‘first’ machine learning model. (My actual first model in practice was SmartGrid, based on neural nets.)
AMIHAN started as a local monitoring model based on Random Forest Regression (RFR) and grew into a global, contest-level model based on Decision Trees (DT) for Altair Engineering (https://altair.com/).
Its first iteration is documented and was recently published on Zenodo, an open-access repository backed by CERN (European Organization for Nuclear Research). Zenodo also assigns DOIs and lets you update drafts and metadata, which is neat!
The Altair Global Student Contest is an annual program hosted by Altair Engineering with a $2000 cash prize. It’s a data-science-focused contest that requires participants to use Altair AI Studio on a data-driven project. This year, the focus was on predictive modeling, AI solutions, visualizations, and optimization tools.
AMIHAN V.1

Title: AMIHAN: An Advanced Monitoring Intelligence for Harmful Air Navigation based on Mandaluyong, Philippines
DOI: https://doi.org/10.5281/zenodo.17827295
Contributors: Reyes, Christine; Ng, Louella; Esperanza, Kim.
Presentation: Canva, Public View Link
Tools: IPYNB, Python
Mandaluyong was selected as the focus area of our research because it is one of the cities with high levels of air pollution due to urbanization, vehicle emissions, industrial activities, and so on, as mentioned in the Introduction section of our paper.
AMIHAN employs a Random Forest Regressor (RFR) to forecast daily concentrations of three key pollutants in Mandaluyong City: sulfur dioxide (SO₂), carbon monoxide (CO), and particulate matter (PM2.5).
Limitations:
Geographic scope: Focused only on Mandaluyong City; results may not generalize to other urban centers.
Pollutant selection narrowed: Limited to SO₂, CO, and PM2.5; excluded other pollutants and environmental variables (e.g., temperature, precipitation).
Temporal bias: Accuracy declines at higher pollutant concentrations, particularly PM2.5, suggesting unmodeled external factors.
Data dependency: Relies on HDX dataset validity; missing or incomplete records could affect robustness.
Model constraints: While RFR captures nonlinear trends, it may underperform compared to deep learning models (e.g., LSTM) for long-term temporal dependencies.
No vehicle-specific analysis: Future work aims to address emissions from jeepneys and other mobile sources, but V.1 remains generalized.
AMIHAN V.2
Title: AMIHAN: Station-Level Air Quality Risk Profiling Based in Spain — Exploring the Impact of Urban Design and Traffic on Air Quality Risk (Prezi)
Abstract: AMIHAN V.2 relates urban geometry, traffic intensity, emission values, and other spatial factors to the risk levels of PM10 monitoring stations. It assigns each station a predicted risk class (high-risk (1) vs low-risk (0)) to support urban planning and public health improvements, and to help prioritize areas for early intervention and monitoring.
Contributors: Reyes, Christine Julliane L.
Context: Altair Global Student Contest, Category 2 (Their data, their use case).
Tools: Altair’s RapidMiner AI Studio
Process:
Turbo Prep (Inspect raw data → Convert ‘?’ values to missing values (NA) → Check the statistics)

Process Design (Retrieve cleaned raw data → Filter to Spain, PM10, urban → Generate risk labels → Select final attributes for the subset (geometric/structural data, etc.) → Store data subset → Generate results)

Auto Model (Auto-model the subset → Based on the results, save the best-performing model (Decision Tree) and its results → Generate process)

Results (Model, Predictions)
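For readers who prefer code to a visual workflow, here is a rough Python sketch of the same cleaning and labeling steps. It is not the AI Studio process itself: the column names, the sample rows, and the exact labeling rule (traffic stations counted as high-risk) are assumptions made for illustration.

```python
# Hypothetical column names and rows; the contest dataset's real schema may differ.
RAW_ROWS = [
    {"country": "Spain", "pollutant": "PM10", "area_type": "urban",
     "station_type": "traffic", "street_width": "12.5"},
    {"country": "Spain", "pollutant": "PM10", "area_type": "urban",
     "station_type": "background", "street_width": "?"},
    {"country": "Spain", "pollutant": "NO2", "area_type": "urban",
     "station_type": "traffic", "street_width": "8.0"},
    {"country": "France", "pollutant": "PM10", "area_type": "urban",
     "station_type": "traffic", "street_width": "10.0"},
]


def clean_missing(row):
    """Turbo Prep step: treat '?' as a missing value (None)."""
    return {key: (None if value == "?" else value) for key, value in row.items()}


def is_target_station(row):
    """Process Design step: keep only Spanish, urban PM10 stations."""
    return (row["country"] == "Spain"
            and row["pollutant"] == "PM10"
            and row["area_type"] == "urban")


def add_risk_label(row):
    """Proxy label (assumption): traffic stations are high-risk (1), others low-risk (0)."""
    labeled = dict(row)
    labeled["risk"] = 1 if row["station_type"] == "traffic" else 0
    return labeled


def build_subset(rows):
    """Clean, filter, then label, in the same order as the workflow."""
    cleaned = [clean_missing(row) for row in rows]
    return [add_risk_label(row) for row in cleaned if is_target_station(row)]


subset = build_subset(RAW_ROWS)
for station in subset:
    print(station["station_type"], station["street_width"], station["risk"])
```

Only the two Spanish urban PM10 rows survive the filter, and the ‘?’ street width becomes a missing value instead of a string.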



Comparative Insights: AMIHAN V.1 vs V.2
Scope and Context
V.1 (Philippines, Mandaluyong)
Focused on forecasting daily pollutant concentrations (SO₂, CO, PM2.5) in a single urban area.
The goal was to support public health and policy planning through predictive modeling.
V.2 (Spain - Urban PM10 Stations)
Shifted to station-level risk profiling across 425 urban monitoring stations in Spain.
Generated station-level predictions (170 outputs) based on metadata attributes.
The goal was to identify structural risk drivers (traffic, street geometry, emissions) and to apply predictive modeling across multiple stations, linking urban design and traffic to PM10 risk.
From city-level pollutant forecasting → to multi-station predictive modeling with structural features.
Data and Inputs
V.1 (Philippines, Mandaluyong)
Historical pollutant concentrations (2003 to 2022).
Temporal features: year, month, week, lag, rolling mean.
V.2 (Spain - Urban PM10 Stations)
Metadata attributes (425 urban PM10 stations).
Predictive features: Main Emission Source, Longitude, Latitude, Street Width, Building Distance, Traffic Volume, Distance Source.
From time-series pollutant values → to spatial + structural metadata features.
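To show what V.1's temporal features look like in practice, here is a minimal sketch that derives year, month, week, lag, and rolling mean from a daily series. The lag of one day, the three-day window, and the PM2.5 readings are all assumptions for illustration, not the values used in the published notebook.

```python
from datetime import date, timedelta


def temporal_features(dates, values, lag=1, window=3):
    """Build one feature row per day that has enough history.

    The rolling mean uses only the `window` days before the current day, so
    the value being predicted never leaks into its own features.
    """
    rows = []
    start = max(lag, window)
    for i in range(start, len(values)):
        past = values[i - window:i]
        rows.append({
            "year": dates[i].year,
            "month": dates[i].month,
            "week": dates[i].isocalendar()[1],
            "lag": values[i - lag],
            "rolling_mean": sum(past) / window,
            "target": values[i],
        })
    return rows


# Made-up PM2.5 readings for illustration only.
start_day = date(2022, 1, 1)
days = [start_day + timedelta(days=n) for n in range(6)]
pm25 = [10.0, 12.0, 14.0, 13.0, 15.0, 18.0]

features = temporal_features(days, pm25)
for row in features:
    print(row)
```

Rows like these, minus the target column, are the kind of input a regressor such as a Random Forest would be trained on.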
Modeling Approach
V.1 (Philippines, Mandaluyong)
Random Forest Regressor with temporal feature engineering.
Predicted pollutant concentrations directly, similar to general AQI readouts (e.g., MSN Weather).
Performance: R² ~0.85 (after retraining with temporal features).
V.2 (Spain - Urban PM10 Stations)
Decision Tree classifier/predictive model.
Generated station-level predictions (risk categories + metadata-driven outputs).
Performance: Accuracy ~70.2%, AUC ~0.77, balanced against interpretability.
V.1 emphasized numerical pollutant forecasting, while V.2 emphasized station-level predictive profiling with interpretable rules.
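Since V.2 is reported in terms of accuracy and AUC, here is a small sketch of how those two metrics are defined. The labels and scores below are toy values, not AMIHAN's predictions, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of stations whose predicted class matches the label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random high-risk station outscores a
    random low-risk one (ties count as half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one station of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and scores, not AMIHAN's actual predictions.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.8, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(labels, predicted))
print("AUC:", roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, while AUC looks only at how well the scores rank high-risk stations above low-risk ones, which is why the two numbers can differ noticeably.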
Feature Importance
V.1 (Philippines, Mandaluyong)
Temporal features (lag, rolling mean).
Pollutant concentrations normalized and modeled simultaneously.
V.2 (Spain - Urban PM10 Stations)
Street Width (0.237) → strongest driver.
Traffic Volume (0.079), Distance Source (0.059), Main Emission Sources (0.051).
Spatial features (Longitude, Latitude, Building Distance, etc.) also shaped predictions.
V.1’s drivers were time and pollutant dynamics, while V.2’s were urban geometry, traffic, and spatial context.
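As a rough intuition for why a single feature like Street Width can dominate a decision tree, here is a sketch of how a tree can score one candidate split using Gini impurity reduction. This is one common criterion and not necessarily the one AI Studio used, and it is not how the weights above were computed. The street widths and labels are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p_high = sum(labels) / len(labels)
    return 1.0 - p_high ** 2 - (1.0 - p_high) ** 2


def split_gain(values, labels, threshold):
    """Impurity reduction from splitting at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def best_threshold(values, labels):
    """Try midpoints between sorted unique values and keep the best one."""
    unique = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    return max(candidates, key=lambda t: split_gain(values, labels, t))


# Invented street widths (metres) and risk labels, for illustration only.
street_width = [6.0, 8.0, 9.0, 14.0, 18.0, 22.0]
risk = [1, 1, 1, 0, 0, 0]

threshold = best_threshold(street_width, risk)
print(threshold, split_gain(street_width, risk, threshold))
```

A feature whose best split separates the classes cleanly earns a large gain, so the tree places it near the root and it ends up shaping most predictions.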
Outputs and Impact
V.1 (Philippines, Mandaluyong)
Forecast pollutant levels for Mandaluyong.
Useful for local government, healthcare providers, and awareness campaigns.
V.2 (Spain - Urban PM10 Stations)
Predictive outputs for 170 stations.
Useful for urban planners and policymakers to prioritize interventions such as traffic rerouting and street redesign: which stations are structurally high-risk, why, and how to intervene.
From community-level forecasting → to station-level predictive modeling for structural risk management.
Limitations
V.1 (Philippines, Mandaluyong)
Restricted to one city and three pollutants.
Accuracy declined at higher pollutant concentrations.
V.2 (Spain - Urban PM10 Stations)
Proxy risk labels (traffic vs non-traffic).
Limited to Spain’s PM10 urban subset.
Predictions based on metadata, not direct pollutant exceedances.
Summary
AMIHAN V.1: A time-series forecasting model for pollutant concentrations in Mandaluyong.
AMIHAN V.2: A station-level predictive model using metadata features to classify and profile risk across Spain’s urban PM10 stations.
Personal Notes
One of the things I really liked about Altair’s RapidMiner AI Studio is that it has a Simulator!

Here, I can check how the attributes I included in the subset affect the results. An example is the second graph (the bottom one): adjusting ‘Main Emission Sources’ shows what affects the rating of range1 (0), the low-risk stations, which is exactly what is needed. V.2 is closer to prescriptive analytics in nature. AI Studio’s optimization feature automates this by finding the ideal inputs for your desired outcome based on the model.
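The idea behind that kind of optimization can be sketched in a few lines: try candidate inputs against a model and keep the combination with the best predicted outcome. The scoring function below is a hypothetical stand-in, not the trained Decision Tree, and the candidate values are made up. AI Studio's own optimizer may well work differently.

```python
from itertools import product


def toy_risk_score(street_width, traffic_volume):
    """Hypothetical stand-in for a trained model: higher means riskier."""
    return 0.6 * (traffic_volume / 1000.0) - 0.03 * street_width + 0.5


def find_lowest_risk(widths, volumes, model):
    """Exhaustively try every input combination and keep the lowest score."""
    return min(product(widths, volumes), key=lambda combo: model(*combo))


# Made-up candidate inputs for illustration only.
candidate_widths = [8.0, 12.0, 16.0]
candidate_volumes = [200.0, 600.0, 1000.0]

best_width, best_volume = find_lowest_risk(
    candidate_widths, candidate_volumes, toy_risk_score
)
print(best_width, best_volume)
```

An exhaustive search like this only works for a handful of inputs; with many attributes a smarter search strategy would be needed.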

All in all, AI Studio is amazing for data (pre)processing, modeling, and generating pipelines.
Future Direction
An interesting question was posed: “Could AMIHAN evolve into a global framework for urban air quality risk profiling?” The answer is yes, but not in the rigid sense of a finished product. AMIHAN’s strength lies in its adaptability: Version 1 demonstrated how time-series forecasting could reveal pollutant trends in a single city, while Version 2 showed how metadata-driven modeling could profile structural risks across hundreds of stations. Together, these iterations point toward a modular system that can be applied across different geographies, datasets, and contexts. Each city and each dataset becomes another way for AMIHAN to evolve, building towards a more comprehensive framework for understanding how urban design, traffic, and emissions shape air quality risk.
AMIHAN really began as an academic project, but developing it for Altair turned it into an actual system, which is something I really value. What can you do outside of the classroom? Because ultimately, we’ll be out of it sooner or later.
On a more personal note, I joined the Altair contest mostly for fun, and pretty much at the last minute. Thank you, Ms. Renilda Layno, for referring me to this contest! It gave me a chance to revisit and optimize an old project and make my own playful reworks that fit into a larger narrative.
My Zenodo entry also made me feel like a real researcher, and I really like that too. Getting a DOI is actually pretty exciting. I hope to publish in bigger journals soon! (https://doi.org/10.5281/zenodo.17827295).
Stay tuned to see if I win, or subscribe to my newsletter for future updates. :)

