Predicting Flight Arrival Delays: Machine Learning Approaches
Project Proposal
Introduction
Air travel is vital to global transportation, but flight delays remain a persistent challenge. They disrupt schedules, raise costs, reduce passenger satisfaction, and add to fuel use and emissions. Predicting delays has become a key research area, with machine learning offering promising solutions.
Related Work
Machine learning has been widely applied to flight delay prediction. Studies show models such as logistic regression, random forests, neural networks, and gradient boosting can achieve strong accuracy across different datasets [1-4]. Reviews also highlight the importance of key input features (e.g., flight attributes, weather) and recommend neural networks and ensemble methods as the most effective approaches [5].
Dataset
For this project, we use flight performance data from the U.S. Bureau of Transportation Statistics (BTS),
specifically arrivals at Hartsfield–Jackson Atlanta International Airport (ATL).
The dataset covers five years of domestic flights: 2019 and 2022-2025, with 2020 and 2021 excluded because of COVID-19 disruptions.
It contains detailed per-flight records, including scheduled and actual times, delay, and cause of delay.
Dataset link: BTS On-Time Arrival Data [6].
Problem Definition
According to the Bureau of Transportation Statistics (BTS), in 2024 there were 787,630 arrival delays across the United States [7], underscoring the scope and recurrence of this issue. Machine learning has been applied to this problem for several years, and prior studies report strong predictive accuracy [1-5]. To standardize our analysis, we adopt the Federal Aviation Administration (FAA) definition of a delay, which considers a flight delayed if it arrives 15 minutes or more later than scheduled. This definition provides a consistent framework for modeling and evaluating flight delays.
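As a minimal sketch of how this definition turns into a binary target, the snippet below labels a few made-up arrival delays. The function name and constant are hypothetical, and the sketch assumes that a flight arriving exactly 15 minutes late counts as delayed, which matches how BTS on-time reporting treats the boundary.

```python
# Threshold (in minutes) from the delay definition above.
# Assumption: exactly 15 minutes late counts as delayed.
DELAY_THRESHOLD_MIN = 15


def is_delayed(arrival_delay_minutes: float) -> bool:
    """Return True when the flight arrived at least 15 minutes late."""
    return arrival_delay_minutes >= DELAY_THRESHOLD_MIN


# Made-up delays in minutes; negative values mean an early arrival.
sample_delays = [-5.0, 14.0, 15.0, 42.0]
labels = [is_delayed(d) for d in sample_delays]
```

Keeping the threshold in one named constant makes it easy to rerun the analysis with a different cutoff if the definition needs to change.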
Methods
For our project on predicting flight delays, we will use a mix of preprocessing steps and supervised machine learning models. The goal is to clean and prepare the data so that our models can make reliable predictions.
Data Preprocessing
First, we will check which features are actually useful. Identifiers like flight number or carrier code might introduce bias, so we may drop them. Since flight patterns during COVID (2020–2021) were an exception, we plan to exclude those years from training. Missing values will be handled by either removing the rows or filling them in when possible (for example, computing delays directly from timestamps).
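The filtering and missing-value steps could look roughly like the pandas sketch below. The column names and the toy rows are assumptions made for illustration; the actual BTS export may label its fields differently.

```python
import pandas as pd

# Toy data with hypothetical column names, not the real BTS schema.
flights = pd.DataFrame({
    "year": [2019, 2020, 2021, 2022, 2023],
    "scheduled_arrival": pd.to_datetime([
        "2019-03-01 10:00", "2020-03-01 10:00", "2021-03-01 10:00",
        "2022-03-01 10:00", "2023-03-01 10:00",
    ]),
    "actual_arrival": pd.to_datetime([
        "2019-03-01 10:20", "2020-03-01 10:05", "2021-03-01 10:00",
        "2022-03-01 09:55", None,
    ]),
    "arrival_delay_min": [None, 5.0, 0.0, None, None],
})

# Exclude the COVID-era years.
covid_years = [2020, 2021]
flights = flights[~flights["year"].isin(covid_years)].copy()

# Fill missing delays from the timestamps where both are available.
computed_delay = (
    flights["actual_arrival"] - flights["scheduled_arrival"]
).dt.total_seconds() / 60
flights["arrival_delay_min"] = flights["arrival_delay_min"].fillna(computed_delay)

# Drop rows whose delay still cannot be recovered.
flights = flights.dropna(subset=["arrival_delay_min"])
```

In this toy example the 2023 row is dropped because its actual arrival time is missing, so no delay can be computed for it.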
To make categorical features such as airline or airport usable, we will use scikit-learn’s OneHotEncoder. Continuous variables like flight distance may be scaled with either StandardScaler or MinMaxScaler so that no single feature dominates.
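One way to combine both transformations is scikit-learn's ColumnTransformer, sketched below on a toy frame. The feature names and values are made up, and StandardScaler is used only as an example; MinMaxScaler would slot into the same place.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy features with hypothetical column names and made-up values.
X = pd.DataFrame({
    "airline": ["DL", "WN", "DL", "AA"],
    "origin": ["JFK", "MCO", "ORD", "JFK"],
    "distance_miles": [760.0, 404.0, 606.0, 760.0],
})

preprocess = ColumnTransformer([
    # One binary column per category; unseen categories encode as all zeros.
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["airline", "origin"]),
    # Zero mean and unit variance for the continuous feature.
    ("numeric", StandardScaler(), ["distance_miles"]),
])

# 3 airline columns + 3 origin columns + 1 scaled distance column.
X_encoded = preprocess.fit_transform(X)
```

Fitting the transformer on the training split only, and then calling transform on the test split, keeps test-set statistics from leaking into the scaling.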
Another important step is dealing with class imbalance: only about 21.9% of flights are delayed. To avoid training a model that just predicts “on time,” we will apply SMOTE (imblearn.over_sampling.SMOTE) to oversample the minority class in the training data only, so the test set keeps its natural class balance. We also plan to keep only features that are known before departure, so the model stays realistic.
Machine Learning Models
We will treat this as a supervised classification problem. As baselines, we will test Logistic Regression (LogisticRegression) and Naive Bayes (GaussianNB), which are simple and interpretable.
For more flexible models, we will try Decision Trees and Random Forests (RandomForestClassifier) and might also consider using Support Vector Machines (SVC). Finally, we will train a small Neural Network (MLPClassifier) to see if it captures non-linear relationships better.
These models are all part of our course syllabus and have also been used in prior work on flight delay prediction. If time allows, we may also experiment with boosted tree models or hyperparameter tuning.
This setup should give us a good balance between simpler interpretable models and more powerful methods.
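The comparison could be organized as a simple loop over candidate models, sketched below on synthetic data. The hyperparameters, the scaling pipelines, and the 450/150 split are illustrative assumptions, not tuned choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the flight data: about 22% positives ("delayed").
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.78, 0.22], random_state=0
)
X_train, X_test = X[:450], X[450:]
y_train, y_test = y[:450], y[450:]

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    ),
}

# Fit each model and record F1 for the positive (delayed) class.
f1_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    f1_by_model[name] = f1_score(y_test, model.predict(X_test))
```

Holding the data split and the metric fixed across all models keeps the comparison fair; only the estimator changes between iterations.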
(Potential) Results and Discussion
Evaluation Metrics
We evaluate classification performance using standard metrics that account for overall correctness, the quality of positive (delayed) predictions, and sensitivity to class imbalance.
| Metric | Purpose | Formula / Note |
|---|---|---|
| Accuracy | Overall fraction of correct predictions | \(\frac{TP + TN}{TP + TN + FP + FN}\) |
| Precision | Reliability of positive (delayed) predictions | \(\frac{TP}{TP + FP}\) |
| Recall (Sensitivity) | Ability to find actual delayed flights | \(\frac{TP}{TP + FN}\) |
| F1-Score | Harmonic mean of precision and recall (balance) | \(2\times\frac{\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\) |
| AUC–ROC | Threshold-independent measure of separability | Area under ROC curve (0.5 = random, 1.0 = perfect) |
| Balanced Error Rate (BER) | Accounts for class imbalance by averaging class-wise error | \(0.5\times\left(\frac{FN}{TP + FN} + \frac{FP}{TN + FP}\right)\) |
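To make the formulas concrete, the sketch below computes the count-based metrics from a made-up confusion matrix. The counts are illustrative only, and the function assumes every denominator is nonzero.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the count-based metrics from the table above.

    Assumes every denominator is nonzero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Balanced error rate: mean of the two class-wise error rates.
    ber = 0.5 * (fn / (tp + fn) + fp / (tn + fp))
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "ber": ber,
    }


# Made-up counts: 1,000 flights, 220 of them actually delayed.
metrics = classification_metrics(tp=150, tn=700, fp=80, fn=70)
```

With these counts accuracy is 0.85 while recall is only about 0.68, which shows how accuracy alone can hide missed delays on imbalanced data.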
We will report per-class precision and recall along with the confusion matrix, and include AUC–ROC for threshold-independent evaluation. We will prioritize F1 and recall over accuracy, because accuracy alone can look strong on imbalanced data. All metrics will be computed on chronologically held-out test sets to reflect real-world forecasting.
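A chronological hold-out can be sketched as a date cutoff: everything before the cutoff trains the model, everything on or after it is used for testing. The record layout, function name, and cutoff date below are hypothetical.

```python
from datetime import date


def chronological_split(records, cutoff):
    """Split records into (train, test) around a cutoff date.

    Train holds flights strictly before the cutoff; test holds the rest.
    """
    train = [r for r in records if r["flight_date"] < cutoff]
    test = [r for r in records if r["flight_date"] >= cutoff]
    return train, test


# Made-up records with a hypothetical layout.
records = [
    {"flight_date": date(2023, 6, 1), "delayed": False},
    {"flight_date": date(2024, 1, 15), "delayed": True},
    {"flight_date": date(2024, 11, 3), "delayed": False},
    {"flight_date": date(2025, 2, 20), "delayed": True},
]

train, test = chronological_split(records, cutoff=date(2025, 1, 1))
```

Unlike a random split, this guarantees that no test flight precedes a training flight, so the model is never evaluated on dates earlier than those it learned from.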
Sustainability Goal & Ethical Considerations
Predicting delays can improve operational efficiency (e.g., gate assignments, taxi times, crew scheduling) and reduce fuel waste and emissions. We will also follow ethical AI practices: anonymize data, comply with privacy rules, use interpretable explanations (e.g., SHAP), report uncertainty, and audit models for bias to ensure our solutions are both environmentally and socially responsible.
Expected Results
Given our Methods and prior work [1-5], we expect ensembles (e.g., gradient boosting, random forests) to perform best, with simpler models serving as interpretable baselines. We also expect recall and F1 for the delayed class to improve after class balancing.
References
[1] I. Hatipoğlu and Ö. Tosun, “Predictive modeling of flight delays at an airport using machine learning methods,” Applied Sciences, vol. 14, no. 13, p. 5472, 2024.
[2] N. Kuhn and N. Jamadagni, Application of machine learning algorithms to predict flight arrival delays. Course Project Report CS229, Stanford University, 2024. [Online]. Available: https://cs229.stanford.edu/proj2017/final-reports/5243248.pdf
[3] P. Meel, M. Singhal, M. Tanwar and N. Saini, "Predicting flight delays with error calculation using machine learned classifiers," 7th International Conference on Signal Processing and Integrated Networks (SPIN), 2020, pp. 71-76.
[4] J. Li et al., "Prediction of flight arrival delay time using U.S. Bureau of Transportation Statistics," 2023 IEEE Symposium Series on Computational Intelligence (SSCI), Mexico City, Mexico, 2023, pp. 603-608.
[5] T. K. Huynh, T. Cheung and C. Chua, "A systematic review of flight delay forecasting models," 2024 7th International Conference on Green Technology and Sustainable Development (GTSD), 2024, pp. 533-540.
[6] Bureau of Transportation Statistics, U.S. Department of Transportation, 2025. “On-Time Performance Data (Arrivals),” distributed by Bureau of Transportation Statistics. Available: https://www.transtats.bts.gov/ONTIME/Arrivals.aspx
[7] Bureau of Transportation Statistics, U.S. Department of Transportation, 2024. “Number of Delayed Flights in 2024,” distributed by Bureau of Transportation Statistics. Available: https://www.transtats.bts.gov/Marketing_Annual.aspx?heY_fryrp6lrn4=FDFI&heY_fryrp6Z106u=J&heY_gvzr=E&heY_fryrp6v10=E