Predicting Flight Arrival Delays: Machine Learning Approaches

Project Proposal

Georgia Institute of Technology
CS4641 - Machine Learning

*Indicates Equal Contribution

Introduction

Air travel is vital to global transportation, but flight delays remain a persistent challenge. They disrupt schedules, raise costs, reduce passenger satisfaction, and add to fuel use and emissions. Predicting delays has become a key research area, with machine learning offering promising solutions.

Related Work

Machine learning has been widely applied to flight delay prediction. Studies show models such as logistic regression, random forests, neural networks, and gradient boosting can achieve strong accuracy across different datasets [1-4]. Reviews also highlight the importance of key input features (e.g., flight attributes, weather) and recommend neural networks and ensemble methods as the most effective approaches [5].

Dataset

For this project, we use flight performance data from the U.S. Bureau of Transportation Statistics (BTS), specifically arrivals at Hartsfield–Jackson Atlanta International Airport (ATL). The dataset covers five years of domestic flights (2019 and 2022-2025; 2020 and 2021 are excluded due to COVID-19 disruptions). Each record includes scheduled and actual times, delay duration, and cause of delay, among other fields.

Dataset link: BTS On-Time Arrival Data [6].

Problem Definition

According to the Bureau of Transportation Statistics (BTS), there were 787,630 arrival delays across the United States in 2024 [7], underscoring the scope and recurrence of the problem. Prior work has applied machine learning to delay prediction with strong results [1-5]. To standardize our analysis, we adopt the Federal Aviation Administration (FAA) definition of a delay, under which a flight is delayed if it arrives 15 minutes or more after its scheduled time. This definition gives us a consistent binary label for modeling and evaluating flight delays.
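The labeling rule can be sketched in a few lines. This is a minimal illustration, assuming delays are recorded in minutes and that a delay of exactly 15 minutes counts as delayed; the names `DELAY_THRESHOLD_MIN` and `is_delayed` are hypothetical.

```python
# Assumption: delays are in minutes, and exactly 15 minutes counts as delayed.
DELAY_THRESHOLD_MIN = 15


def is_delayed(arrival_delay_min):
    """Return 1 if the flight counts as delayed, else 0 (early arrivals are negative)."""
    return int(arrival_delay_min >= DELAY_THRESHOLD_MIN)


arrival_delays = [-4, 0, 14, 15, 42]  # illustrative values, not real data
labels = [is_delayed(d) for d in arrival_delays]
```

Keeping the threshold in one named constant makes it easy to rerun the analysis under a different cutoff.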

Methods

For our project on predicting flight delays, we will use a mix of preprocessing steps and supervised machine learning models. The goal is to clean and prepare the data so that our models can make reliable predictions.

Data Preprocessing

First, we will check which features are actually useful. Identifiers like flight number or carrier code might introduce bias, so we may drop them. Since flight patterns during COVID (2020–2021) were an exception, we plan to exclude those years from training. Missing values will be handled by either removing the rows or filling them in when possible (for example, computing delays directly from timestamps).
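As a rough sketch of the timestamp-based fill described above, a missing delay can be recomputed from the scheduled and actual arrival times. The field names `sched_arr`, `actual_arr`, and `arr_delay` are placeholders, not the actual BTS column names.

```python
from datetime import datetime


def arrival_delay_minutes(scheduled, actual):
    """Minutes between actual and scheduled arrival; negative means early."""
    return (actual - scheduled).total_seconds() / 60.0


def fill_missing_delay(record):
    """Return a copy of record with 'arr_delay' filled from timestamps when possible."""
    filled = dict(record)
    if filled.get("arr_delay") is None:
        sched, actual = filled.get("sched_arr"), filled.get("actual_arr")
        if sched is not None and actual is not None:
            filled["arr_delay"] = arrival_delay_minutes(sched, actual)
    return filled


# Illustrative record with a missing delay value (not real data).
record = {
    "sched_arr": datetime(2024, 3, 1, 14, 5),
    "actual_arr": datetime(2024, 3, 1, 14, 32),
    "arr_delay": None,
}
fixed = fill_missing_delay(record)
```

Records where a timestamp is also missing are returned unchanged, so they can still be dropped in a later step.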

To make categorical features such as airline or airport usable, we will use scikit-learn’s OneHotEncoder. Continuous variables like flight distance may be scaled with either StandardScaler or MinMaxScaler so that no single feature dominates.
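One possible way to combine these two steps is scikit-learn's ColumnTransformer, sketched below. The rows are made up for illustration, and the column layout (carrier, origin airport, distance) is an assumption rather than the final feature set.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows: [carrier, origin airport, distance in miles].
X_train = np.array(
    [
        ["DL", "LGA", 762.0],
        ["AA", "DFW", 731.0],
        ["DL", "ORD", 606.0],
        ["WN", "LGA", 762.0],
    ],
    dtype=object,
)

preprocess = ColumnTransformer(
    transformers=[
        # Categories unseen during fitting are encoded as all zeros.
        ("categorical", OneHotEncoder(handle_unknown="ignore"), [0, 1]),
        ("numeric", StandardScaler(), [2]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Fit on training data only, then reuse the fitted transformer on new data.
X_train_encoded = preprocess.fit_transform(X_train)
X_new = np.array([["F9", "LGA", 762.0]], dtype=object)
X_new_encoded = preprocess.transform(X_new)
```

Fitting the encoder and scaler on the training split alone keeps test-set statistics from leaking into the features.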

Another important step is dealing with class imbalance: only about 21.9% of flights are delayed. To avoid training a model that just predicts “on time,” we will apply SMOTE (imblearn.over_sampling.SMOTE) to oversample the minority class. We also plan to keep only features that are known before departure, so the model does not rely on information unavailable at prediction time.
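To show the idea behind SMOTE, here is a hand-rolled sketch in plain Python. It is not the imblearn implementation: it only illustrates how a synthetic minority point is interpolated between a sample and one of its nearest minority neighbors. The function name `smote_sketch` and the sample points are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Illustrative 2-D minority-class points (not real data).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=6)
```

A common precaution, assumed here rather than stated in the plan above, is to oversample only the training split so that evaluation data contains no synthetic flights.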

Machine Learning Models

We will treat this as a supervised classification problem. As baselines, we will test Logistic Regression (LogisticRegression) and Naive Bayes (GaussianNB), which are simple and interpretable.

For more flexible models, we will try Decision Trees (DecisionTreeClassifier) and Random Forests (RandomForestClassifier), and may also consider Support Vector Machines (SVC). Finally, we will train a small Neural Network (MLPClassifier) to see if it captures non-linear relationships better.
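A comparison loop over these models might look like the sketch below. It runs on synthetic data from make_classification with roughly the class balance mentioned earlier, so the scores say nothing about real flight data; the hyperparameters are arbitrary placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the flight data: about 22% positive ("delayed") samples.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.78, 0.22], random_state=0
)
# The synthetic data has no time dimension, so a stratified random split is used here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "mlp": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
}

# Fit each model and record its F1 score on the positive class.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))
```

Because every model exposes the same fit/predict interface, adding SVC or a boosted tree model later only requires one more dictionary entry.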

These models are all part of our course syllabus and have also been used in prior work on flight delay prediction. If time allows, we may also experiment with boosted tree models or hyperparameter tuning.

This setup should give us a good balance between simpler interpretable models and more powerful methods.

(Potential) Results and Discussion

Evaluation Metrics

We evaluate classification performance using standard metrics that account for overall correctness, the quality of positive (delayed) predictions, and sensitivity to class imbalance.

| Metric | Purpose | Formula / Note |
| --- | --- | --- |
| Accuracy | Overall fraction of correct predictions | \(\frac{TP + TN}{TP + TN + FP + FN}\) |
| Precision | Reliability of positive (delayed) predictions | \(\frac{TP}{TP + FP}\) |
| Recall (Sensitivity) | Ability to find actual delayed flights | \(\frac{TP}{TP + FN}\) |
| F1-Score | Harmonic mean of precision and recall (balance) | \(2\times\frac{\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\) |
| AUC–ROC | Threshold-independent measure of separability | Area under ROC curve (0.5 = random, 1.0 = perfect) |
| Balanced Error Rate (BER) | Accounts for class imbalance by averaging class-wise error | \(0.5\times\left(\frac{FN}{TP + FN} + \frac{FP}{TN + FP}\right)\) |
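The count-based formulas above can be checked with a small helper. The confusion-matrix counts below are invented for illustration (1,000 flights, 219 of them delayed), and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the count-based metrics from confusion-matrix entries.

    Assumes every denominator is nonzero (both classes present and predicted).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "ber": 0.5 * (fn / (tp + fn) + fp / (tn + fp)),
    }


# Invented counts: 219 delayed flights (150 caught, 69 missed), 781 on time.
metrics = classification_metrics(tp=150, tn=731, fp=50, fn=69)
```

With these counts, accuracy is 0.881 while recall is only about 0.68, which shows why accuracy alone can hide missed delays on imbalanced data.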

We will report per-class precision and recall alongside the confusion matrix, and include AUC–ROC for threshold-independent evaluation. Because accuracy can look strong on imbalanced data even when most delays are missed, we will prioritize F1 and recall when comparing models. All metrics will be computed on chronologically held-out test sets to reflect real-world forecasting.
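A chronological hold-out can be sketched as follows: sort flights by date and reserve the most recent portion for testing, so the model is never evaluated on flights older than its training data. The `date` field name, the 20% test fraction, and the function name `chronological_split` are assumptions made for illustration.

```python
from datetime import date


def chronological_split(records, test_fraction=0.2, key="date"):
    """Sort records by date and hold out the most recent fraction as the test set."""
    ordered = sorted(records, key=lambda r: r[key])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


# Illustrative records in shuffled order (not real data).
flights = [{"date": date(2024, m, 1), "id": m} for m in (7, 2, 10, 5, 1, 9, 3, 8, 6, 4)]
train, test = chronological_split(flights)
```

Unlike a random split, this keeps every training flight earlier than every test flight, which mirrors how a deployed model would be used.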

Sustainability Goal & Ethical Considerations

Predicting delays can improve operational efficiency (e.g., gate assignments, taxi times, crew scheduling) and reduce fuel waste and emissions. We will also follow ethical AI practices: anonymize data, comply with privacy rules, use interpretable explanations (e.g., SHAP), report uncertainty, and audit models for bias to ensure our solutions are both environmentally and socially responsible.

Expected Results

Given our methods and prior work [1-5], we expect ensemble models (e.g., gradient boosting, random forests) to perform best, with simpler models serving as interpretable baselines. We also expect recall and F1 on the delayed class to improve after class balancing.

References

[1] I. Hatipoğlu and Ö. Tosun, “Predictive modeling of flight delays at an airport using machine learning methods,” Applied Sciences, vol. 14, no. 13, p. 5472, 2024.

[2] N. Kuhn and N. Jamadagni, Application of machine learning algorithms to predict flight arrival delays. Course Project Report CS229, Stanford University, 2024. [Online]. Available: https://cs229.stanford.edu/proj2017/final-reports/5243248.pdf

[3] P. Meel, M. Singhal, M. Tanwar and N. Saini, "Predicting flight delays with error calculation using machine learned classifiers," 7th International Conference on Signal Processing and Integrated Networks (SPIN), 2020, pp. 71-76.

[4] J. Li et al., "Prediction of flight arrival delay time using U.S. Bureau of Transportation Statistics," 2023 IEEE Symposium Series on Computational Intelligence (SSCI), Mexico City, Mexico, 2023, pp. 603-608.

[5] T. K. Huynh, T. Cheung and C. Chua, "A systematic review of flight delay forecasting models," 2024 7th International Conference on Green Technology and Sustainable Development (GTSD), 2024, pp. 533-540.

[6] Bureau of Transportation Statistics, U.S. Department of Transportation, 2025. “On-Time Performance Data (Arrivals),” distributed by Bureau of Transportation Statistics. Available: https://www.transtats.bts.gov/ONTIME/Arrivals.aspx

[7] Bureau of Transportation Statistics, U.S. Department of Transportation, 2024. “Number of Delayed Flights in 2024,” distributed by Bureau of Transportation Statistics. Available: https://www.transtats.bts.gov/Marketing_Annual.aspx?heY_fryrp6lrn4=FDFI&heY_fryrp6Z106u=J&heY_gvzr=E&heY_fryrp6v10=E

Proposal Presentation Video