Airline Passenger Satisfaction Determinants

See this project’s code on GitHub: Basic Method, Bootstrapping, Undersampling

Abstract

Background: Understanding the factors influencing airline passenger satisfaction is essential for the aviation industry to enhance customer experience and loyalty. This research aims to identify key determinants of passenger satisfaction using exploratory data analysis (EDA) and machine learning algorithms.

Objectives: Investigate the impact of various factors, including online boarding, in-flight services, flight distance, legroom service, age, ease of online booking, seat comfort, departure and arrival time convenience, baggage handling, gate location, and cleanliness, on airline passenger satisfaction.

Methods: Employed EDA techniques and three classification models: random forest, probit, and naive Bayes. Applied undersampling and bootstrapping to address class imbalance. Evaluated model performance using ROC curves with AUC scores and confusion matrices.

Results: Identified online boarding, in-flight services, flight distance, legroom service, age, ease of online booking, seat comfort, departure and arrival time convenience, baggage handling, gate location, and cleanliness as significant determinants of passenger satisfaction. The models achieved accuracy ranging from 88% to 99%.

Conclusion: This research provides valuable insights into factors driving airline passenger satisfaction, offering actionable information for airlines to improve customer experience and loyalty. The robustness of the models and high accuracy rates underscore the effectiveness of machine learning approaches in analyzing passenger satisfaction data.

Data

The data used in this research come from the "Airline Passenger Satisfaction" dataset, which is available on Kaggle at https://www.kaggle.com/datasets/mysarahmadbhat/airline-passenger-satisfaction. The dataset covers a broad range of features describing passengers' travel experience. These include online boarding, in-flight services, flight distance, legroom service, age, ease of online booking, seat comfort, departure and arrival time convenience, baggage handling, gate location, and cleanliness. This breadth makes it well suited to analyzing passenger satisfaction in the aviation industry.

Methodology & Results

Result 1: Observations with missing values were first removed to ensure data integrity. A correlation analysis was then used to identify and eliminate highly correlated features, reducing redundancy and potential multicollinearity. The strongest correlation was between departure delay and arrival delay. A regression of arrival delay on departure delay indicated that arrival delay is largely explained by departure delay. Therefore, only arrival delay was kept in the dataset.
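
The cleaning and screening steps above can be illustrated with a minimal plain-Python sketch. This is not the project's code, and several details are assumptions:

- The column names (departure_delay, arrival_delay, flight_distance) and the synthetic delays are hypothetical stand-ins for the Kaggle columns.
- Missing values are assumed to be stored as None.
- The "regression analysis" is assumed to be a simple least-squares fit of arrival delay on departure delay.

```python
import math
import random

def drop_missing(rows):
    """Listwise deletion: keep only rows with no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row.values())]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def most_correlated_pair(rows, columns):
    """Scan every column pair and return (|r|, col_a, col_b) for the strongest one."""
    best = (0.0, None, None)
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            r = abs(pearson([row[a] for row in rows], [row[b] for row in rows]))
            if r > best[0]:
                best = (r, a, b)
    return best

def simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical synthetic data: arrival delay closely tracks departure delay.
rng = random.Random(0)
rows = []
for _ in range(500):
    dep = rng.expovariate(1 / 15)  # departure delay in minutes, mean 15
    rows.append({
        "departure_delay": dep,
        "arrival_delay": dep + rng.gauss(0, 5),
        "flight_distance": rng.uniform(100, 4000),
    })
rows.append({"departure_delay": 10.0, "arrival_delay": None, "flight_distance": 500.0})

clean = drop_missing(rows)
r, col_a, col_b = most_correlated_pair(clean, ["departure_delay", "arrival_delay", "flight_distance"])
intercept, slope, r2 = simple_ols(
    [row["departure_delay"] for row in clean], [row["arrival_delay"] for row in clean]
)
print(f"strongest pair: {col_a} / {col_b}, |r| = {r:.2f}, R^2 = {r2:.2f}")

# Keep only arrival delay, mirroring the choice described in the text.
for row in clean:
    del row["departure_delay"]
```

On the real dataset, a library routine such as pandas' DataFrame.corr would normally replace the hand-written correlation scan.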

See full results in Appendix 1.

Result 2: A random forest classifier was used to analyze passenger satisfaction in three variations: on the entire dataset, with bootstrapping, and with undersampling. All three consistently identified the same key features influencing passenger satisfaction: online boarding, in-flight services, flight distance, legroom service, age, ease of online booking, seat comfort, departure and arrival time convenience, baggage handling, gate location, and cleanliness.
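
The text does not spell out how bootstrapping and undersampling were applied, so the following is a minimal sketch under one plausible reading:

- Undersampling draws each class without replacement down to the minority-class size.
- The bootstrap variant draws each class with replacement up to the majority-class size.

The helper names and the satisfied label key are hypothetical. Either balanced sample would then be passed to the random forest, which is not reproduced here. One option would be scikit-learn's RandomForestClassifier, whose fitted feature_importances_ attribute gives a feature ranking.

```python
import random
from collections import defaultdict

def group_by_class(rows, label_key):
    """Split rows into per-class lists keyed by the label value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    return groups

def undersample(rows, label_key, seed=0):
    """Sample every class without replacement down to the minority-class size."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_key)
    n_min = min(len(g) for g in groups.values())
    balanced = [row for g in groups.values() for row in rng.sample(g, n_min)]
    rng.shuffle(balanced)
    return balanced

def bootstrap_balance(rows, label_key, seed=0):
    """Draw every class with replacement up to the majority-class size."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_key)
    n_max = max(len(g) for g in groups.values())
    balanced = [row for g in groups.values() for row in rng.choices(g, k=n_max)]
    rng.shuffle(balanced)
    return balanced

# Hypothetical imbalanced data: 30 satisfied vs 70 neutral-or-dissatisfied passengers.
data = [{"id": i, "satisfied": 1 if i < 30 else 0} for i in range(100)]
under = undersample(data, "satisfied")
boot = bootstrap_balance(data, "satisfied")
print(len(under), sum(r["satisfied"] for r in under))  # 60 30
print(len(boot), sum(r["satisfied"] for r in boot))    # 140 70
```

Undersampling discards information from the majority class but never duplicates rows. The bootstrap variant keeps all the information but repeats minority-class rows, which can encourage overfitting to them.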

Result 3: A probit model was estimated to further explore the determinants of passenger satisfaction, with robustness checks using bootstrapping and undersampling (see Graphs 2–7). The results were consistent across all three approaches, reaffirming the significance of the identified features in predicting passenger satisfaction.
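
A probit model assumes P(satisfied = 1 | x) = Phi(b0 + b1 * x), where Phi is the standard normal CDF. The sketch below fits a single-predictor probit by Fisher scoring in plain Python on synthetic data.

- The predictor stands in for a hypothetical feature, such as an online-boarding rating.
- This illustrates the technique only, not the project's specification.
- With many features, one would normally use a library estimator such as statsmodels' Probit.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def fit_probit(xs, ys, max_iter=50, tol=1e-10):
    """Fit P(y = 1 | x) = Phi(b0 + b1 * x) by Fisher scoring; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score vector (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0    # expected (Fisher) information matrix
        for x, y in zip(xs, ys):
            z = b0 + b1 * x
            p = min(max(norm_cdf(z), 1e-12), 1 - 1e-12)  # clamp away from 0 and 1
            d = norm_pdf(z)
            v = p * (1 - p)
            g0 += d * (y - p) / v
            g1 += d * (y - p) / v * x
            w = d * d / v
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system (information matrix) * step = score explicitly.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical synthetic data generated from a known probit model (b0 = 0.5, b1 = 1.0).
rng = random.Random(1)
xs = [rng.uniform(-2, 2) for _ in range(2000)]
ys = [1 if 0.5 + 1.0 * x + rng.gauss(0, 1) > 0 else 0 for x in xs]
b0, b1 = fit_probit(xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"P(satisfied | x = 1) = {norm_cdf(b0 + b1 * 1.0):.2f}")
```

Because the data are generated from a latent normal error, the recovered coefficients should land close to the true values of 0.5 and 1.0.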

See full results in Appendix 2.

Result 4: Finally, a naive Bayes model was used to assess the robustness of the findings, again validated with bootstrapping and undersampling (see Graphs 8–13). The results remained consistent across approaches, further supporting the importance of the identified features in determining passenger satisfaction.
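
The text does not say which naive Bayes variant was used. This minimal sketch assumes a Gaussian naive Bayes, in which each feature is treated as normally distributed within each class and independent of the others given the class. Under that assumption the model reduces to class priors plus per-class feature means and variances. The feature names in the toy data (an online-boarding and a seat-comfort rating) are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_eps=1e-9):
    """Estimate class priors and per-class, per-feature means and variances."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        k = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / k for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / k + var_eps
                     for col, m in zip(columns, means)]
        model[label] = (k / len(y), means, variances)
    return model

def log_scores(model, row):
    """Unnormalized log posterior: log prior plus per-feature Gaussian log densities."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        s = math.log(prior)
        for v, m, var in zip(row, means, variances):
            s += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = s
    return scores

def predict_proba(model, row, positive=1):
    """Posterior probability of the positive class (shifted exponentials for stability)."""
    scores = log_scores(model, row)
    top = max(scores.values())
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    return weights[positive] / sum(weights.values())

def predict(model, row):
    """Most probable class label."""
    scores = log_scores(model, row)
    return max(scores, key=scores.get)

# Hypothetical ratings: [online boarding, seat comfort] on a 0-5 scale; 1 = satisfied.
X = [[5, 4], [4, 5], [5, 5], [4, 4], [1, 2], [2, 1], [1, 1], [2, 2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
model = fit_gaussian_nb(X, y)
print(predict(model, [5, 5]), predict(model, [1, 1]))  # 1 0
print(round(predict_proba(model, [4.5, 4.5]), 3))
```

The satisfaction ratings are ordinal, so a categorical naive Bayes would be an equally reasonable choice. The Gaussian version is used here only because it keeps the sketch short.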

See full results in Appendix 3.
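
The models are evaluated with ROC curves with AUC, confusion matrices, and overall accuracy. A minimal standard-library sketch of these metrics for binary labels coded 0/1 follows. The toy labels and scores are made up, and the 0.5 decision threshold is an assumption. Libraries such as scikit-learn provide equivalent confusion_matrix, roc_curve, and roc_auc_score functions.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded 0/1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def accuracy(y_true, y_pred):
    """Share of correct predictions, read off the confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)

def roc_curve(y_true, scores):
    """(FPR, TPR) points from sweeping the threshold over each distinct score, high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_matrix(y_true, y_pred))  # (2, 0, 1, 1)
print(accuracy(y_true, y_pred))          # 0.75
print(auc(roc_curve(y_true, scores)))    # 0.75
```

The threshold sweep is quadratic in the number of observations, which is acceptable for illustration. A sort-based version would be preferable on the full dataset.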

Conclusion

In conclusion, this research sheds light on the multifaceted determinants of airline passenger satisfaction through exploratory data analysis and machine learning. After the data were cleaned, random forest, probit, and naive Bayes models were used to identify and validate the key features affecting passenger satisfaction. The consistency of the findings across the full dataset, bootstrapping, and undersampling supports the reliability of the results. These insights can help airline operators prioritize areas for improvement to enhance customer experience and foster loyalty. Continued research remains important for keeping pace with evolving passenger preferences in a competitive aviation industry.

Appendix 1

Graph 1. Correlation Matrix

Appendix 2

Graph 2. Simple Probit ROC/AUC

Graph 3. Simple Probit Confusion Matrix

Graph 4. Bootstrap Method Probit ROC/AUC

Graph 5. Bootstrap Method Probit Confusion Matrix

Graph 6. Undersampling Method Probit ROC/AUC

Graph 7. Undersampling Method Probit Confusion Matrix

Appendix 3

Graph 8. Simple Naive Bayes ROC/AUC

Graph 9. Simple Naive Bayes Confusion Matrix

Graph 10. Bootstrap Method Naive Bayes ROC/AUC

Graph 11. Bootstrap Method Naive Bayes Confusion Matrix

Graph 12. Undersampling Method Naive Bayes ROC/AUC

Graph 13. Undersampling Method Naive Bayes Confusion Matrix
