House Prices Prediction

See this project’s code on GitHub: Regressions - Random Forests

Abstract

Background: Understanding factors influencing house prices is crucial in real estate markets. This study aims to identify key determinants affecting house selling prices.

Objectives: Investigate the significance of lot area, living space, house quality, construction year, basement exposure, heating and air conditioning types, fireplaces, garage capacity, wooden floors, and porch size on house prices.

Methods: Applied exploratory data analysis (EDA) and machine learning algorithms to a housing dataset, and developed regression models to quantify the relationships between the variables and prices.

Results: Lot area, living space, quality, construction year, basement exposure, heating, air conditioning, fireplaces, garage capacity, wooden floors, and porch size were significant factors. The regression models explained 87% to 89.6% of the variability in prices.

Conclusion: Insights aid real estate stakeholders in making informed decisions regarding property investments and transactions.

Data

The data used in this study come from the "House Prices - Advanced Regression Techniques" dataset, which is available on Kaggle at https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data (link to the data page). The dataset contains a comprehensive collection of housing features, including lot area, living space, quality, construction year, basement exposure, heating and air conditioning types, number of fireplaces, garage capacity, wooden floor area, porch size, and other relevant variables.

Methodology & Results

Result 1: Missing values were first removed from the dataset. Ordinary Least Squares (OLS) regression was then applied, and variables that were not statistically significant at the 95% confidence level (p-value < 0.050) were eliminated. A Breusch-Pagan test detected heteroskedasticity in the data (test statistic = 0.561276, p-value = 0.0000), prompting the use of Weighted Least Squares (WLS) and regression with heteroskedasticity-robust standard errors.

Result 2: WLS and regression with heteroskedasticity-robust standard errors were then estimated. Both showed significant effects for lot area, total living space, the quality and condition of the house, construction year, basement exposure, heating type, air conditioning, number of fireplaces, garage capacity (in cars), wooden floor area, and porch area. The results are summarized in Table 1 and depicted graphically in Graph 1, which show the relationships between the significant variables and house prices.

Table 1. Regression Results

Notes: *, ** and *** indicate statistical significance at the 90%, 95% and 99% confidence levels (p < 0.10, p < 0.05 and p < 0.01), respectively.

Graph 1. Regression Results

Result 3: The categorical variables were cleaned and passed to a random forest algorithm. The top qualitative features affecting house prices were exterior quality and garage location, with feature importance scores of 34% and 3.8%, respectively. The Mean Squared Error (MSE) of the random forest model was 0.0388, indicating strong predictive performance. These findings are presented in Graph 2, which highlights the influence of qualitative features on house prices.
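A random forest ranking of qualitative features could look roughly like the sketch below. Everything in it is an assumption made for illustration: the data are synthetic, the column names (exterior_quality, garage_type, roof_style) and category labels are hypothetical, the target is a made-up log-price, and one-hot encoding with summed importances is only one of several ways to handle categorical inputs.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic categorical data; names and labels are illustrative only.
rng = np.random.default_rng(2)
n = 600
df = pd.DataFrame({
    "exterior_quality": rng.choice(["Fa", "TA", "Gd", "Ex"], n),
    "garage_type": rng.choice(["Attchd", "Detchd", "None"], n),
    "roof_style": rng.choice(["Gable", "Hip"], n),  # no effect by construction
})
quality_effect = df["exterior_quality"].map(
    {"Fa": -0.3, "TA": 0.0, "Gd": 0.3, "Ex": 0.6}
)
garage_effect = df["garage_type"].map({"Attchd": 0.1, "Detchd": 0.0, "None": -0.1})
log_price = 12 + quality_effect + garage_effect + rng.normal(scale=0.1, size=n)

# One-hot encode the categories; "=" separates feature name and level.
X = pd.get_dummies(df, prefix_sep="=")
X_train, X_test, y_train, y_test = train_test_split(
    X, log_price, test_size=0.25, random_state=0
)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
mse = mean_squared_error(y_test, forest.predict(X_test))

# Sum the importances of the dummy columns back to their original feature.
importances = pd.Series(forest.feature_importances_, index=X.columns)
by_feature = (
    importances.groupby(lambda col: col.split("=")[0])
    .sum()
    .sort_values(ascending=False)
)
print(by_feature)
print("test MSE:", mse)
```

Impurity-based importances sum to 1 across features, so they read as shares of the model's total split gain and not as significance levels from a hypothesis test.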

Graph 2. Random Forest Best Features

Conclusion

This study identifies significant factors affecting house prices through thorough data analysis and regression modeling. Utilizing techniques such as Weighted Least Squares and random forest algorithms, key determinants like lot area, living space, house quality, and others are highlighted. These findings offer valuable insights for real estate stakeholders, aiding in informed decision-making for property transactions. This research contributes to a deeper understanding of house price dynamics, emphasizing the importance of both quantitative and qualitative features in determining property values.
