Project: Cook County Housing Analysis and Prediction

This project analyzed and predicted housing prices in Cook County, Illinois, using exploratory data analysis and linear regression. It had three goals: explore key trends in housing prices, develop predictive models, and evaluate those models' fairness and accuracy.

1/10/2025

Part 1: Exploring Cook County Housing

File (Google Drive):

PDF version:

Objective:
Explored housing data from Cook County, Illinois, to understand patterns in housing prices and prepare for predictive modeling.

Key Activities and Findings:

  • Exploratory Data Analysis (EDA): Visualized data distributions, identified outliers, and examined correlations between housing features and sale prices.

  • Feature Engineering: Created and transformed features such as log-transformed sale prices, neighborhood indicators, and property details extracted from descriptions.

  • Insights: Highlighted disparities in housing prices across neighborhoods and identified critical predictors like building size and property age.
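The feature engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names ("Sale Price", "Neighborhood Code", "Description"), the toy rows, and the helper functions are all hypothetical, and the description format is assumed to mention bedrooms as "N bedrooms".

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
homes = pd.DataFrame({
    "Sale Price": [150000, 320000, 85000, 560000],
    "Neighborhood Code": [10, 20, 10, 30],
    "Description": [
        "Two-story home with 3 bedrooms and 2 baths.",
        "Ranch with 4 bedrooms and 3 baths.",
        "Cottage with 2 bedrooms and 1 bath.",
        "No room details available.",
    ],
})


def add_log_price(df, price_col="Sale Price"):
    """Add a log-transformed price column to reduce right skew."""
    out = df.copy()
    out["Log Sale Price"] = np.log(out[price_col])
    return out


def add_bedrooms(df, text_col="Description"):
    """Extract a bedroom count from free text (NaN when none is found)."""
    out = df.copy()
    counts = out[text_col].str.extract(r"(\d+)\s+bedrooms?", expand=False)
    out["Bedrooms"] = pd.to_numeric(counts)
    return out


def add_neighborhood_indicators(df, col="Neighborhood Code"):
    """Append one 0/1 indicator column per neighborhood."""
    dummies = pd.get_dummies(df[col], prefix="nbhd", dtype=int)
    return pd.concat([df, dummies], axis=1)


# Chain the steps; each returns a new DataFrame and leaves its input untouched.
features = (
    homes.pipe(add_log_price)
    .pipe(add_bedrooms)
    .pipe(add_neighborhood_indicators)
)
```

Writing each step as a small function that returns a copy keeps the pipeline easy to reorder and to reuse on a held-out set later.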

Technologies Used:
Python (pandas, numpy), Visualization (Matplotlib, Seaborn)

Skills Demonstrated:

  • Data cleaning and preprocessing

  • Statistical analysis and visualization

  • Transforming raw data into actionable features

Part 2: Predicting Housing Prices in Cook County

File (Google Drive):

PDF version:

Objective:
Developed a linear regression model to predict property sale prices and analyzed its fairness and accuracy in the context of housing equity.

Key Activities and Results:

  • Model Development: Built and evaluated regression models using engineered features such as building size, bedrooms, and neighborhood indicators.

  • Ethical Analysis: Explored the implications of property assessments on marginalized communities, addressing potential biases in the model.

  • Performance: Achieved a root mean squared error (RMSE) below the project's target thresholds, while weighing accuracy against fairness.
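As a rough sketch of the modeling and evaluation workflow described above, the snippet below fits a scikit-learn linear regression on log sale price and reports RMSE on a held-out split. The data is synthetic and generated only for illustration; the feature choice, coefficients, and split ratio are assumptions, not the project's actual setup or results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted)^2))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Synthetic stand-in data (assumption): log price depends on log building
# size and bedroom count, plus Gaussian noise.
rng = np.random.default_rng(0)
n = 500
building_sqft = rng.uniform(600, 4000, size=n)
bedrooms = rng.integers(1, 6, size=n)
log_price = (
    9.0
    + 0.8 * np.log(building_sqft)
    + 0.05 * bedrooms
    + rng.normal(0, 0.2, size=n)
)

X = np.column_stack([np.log(building_sqft), bedrooms])
X_train, X_test, y_train, y_test = train_test_split(
    X, log_price, test_size=0.2, random_state=0
)

# Ordinary least squares fit on the training split only.
model = LinearRegression().fit(X_train, y_train)
test_pred = model.predict(X_test)

# RMSE on the log scale, and on the dollar scale after undoing the log.
test_rmse_log = rmse(y_test, test_pred)
test_rmse_dollars = rmse(np.exp(y_test), np.exp(test_pred))
```

Reporting RMSE on the dollar scale as well as the log scale makes the error easier to compare against a price-based threshold.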

Technologies Used:

  • Python (pandas, numpy, scikit-learn)

  • Linear regression modeling and error analysis

Skills Demonstrated:

  • Advanced feature engineering

  • Model evaluation and error analysis

  • Incorporating social context and fairness into predictive modeling
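One way to connect error analysis to fairness is to compare prediction error across price bands, since a model that overestimates inexpensive homes and underestimates expensive ones would shift the assessment burden toward lower-value properties. The sketch below is an assumption about how such a check could look, not the project's actual method; the function name, the band count, and the toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def error_by_price_band(actual, predicted, n_bands=4):
    """Summarize prediction error within quantile bands of the actual price."""
    df = pd.DataFrame({
        "actual": np.asarray(actual, dtype=float),
        "predicted": np.asarray(predicted, dtype=float),
    })
    # Band 0 holds the lowest-priced homes, band n_bands - 1 the highest.
    df["band"] = pd.qcut(df["actual"], q=n_bands, labels=False)
    df["sq_err"] = (df["predicted"] - df["actual"]) ** 2
    df["overestimated"] = df["predicted"] > df["actual"]
    grouped = df.groupby("band")
    return pd.DataFrame({
        "rmse": np.sqrt(grouped["sq_err"].mean()),
        "share_overestimated": grouped["overestimated"].mean(),
        "n": grouped.size(),
    })


# Toy values in thousands of dollars (made up for illustration): predictions
# are pulled halfway toward 450, so cheap homes are overestimated and
# expensive homes are underestimated.
actual = np.array([100, 200, 300, 400, 500, 600, 700, 800], dtype=float)
predicted = 450 + 0.5 * (actual - 450)

summary = error_by_price_band(actual, predicted, n_bands=2)
```

In this toy case the two bands have the same RMSE but opposite error directions, which is why the share of overestimated homes is tracked alongside RMSE.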