by Austin Poor
I did this analysis as my second project for the Metis Data Science Bootcamp. For this project we chose our own topics, but we were required to gather our data by web scraping and to use a linear regression model.
Having spent most of my life in New York, I picked a topic near and dear to my heart: modeling New York City apartment rental prices using data scraped from Craigslist.
Data was collected from NYC area Craigslist (newyork.craigslist.com), using listings that were posted between 2019-12-24 and 2020-01-23.
I used two Python scripts, scrape.py and clean.py, to scrape and clean my data. They download apartment listing data to a SQLite database, data/craigslist_apts.db.
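As a rough illustration of the storage step, here is a minimal sketch of writing listings to SQLite and reading them back by posting date. The table name, column names, and helper functions are assumptions made for this example; they are not taken from scrape.py or clean.py.

```python
import sqlite3

# Hypothetical schema: the real columns used by the scripts are not shown here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS listings (
    post_id   TEXT PRIMARY KEY,
    price     INTEGER,
    bedrooms  INTEGER,
    sqft      INTEGER,
    latitude  REAL,
    longitude REAL,
    posted    TEXT
)
"""


def save_listings(conn, rows):
    """Insert listing dicts, skipping post ids that are already stored."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR IGNORE INTO listings VALUES "
        "(:post_id, :price, :bedrooms, :sqft, :latitude, :longitude, :posted)",
        rows,
    )
    conn.commit()


def load_listings(conn, start, end):
    """Return (post_id, price, bedrooms) for listings posted between two ISO dates, inclusive."""
    cur = conn.execute(
        "SELECT post_id, price, bedrooms FROM listings "
        "WHERE posted BETWEEN ? AND ? ORDER BY post_id",
        (start, end),
    )
    return cur.fetchall()
```

Using the post id as the primary key with `INSERT OR IGNORE` means the scraper can be re-run without storing the same listing twice. Storing dates as ISO strings keeps the `BETWEEN` filter a plain text comparison.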
From there, the notebook craigslist_regression.ipynb loads the data, cleans it further, and then fits the models. An additional notebook, geometry_conversion.ipynb, assigns each apartment to a neighborhood based on the latitude and longitude in its Craigslist listing.
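One common way to map a coordinate to a neighborhood is a point-in-polygon test against each neighborhood's boundary. The sketch below uses the ray-casting method in plain Python. It is an illustration only, not necessarily what geometry_conversion.ipynb does, and the function names and boundary format (a dict of name to a list of (longitude, latitude) vertices) are assumptions.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_neighborhood(lon, lat, neighborhoods):
    """Return the first neighborhood whose boundary contains the point, else None."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Toy square boundaries for illustration, not real NYC neighborhoods.
toy_boundaries = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(find_neighborhood(0.5, 0.5, toy_boundaries))
```

For real boundary files, a geometry library would handle edge cases such as holes and multi-part polygons, which this sketch ignores.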
I tested several types of linear models (linear regression, degree-2 and degree-3 polynomial regression, LASSO, and Ridge). My final model, a degree-3 polynomial regression, achieved an R^2 score of 0.768 on test data.
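For readers unfamiliar with the metric, R^2 compares the model's squared error with that of a baseline that always predicts the mean price: R^2 = 1 - SS_res / SS_tot. The function below is a minimal plain-Python sketch of that formula, and the prices in the example are made up. In the notebook the score would more likely come from a library routine.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up monthly rents and predictions, for illustration only.
actual = [2000, 3000, 4000]
predicted = [2100, 2900, 4200]
print(r2_score(actual, predicted))
```

A score of 1 means perfect predictions and 0 means no better than predicting the mean, so 0.768 means the model's squared error on the test listings was about 23% of that baseline's.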
I've included a PDF of the slide deck used for my presentation, here.