This is an ML-based project that takes a user's details and predicts the user's most preferred destination. It uses data such as age, gender, the date the account was created, and where the user saw the marketing to make that prediction.
The objectives of this project:
- See how user behaviour predicts their preferred destination.
- Learn how to build an end-to-end, web-based ML project.
- Although the project is still incomplete, we intend to offer live predictions for any user who enters their data.
- We also aspire to add features so that the predictions seem more accurate to the user.
- The data is taken from the competition organised by Airbnb on Kaggle. It describes users and their activity on the Airbnb website and apps, from which we have to predict which country is their preferred destination.
- The dataset assumes that all users are from the United States of America.
- The possible preferred destinations are 'US', 'FR', 'CA', 'GB', 'ES', 'IT', 'PT', 'NL', 'DE', 'AU', 'NDF' (no destination found), and 'other'.
- Please note that 'NDF' is different from 'other': 'other' means there was a booking, but to a country not included in the list, while 'NDF' means there was no booking.
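
The distinction between 'NDF' and 'other' can be made explicit when preparing the target. The snippet below is a minimal sketch, not the project's actual preprocessing: the helper name `summarize_targets` and the `target_code` and `booked` columns are hypothetical and only illustrate one way to encode the twelve labels listed above.

```python
import pandas as pd

# Destination labels as listed above; 'NDF' and 'other' are separate classes.
DESTINATIONS = ["US", "FR", "CA", "GB", "ES", "IT", "PT", "NL",
                "DE", "AU", "NDF", "other"]


def summarize_targets(labels: pd.Series) -> pd.DataFrame:
    """Return an integer code and a 'booked' flag for each destination label.

    Hypothetical helper for illustration only.
    """
    cat = pd.Categorical(labels, categories=DESTINATIONS)
    return pd.DataFrame({
        "country_destination": labels.to_numpy(),
        # Position of the label in DESTINATIONS; -1 marks an unknown label.
        "target_code": cat.codes,
        # 'other' still counts as a booking; only 'NDF' means no booking.
        "booked": labels.ne("NDF").to_numpy(),
    })


example = summarize_targets(pd.Series(["US", "NDF", "other", "FR"]))
print(example)
```

Keeping 'NDF' as its own class, rather than merging it with 'other', preserves the information about whether a booking happened at all.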
- train_users.csv - the training set of users
- test_users.csv - the test set of users:
  - id: user id
  - date_account_created: the date of account creation
  - timestamp_first_active: timestamp of the first activity; note that it can be earlier than date_account_created or date_first_booking because a user can search before signing up
  - date_first_booking: date of first booking
  - gender
  - age
  - signup_method
  - signup_flow: the page a user came to sign up from
  - language: international language preference
  - affiliate_channel: what kind of paid marketing
  - affiliate_provider: where the marketing is, e.g. google, craigslist, other
  - first_affiliate_tracked: the first marketing the user interacted with before signing up
  - signup_app
  - first_device_type
  - first_browser
  - country_destination: the target variable to predict
- sessions.csv - web sessions log for users
  - user_id: to be joined with the column 'id' in the users table
  - action
  - action_type
  - action_detail
  - device_type
  - secs_elapsed
- countries.csv - summary statistics of destination countries in this dataset and their locations
- age_gender_bkts.csv - summary statistics of users' age group, gender, and country of destination
- sample_submission.csv - correct format for submitting your predictions
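
As a rough illustration of how these files relate, the sketch below parses the two date columns and joins per-user session aggregates onto the users table via `user_id` and `id`. It uses tiny made-up rows instead of the real CSV files. The `YYYYMMDDHHMMSS` layout of `timestamp_first_active` is an assumption, and the aggregate names `n_actions` and `total_secs` are hypothetical.

```python
import pandas as pd

# Tiny made-up rows that only mimic the column layout described above.
users = pd.DataFrame({
    "id": ["u1", "u2"],
    "date_account_created": ["2014-01-02", "2014-01-03"],
    "timestamp_first_active": ["20140101235959", "20140103080000"],
})
sessions = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "action": ["search", "view", "search"],
    "secs_elapsed": [30.0, 90.0, 15.0],
})

users["date_account_created"] = pd.to_datetime(users["date_account_created"])
# Assumption: timestamp_first_active is stored as YYYYMMDDHHMMSS.
users["timestamp_first_active"] = pd.to_datetime(
    users["timestamp_first_active"].astype(str), format="%Y%m%d%H%M%S"
)

# One row per user: number of logged actions and total seconds elapsed.
session_features = (
    sessions.groupby("user_id")
    .agg(n_actions=("action", "size"), total_secs=("secs_elapsed", "sum"))
    .reset_index()
)

# A left join keeps users that have no session rows at all.
features = users.merge(
    session_features, how="left", left_on="id", right_on="user_id"
).drop(columns="user_id")
print(features)
```

In this toy data the first user is active one day before creating an account, which matches the note on `timestamp_first_active` above.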
The data can be found at this link.
The libraries and frameworks used in this project so far are:
- Pandas - For data analysis.
- NumPy - For numerical computing in Python.
- Matplotlib - For data visualization.
- Scikit-learn - For implementing ML algorithms.
- XGBoost - For implementing gradient-boosted models.
- Jupyter - For Python notebooks.
- Flask - For building the web application.
As we add more features in the future, any other technology used in the project will be listed here as well.
- Improve the model.
- Create an interactive UI and build a real-time prediction mechanism.
- 1.0 - Only some basic files so far; more files and UI images will be added soon.
MIT
Free Software, Hell Yeah!