Skip to content

Latest commit

 

History

History
179 lines (130 loc) · 19.2 KB

README.md

File metadata and controls

179 lines (130 loc) · 19.2 KB

Preppin' Data with Python 🐍, R 🏴‍☠️, SQL ❄️, Alteryx ⬆️, and Tableau Prep ✨

Solving Preppin' Data Challenges with Python, and now Alteryx and Tableau Prep.

Status GitHub Issues GitHub Pull Requests License

Twitter 💬   |   LinkedIn 👔   |   GitHub :octocat:   |   Website 🔗

🅰️ About

At the start of 2021 I wanted to improve my data prep skills with Python. Now in 2022 I'm doing the same for Alteryx and Tableau Prep.

Preppin' Data is a weekly data prep challenge built around Tableau Prep but the challenges apply to other tools and even coding languages, so ideal practice if you're looking to improve your data prep skills.

From participating, I'm much more confident in Python and use Python in projects such as working out the life expectancy of chess pieces and finding the resale value of Pokemon trading cards.

Below are my solutions and Python code snippets I regularly use in these challenges.

✅ 2023 Solutions

Challenge Python R SQL
Week 01 🐍 🏴‍☠️ ❄️
Week 02 🐍 🏴‍☠️ ❄️
Week 03 🐍 🏴‍☠️ ❄️
Week 04 🐍 🏴‍☠️ ❄️
Week 05 ❄️
Week 06 ❄️
Week 07 ❄️
Week 08 ❄️
Week 09 ❄️
Week 10 ❄️
Week 11 ❄️
Week 12 ❄️

✅ 2022 Solutions

Challenge Python Alteryx Tableau Prep
Week 01 🐍 ⬆️
Week 02 🐍 ⬆️
Week 03 🐍 ⬆️
Week 04 🐍 ⬆️

✅ 2021 Solutions

Challenge Solution Challenge Solution Challenge Solution Challenge Solution
Week 01 Python Week 14 Python Week 27 Python Week 40 Python
Week 02 Python Week 15 Python Week 28 Python Week 41 Python
Week 03 Python Week 16 Python Week 29 Python Week 42 Python
Week 04 Python Week 17 Python Week 30 Python Week 43 Python
Week 05 Python Week 18 Python Week 31 Python Week 44 Python
Week 06 Python Week 19 Python Week 32 Python Week 45 Python
Week 07 Python Week 20 Python Week 33 Python Week 46 Python
Week 08 Python Week 21 Python Week 34 Python Week 47 Python
Week 09 Python Week 22 Python Week 35 Python Week 48 Python
Week 10 Python Week 23 Python Week 36 Python Week 49 Python
Week 11 Python Week 24 Python Week 37 Python Week 50 Python
Week 12 Python Week 25 Python Week 38 Python Week 51 Python
Week 13 Python Week 26 Python Week 39 Python Week 52 Python

🐍 Python Snippets

Reading Files

Reading csv files | Example: W05 2021

import pandas as pd

df = pd.read_csv('folder\\filename.csv')

Reading Excel files | Example: W04 2021

import pandas as pd

df = pd.read_excel('folder\\filename.xlsx', engine='openpyxl', sheet_name = 'Sheet1')

Reading and aggregrating multiple Excel tabs | Example: W21 2021

import pandas as pd

# Read all Excel tabs and concat as one dateframe
all_tabs = pd.read_excel('folder\\filename.xlsx', sheet_name=None)

# Bring all the sheets together
all_dfs = []
for tab_name, df in all_tabs.items():
    df['sheet_name'] = tab_name
    all_dfs.append(df)
    combined_df = pd.concat(all_dfs, ignore_index=True)

Skipping rows and columns in an Excel tab | Example: W48 2021

import pandas as pd

df = pd.read_excel('folder\\filename.xlsx', engine='openpyxl', sheet_name='Sheet1',nrows= 3,skiprows = range(1,7), usecols = "B:D")

Writing Files

Writing csv files with utf-8 encoding | Example: W10 2021

import pandas as pd

df.to_csv('folder\\filename.csv', encoding='utf-8-sig', index=False)

Writing Excel files | Example: W14 2021

import pandas as pd

with pd.ExcelWriter('folder\\filename.xlsx') as writer:  
    df_1.to_excel(writer, sheet_name='Sheet1', index=False)
    df_2.to_excel(writer, sheet_name='Sheet2', index=False)
    df_3.to_excel(writer, sheet_name='Sheet3', index=False)

DataFrame Transformations

Unioning dataframes together with concat

import pandas as pd

df_total = pd.concat([df1,df2,df3])

Replacing null values with zero, blank, previous or preceeding values

import pandas as pd

# replace nulls with zeroes
df['Column with nulls'] = df['Column with nulls'].fillna(0)

# replace nulls with empty string (blank)
df['Column with nulls'] = df['Column with nulls'].fillna('')

# replace nulls with previous non-null value
df['Column with nulls'] = df['Column with nulls'].fillna(method='ffill')

# replace nulls with next non-null value
df['Column with nulls'] = df['Column with nulls'].fillna(method='bfill')

Aggregrating data

Create aggregrated columns grouped by other columns

import pandas as pd

df = df.groupby(['Col1','Col2']).agg(col3_min=('Col3','min'),col3_max=('Col3','max'),col3_sum=('Col3','sum')).reset_index()

Data Clean-up

Rename single column

import pandas as pd

df.rename( columns={'Col1':'Col1_New_Name'}, inplace=True )