Solving Preppin' Data Challenges with Python, and now Alteryx and Tableau Prep.
Twitter 💬 | LinkedIn 👔 | GitHub | Website 🔗
At the start of 2021 I wanted to improve my data prep skills with Python. Now in 2022 I'm doing the same for Alteryx and Tableau Prep.
Preppin' Data is a weekly data prep challenge built around Tableau Prep but the challenges apply to other tools and even coding languages, so ideal practice if you're looking to improve your data prep skills.
From participating, I'm much more confident in Python and use Python in projects such as working out the life expectancy of chess pieces and finding the resale value of Pokemon trading cards.
Below are my solutions and Python code snippets I regularly use in these challenges.
Challenge | Python | R | SQL |
---|---|---|---|
Week 01 | 🐍 | 🏴☠️ | ❄️ |
Week 02 | 🐍 | 🏴☠️ | ❄️ |
Week 03 | 🐍 | 🏴☠️ | ❄️ |
Week 04 | 🐍 | 🏴☠️ | ❄️ |
Week 05 | ❄️ | ||
Week 06 | ❄️ | ||
Week 07 | ❄️ | ||
Week 08 | ❄️ | ||
Week 09 | ❄️ | ||
Week 10 | ❄️ | ||
Week 11 | ❄️ | ||
Week 12 | ❄️ |
Challenge | Python | Alteryx | Tableau Prep |
---|---|---|---|
Week 01 | 🐍 | ⬆️ | ✨ |
Week 02 | 🐍 | ⬆️ | ✨ |
Week 03 | 🐍 | ⬆️ | ✨ |
Week 04 | 🐍 | ⬆️ | ✨ |
Challenge | Solution | Challenge | Solution | Challenge | Solution | Challenge | Solution |
---|---|---|---|---|---|---|---|
Week 01 | Python | Week 14 | Python | Week 27 | Python | Week 40 | Python |
Week 02 | Python | Week 15 | Python | Week 28 | Python | Week 41 | Python |
Week 03 | Python | Week 16 | Python | Week 29 | Python | Week 42 | Python |
Week 04 | Python | Week 17 | Python | Week 30 | Python | Week 43 | Python |
Week 05 | Python | Week 18 | Python | Week 31 | Python | Week 44 | Python |
Week 06 | Python | Week 19 | Python | Week 32 | Python | Week 45 | Python |
Week 07 | Python | Week 20 | Python | Week 33 | Python | Week 46 | Python |
Week 08 | Python | Week 21 | Python | Week 34 | Python | Week 47 | Python |
Week 09 | Python | Week 22 | Python | Week 35 | Python | Week 48 | Python |
Week 10 | Python | Week 23 | Python | Week 36 | Python | Week 49 | Python |
Week 11 | Python | Week 24 | Python | Week 37 | Python | Week 50 | Python |
Week 12 | Python | Week 25 | Python | Week 38 | Python | Week 51 | Python |
Week 13 | Python | Week 26 | Python | Week 39 | Python | Week 52 | Python |
Reading csv files | Example: W05 2021
import pandas as pd
df = pd.read_csv('folder\\filename.csv')
Reading Excel files | Example: W04 2021
import pandas as pd
df = pd.read_excel('folder\\filename.xlsx', engine='openpyxl', sheet_name = 'Sheet1')
Reading and aggregrating multiple Excel tabs | Example: W21 2021
import pandas as pd
# Read all Excel tabs and concat as one dateframe
all_tabs = pd.read_excel('folder\\filename.xlsx', sheet_name=None)
# Bring all the sheets together
all_dfs = []
for tab_name, df in all_tabs.items():
df['sheet_name'] = tab_name
all_dfs.append(df)
combined_df = pd.concat(all_dfs, ignore_index=True)
Skipping rows and columns in an Excel tab | Example: W48 2021
import pandas as pd
df = pd.read_excel('folder\\filename.xlsx', engine='openpyxl', sheet_name='Sheet1',nrows= 3,skiprows = range(1,7), usecols = "B:D")
Writing csv files with utf-8 encoding | Example: W10 2021
import pandas as pd
df.to_csv('folder\\filename.csv', encoding='utf-8-sig', index=False)
Writing Excel files | Example: W14 2021
import pandas as pd
with pd.ExcelWriter('folder\\filename.xlsx') as writer:
df_1.to_excel(writer, sheet_name='Sheet1', index=False)
df_2.to_excel(writer, sheet_name='Sheet2', index=False)
df_3.to_excel(writer, sheet_name='Sheet3', index=False)
Unioning dataframes together with concat
import pandas as pd
df_total = pd.concat([df1,df2,df3])
Replacing null values with zero, blank, previous or preceeding values
import pandas as pd
# replace nulls with zeroes
df['Column with nulls'] = df['Column with nulls'].fillna(0)
# replace nulls with empty string (blank)
df['Column with nulls'] = df['Column with nulls'].fillna('')
# replace nulls with previous non-null value
df['Column with nulls'] = df['Column with nulls'].fillna(method='ffill')
# replace nulls with next non-null value
df['Column with nulls'] = df['Column with nulls'].fillna(method='bfill')
Create aggregrated columns grouped by other columns
import pandas as pd
df = df.groupby(['Col1','Col2']).agg(col3_min=('Col3','min'),col3_max=('Col3','max'),col3_sum=('Col3','sum')).reset_index()
Rename single column
import pandas as pd
df.rename( columns={'Col1':'Col1_New_Name'}, inplace=True )