Home

Overview

The IPA Data Cleaning Package is a Stata package developed by Innovations for Poverty Action (IPA) to streamline the process of cleaning and validating survey data. This package includes a suite of commands designed to handle common data cleaning tasks efficiently.

Software Requirements

Some commands in the ipaclean package rely heavily on Stata’s data frames, so ipaclean requires Stata 17.0 or later to be installed before it is run. IPA employees with older versions of Stata should contact IT for access to a newer version.
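A do-file that depends on ipaclean can check this requirement up front. The following is a minimal sketch using Stata's built-in c(stata_version) value; the message text and the exit code are illustrative choices, not part of ipaclean.

```stata
* Stop early if the running Stata is older than version 17
if c(stata_version) < 17 {
    display as error "ipaclean requires Stata 17.0 or later; this is Stata " c(stata_version)
    exit 198
}
```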

Installation

To install the package, run the following commands in Stata:

* Install ipaclean from GitHub
net install ipaclean, all replace from("https://raw.githubusercontent.com/PovertyAction/ipaclean/main")

* After installation, run the following command to install helper commands:
ipaclean update

* Check your version of ipaclean with the command:
ipaclean version
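In a shared do-file it can be convenient to install the package only when it is missing. The sketch below uses Stata's built-in which command to look for ipaclean on the ado-path; it assumes the package installs a file named ipaclean.ado, which the ipaclean subcommands above suggest but which is not confirmed here.

```stata
* Install ipaclean only if it cannot be found on the ado-path
capture which ipaclean
if _rc {
    net install ipaclean, all replace from("https://raw.githubusercontent.com/PovertyAction/ipaclean/main")
    ipaclean update
}

* Report the installed version either way
ipaclean version
```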

Features

The package includes several key commands:

ipaappend: Safely append datasets.
ipamergerepeats: Safely merge datasets.
ipaodksplit: Create dummy variables from SurveyCTO/ODK style select_multiple type questions.
ipaodkmergerepeats: Reshape and merge ODK/SurveyCTO repeat groups.
ipacompare: Compare datasets across multiple rounds of survey data collection.
ipacodebook: Describe data content and export a codebook to Excel.

ipaappend – Safely Append Datasets

The ipaappend command appends Stata-format datasets to the dataset in memory, with additional features that avoid common problems such as mismatched variable types. It can optionally produce a detailed append report for assessing potential type conflicts. Its safely option is an alternative to the force option of Stata's native append command, which leads to data loss. The safely option finds the data type that can accommodate all values and converts the variable in the master or using dataset so that the append can happen without data loss.

Use case:

The ipaappend command is especially useful when datasets need to be appended but some variables have inconsistent data types across datasets, e.g., a price variable that is numeric in the master dataset and string in the using dataset. It ensures that appending does not lead to data loss or incompatibilities caused by such mismatches.

Example:

Suppose you have two datasets containing car data: one for domestic cars, where price is recorded as a string, and one for foreign cars, where price is numeric. Stata's native append command stops with an error on this type mismatch, and adding the force option results in data loss. ipaappend with the safely option handles the mismatch without data loss.

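* Prepare datasets
sysuse auto, clear
keep if foreign == 0
* Store price as a string in the domestic file
tostring price, replace
save domestic, replace

sysuse auto, clear
keep if foreign == 1
* The foreign cars remain in memory as the master dataset
save foreign, replace

* Attempting to append with the native append command results in an error
* (capture noisily displays the error without stopping the do-file)
capture noisily append using domestic

* Using the force option will result in data loss:
* the string prices from domestic.dta become missing
append using domestic, force

* Using ipaappend's safely option instead
use foreign, clear
ipaappend using domestic, outfile("append_report.xlsx") safely replace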
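
To see what the force option discards, and what a lossless append looks like without ipaappend, the comparison below uses only built-in Stata commands. It is a sketch that assumes domestic.dta and foreign.dta from the example above exist in the working directory. It shows a manual workaround, not how ipaappend is implemented.

```stata
* 1. Native append with force: string prices from the using file are lost
use foreign, clear
append using domestic, force
* The domestic rows now have a missing price
count if missing(price)

* 2. Manual lossless alternative: align the types before appending
use foreign, clear
* Make price a string in the master as well
tostring price, replace
* The types agree, so force is not needed
append using domestic
* No observation has an empty price
count if missing(price)
* Every value is numeric text, so price converts back to numeric
destring price, replace
```

Doing this by hand for every mismatched variable is tedious, which is the work the safely option is described as handling.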
