In essence, one can think of regression analysis as an advanced comparison of means. RCT analysis typically relies on regressions to test for statistical differences between the means of the control and treatment groups.
A good data cleaning process should also produce documented insights about the data and the data collection to inform future data collection, whether for another round of the same project or for other projects. The data cleaning process seeks to fulfill two goals: (1) to ensure valid analysis by cleaning individual data points that bias the analysis, and (2) to make the dataset easily usable and understandable for researchers both within and outside of the research team. See this data cleaning checklist to ensure that common cleaning actions have been completed. This article provides a very good place to start.
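The two ideas above, regression as a comparison of means and the bias introduced by individual bad data points, can be illustrated with a minimal sketch. The data, the variable names, and the helper `ols_fit` are hypothetical and invented for this example; it assumes a single binary treatment indicator and no covariates, which is simpler than a typical RCT specification.

```python
from statistics import mean


def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one regressor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def difference_in_means(treatment, outcome):
    """Mean outcome of the treated group minus mean outcome of the control group."""
    treated = [y for t, y in zip(treatment, outcome) if t == 1]
    control = [y for t, y in zip(treatment, outcome) if t == 0]
    return mean(treated) - mean(control)


# Hypothetical data: 1 = treatment group, 0 = control group.
treatment = [0, 0, 0, 0, 1, 1, 1, 1]
outcome_clean = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 13.0, 16.0]

# With a binary regressor, the OLS slope equals the difference in group means
# and the intercept equals the control group mean.
intercept_clean, slope_clean = ols_fit(treatment, outcome_clean)
diff_clean = difference_in_means(treatment, outcome_clean)

# Same data with one hypothetical data entry error: 16.0 recorded as 160.0.
outcome_dirty = outcome_clean[:-1] + [160.0]
intercept_dirty, slope_dirty = ols_fit(treatment, outcome_dirty)

print(f"clean estimate: {slope_clean:.1f}")       # 3.0
print(f"with entry error: {slope_dirty:.1f}")     # 39.0
```

In this toy dataset a single mistyped value moves the estimated treatment effect from 3.0 to 39.0, which is the kind of bias that cleaning individual data points is meant to prevent.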
There is no such thing as an exhaustive list of what to do during data cleaning: each project will have individual cleaning needs. The quality of the analysis will never be better than the quality of data cleaning. The goal of data cleaning is to clean individual data points and to make the dataset easily usable and understandable for the research team and external users.
2.2 Making the Dataset Usable and Understandable