Data management entails raw data preparation, data validation, metadata record keeping, data combination, scale conversion, and any other step involved in generating and preparing data for graphing.
Exploratory data analysis (EDA) is a key input to data management (see the Graph Workflow model). EDA describes a class of statistical methods that help to understand distributional form, linearise relations, transform variables to contain variation and reduce skewness, mine data visually, and more.
The iterative nature of the Graph Workflow model necessitates revisiting data management at every step and questioning the appropriateness of data form and data scope. That is to say, preliminary decoding may identify the need to transform variables, convert scales, combine variables into ratios, differences, or indices, manage outliers, create data bins and aggregates, and so on. This iterative process continues until the visual becomes fully informative and bears all the qualities of graphical representation.
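As a minimal sketch of one such iteration, the Python snippet below recasts a right-skewed variable on a log scale, compares skewness before and after, and then bins the transformed values. The `income` figures, the bin edges, and the helper names `skewness` and `bin_counts` are hypothetical and serve only as illustration. The moment-based skewness formula is one common choice among several.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def bin_counts(values, edges):
    """Count values per bin; bins are [lo, hi), the last bin is [lo, hi]."""
    counts = [0] * (len(edges) - 1)
    last = len(edges) - 2
    for v in values:
        for i in range(len(edges) - 1):
            in_bin = edges[i] <= v < edges[i + 1]
            on_top_edge = i == last and v == edges[i + 1]
            if in_bin or on_top_edge:
                counts[i] += 1
                break
    return counts


# Hypothetical right-skewed variable (illustrative values only).
income = [12, 15, 18, 20, 22, 25, 30, 45, 80, 250]

# Recast the scale: a log transform pulls in the long right tail.
log_income = [math.log10(v) for v in income]

skew_raw = skewness(income)
skew_log = skewness(log_income)

# Create data bins on the transformed scale (assumed edges).
counts = bin_counts(log_income, [1.0, 1.5, 2.0, 2.5])

print(f"skewness raw: {skew_raw:.2f}, log10: {skew_log:.2f}")
print("bin counts:", counts)
```

If the transformed variable still looks skewed in the graph, the same loop would be repeated with a different transformation or bin definition.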
The discussion of data management covers the following important topics:
- Data generating process: the rules by which the data were generated.
- Data validation: testing the quality of data.
- Missing values: implications for data graphing and inferential analysis.
- Recasting scales: changing the scale of variables.
- Ratios: working with ratio constructs.
- Data reduction: reducing and graphing dense data as statistical summaries.
- Extreme values: exploration, definition and management of influential values.
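To make a few of these topics concrete, the hedged Python sketch below touches on missing values, data validation, and extreme values for a single variable. The `raw` measurements, the positivity rule, and the use of Tukey's 1.5 × IQR fences are assumptions chosen for illustration. They are not the definitions used in the linked discussions.

```python
import statistics

# Hypothetical measurements; None marks a missing value.
raw = [4.1, 3.9, None, 4.4, 4.0, 19.5, 4.2, None, 3.8, 4.3]

# Missing values: separate observed values and report the missing share.
observed = [v for v in raw if v is not None]
missing_share = (len(raw) - len(observed)) / len(raw)

# Data validation: an assumed rule that measurements must be positive.
invalid = [v for v in observed if v <= 0]

# Extreme values: flag points outside Tukey's fences (1.5 * IQR),
# one common convention among several possible definitions.
q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
flagged = [v for v in observed if v < low_fence or v > high_fence]

print(f"missing share: {missing_share:.0%}")
print("invalid values:", invalid)
print("flagged as extreme:", flagged)
```

Flagging a value is only the exploration step. Whether to keep, transform, or exclude it remains a separate data management decision.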