Introduction to data visualization
ggplot2
- The components of a graph
- ggplot objects
- Geometries
- Aesthetic mappings
- Layers
- Global versus local aesthetic mappings
- Scales
- Labels and titles
- Categories as colors
- Annotation, shapes, and adjustments
- Add-on packages
- Putting it all together
- Quick plots with qplot
- Grids of plots
- Exercises
Visualizing data distributions
- Variable types
- Case study: describing student heights
- Distribution function
- Cumulative distribution functions
- Histograms
- Smoothed density
- Interpreting the y-axis
- Densities permit stratification
- Exercises
- The normal distribution
- Standard units
- Quantile-quantile plots
- Percentiles
- Boxplots
- Stratification
- Case study: describing student heights (continued)
- Exercises
- ggplot2 geometries
- Barplots
- Histograms
- Density plots
- Boxplots
- QQ-plots
- Images
- Quick plots
- Exercises
Data visualization in practice
- Case study: new insights on poverty
- Scatterplots
- Faceting
- facet_wrap
- Fixed scales for better comparisons
- Time series plots
- Labels instead of legends
- Data transformations
- Log transformation
- Which base?
- Transform the values or the scale?
- Visualizing multimodal distributions
- Comparing multiple distributions with boxplots and ridge plots
- Boxplots
- Ridge plots
- Example: 1970 versus 2010 income distributions
- Accessing computed variables
- Weighted densities
- The ecological fallacy and importance of showing the data
Data visualization principles
- Encoding data using visual cues
- Know when to include 0
- Do not distort quantities
- Order categories by a meaningful value
- Show the data
- Ease comparisons
- Use common axes
- Align plots vertically to see horizontal changes and horizontally to see vertical changes
- Consider transformations
- Visual cues to be compared should be adjacent
- Use color
- Think of the color blind
- Plots for two variables
- Slope charts
- Bland-Altman plot
- Encoding a third variable
- Avoid pseudo-three-dimensional plots
- Avoid too many significant digits
- Know your audience
- Exercises
- Case study: vaccines and infectious diseases
- Exercises
Robust summaries
- Outliers
- Median
- The interquartile range (IQR)
- Tukey’s definition of an outlier
- Median absolute deviation
- Exercises
- Case study: self-reported student heights