R, Into the Tidyverse (Data Wrangling and ETL)

TRAINING COURSE

Details

To work accurately and effectively with your data, you need that data to be clean and easily accessible. That means setting up an ETL (extract, transform, load) pipeline and wrangling the data into shape. The tidyverse helps you do exactly that: it is a collection of open-source R packages (including dplyr, tidyr, readr and purrr) that share a common design philosophy and make everyday data science work much easier.

In this course you'll learn how to:

  • Tidy data
  • Import data
  • Wrangle data
  • Gather and reshape data
  • Work with tables
  • Scrape the web
  • Process strings
  • Parse dates and times
  • Mine text from tweets

Delivery Methods

Delivery Method         Duration   Price
Classroom               4 days     Quote on request
Live Virtual Training   4 days     Quote on request

Discounts Available

Save up to 10% by booking and paying at least 10 business days before the course starts.


Information may change without notice.

Audience

  • Data scientists
  • Programmers
  • Business analysts 
  • Engineers
  • Scientists

Pre-Requisites

Leading Training's Python, Introduction to R or equivalent knowledge

Course Outline / Curriculum

The Tidyverse

  • Tidy data
  • Exercises
  • Manipulating data frames
    • Adding a column with mutate
    • Subsetting with filter
    • Selecting columns with select
  • Exercises
  • The pipe: %>%
  • Exercises
  • Summarising data
    • summarise
    • pull
    • Group then summarise with group_by
  • Sorting data frames
    • Nested sorting
    • The top n
  • Exercises
  • Tibbles
    • Tibbles display better
    • Subsets of tibbles are tibbles
    • Tibbles can have complex entries
    • Tibbles can be grouped
    • Create a tibble using tibble instead of data.frame
    • The dot operator
  • do
  • The purrr package
  • Tidyverse conditionals
    • case_when
    • between
  • Exercises

Importing data

  • Paths and the working directory
    • The filesystem
    • Relative and full paths
    • The working directory
    • Generating path names
    • Copying files using paths
  • The readr and readxl packages
    • readr
    • readxl
  • Exercises
  • Downloading files
  • Base R importing functions
    • scan
  • Text versus binary files
  • Unicode versus ASCII
  • Organising data with spreadsheets
  • Exercises

Introduction to data wrangling 

Reshaping data 

  • gather
  • spread
  • separate
  • unite
  • Exercises

Joining tables 

  • Joins
    • Left join
    • Right join
    • Inner join
    • Full join
    • Semi join
    • Anti join
  • Binding
    • Binding columns
    • Binding rows
  • Set operators
    • intersect
    • union
    • setdiff
    • setequal
  • Exercises

Web scraping 

  • HTML
  • The rvest package
  • CSS selectors
  • JSON
  • Exercises

String processing 

  • The stringr package
  • Case study 1: US murders data
  • Case study 2: self-reported heights
  • How to escape when defining strings
  • Regular expressions
    • Strings are a regexp
    • Special characters
    • Character classes
    • Anchors
    • Quantifiers
    • White space \s
    • Quantifiers: *, ?, +
    • Not
    • Groups
  • Search and replace with regex
    • Search and replace using groups
  • Testing and improving
  • Trimming
  • Changing letter case
  • Case study 2: self-reported heights (continued)
    • The extract function
    • Putting it all together
      • String splitting
  • Case study 3: extracting tables from a PDF
  • Recoding
  • Exercises

Parsing dates and times 

  • The date data type
  • The lubridate package
  • Exercises

Text mining 

  • Case study: Trump tweets
  • Text as data
  • Sentiment analysis
  • Exercises

Schedule Dates and Booking

There are currently no scheduled dates.
