R, Into the Tidyverse (Data Wrangling and ETL)

Details

To work accurately and effectively with your data, you need clean, easily accessible data. This means setting up an ETL (extract, transform, load) pipeline and performing data wrangling. The tidyverse helps you do just that: it is a collection of open-source R packages, such as dplyr, tidyr, readr and stringr, that share a common design and make data science work considerably easier.

In this course you'll learn how to:

  • Tidy data
  • Import data
  • Wrangle data
  • Gather and reshape data
  • Work with tables
  • Scrape the web
  • Process strings
  • Parse data
  • Mine tweets

Delivery Methods

Leading Training is focusing on virtual training courses for the foreseeable future. In-person (classroom) training is available only on request, with a minimum group size of four delegates. We remain committed to offering training that is fast, focused and effective.

  • Full-time: 4 days, R 9,996.00 (excl. VAT)
  • Webinar: 4 days, R 8,200.00 (excl. VAT)

Discounts Available

Save up to 10% by booking and paying 10 business days before the course.

Information may change without notice.

Audience

  • Data scientists
  • Programmers
  • Business analysts 
  • Engineers
  • Scientists

Pre-Requisites

Leading Training's Python or Introduction to R course, or equivalent knowledge

Course Outline / Curriculum

The Tidyverse

  • Tidy data
  • Exercises
  • Manipulating data frames
    • Adding a column with mutate
    • Subsetting with filter
    • Selecting columns with select
  • Exercises
  • The pipe: %>%
  • Exercises
  • Summarising data
    • summarise
    • pull
    • Group then summarise with group_by
  • Sorting data frames
    • Nested sorting
    • The top n
  • Exercises
  • Tibbles
    • Tibbles display better
    • Subsets of tibbles are tibbles
    • Tibbles can have complex entries
    • Tibbles can be grouped
    • Create a tibble using tibble instead of data.frame
    • The dot operator
  • do
  • The purrr package
  • Tidyverse conditionals
    • case_when
    • between
  • Exercises

Importing data

  • Paths and the working directory
    • The filesystem
    • Relative and full paths
    • The working directory
    • Generating path names
    • Copying files using paths
  • The readr and readxl packages
    • readr
    • readxl
  • Exercises
  • Downloading files
  • R-base importing functions
    • scan
  • Text versus binary files
  • Unicode versus ASCII
  • Organising data with spreadsheets
  • Exercises

Introduction to data wrangling 

Reshaping data 

  • gather
  • spread
  • separate
  • unite
  • Exercises

Joining tables 

  • Joins
    • Left join
    • Right join
    • Inner join
    • Full join
    • Semi join
    • Anti join
  • Binding
    • Binding columns
    • Binding by rows
  • Set operators
    • intersect
    • union
    • setdiff
    • setequal
  • Exercises

Web scraping 

  • HTML
  • The rvest package
  • CSS selectors
  • JSON
  • Exercises

String processing 

  • The stringr package
  • Case study 1: US murders data
  • Case study 2: self-reported heights
  • How to escape when defining strings
  • Regular expressions
    • Strings are a regexp
    • Special characters
    • Character classes
    • Anchors
    • Quantifiers
    • White space \s
    • Quantifiers: *, ?, +
    • Not
    • Groups
  • Search and replace with regex
    • Search and replace using groups
  • Testing and improving
  • Trimming
  • Changing letter case
  • Case study 2: self-reported heights (continued)
    • The extract function
    • Putting it all together
      • String splitting
  • Case study 3: extracting tables from a PDF
  • Recoding
  • Exercises

Parsing dates and times 

  • The date data type
  • The lubridate package
  • Exercises

Text mining 

  • Case study: Trump tweets
  • Text as data
  • Sentiment analysis
  • Exercises

Schedule Dates and Booking

There are currently no scheduled dates.

Please note that this course requires a minimum of 6 delegates before it can be scheduled. You can ask to be added to the waiting list, and we will contact you once enough delegates have registered interest. If not enough delegates sign up, we will refund or credit your paid booking.

Should you need this course urgently, the following options are available:

  1. Pay for 6 delegates (whether you have them or not) and we will schedule the course as soon as possible.
  2. If you have fewer delegates and cannot pay for 6, we can negotiate a shortened course in which part of the time is spent on blended learning (watching videos and working through tutorials and exercises), with some contact time with the trainer. We would want to discuss your core needs so that we cover those aspects. You need to have paid for at least 3 delegates.