Course Overview: R, Into the Tidyverse (Data Wrangling and ETL)

Accurate, effective analysis depends on clean, easily accessible data. Getting there means setting up an ETL (extract, transform, load) pipeline and wrangling the data into shape. The Tidyverse, a collection of open-source packages for the R programming language, is built for exactly this kind of work and makes day-to-day data science considerably easier.

What You Will Learn

In this course you'll learn how to:

Tidy data
Import data
Wrangle data
Gather and reshape data
Work with tables
Scrape the web
Process strings
Parse data
Mine tweets

Delivery Methods

Delivery Method	Description	Duration
Classroom	Instructor-led classroom-based training. Classes are scheduled at various conference centres in the Sandton area or at your premises. Stationery and printed manuals or online resources are included. Refreshments, including 2 tea breaks and a cooked lunch, are provided. Training runs from 9 am to 4 pm.	4 Days
Live Virtual Training	This course is delivered live via Microsoft Teams. You will be able to see and hear the instructor, view their whiteboard, and ask questions or communicate in the chat. Attend from the comfort of your own home or private office.	4 Days

Discounts Available

Save up to 10% by booking and paying 10 business days before the course.

Information may change without notice.

Audience

Data scientists
Programmers
Business analysts
Engineers
Scientists

Pre-Requisites

Leading Training's Python, Introduction to R or equivalent knowledge

Course Outline / Curriculum

The Tidyverse

Tidy data
Exercises
Manipulating data frames
- Adding a column with mutate
- Subsetting with filter
- Selecting columns with select
Exercises
The pipe: %>%
Exercises
Summarising data
- summarise
- pull
- Group then summarise with group_by
Sorting data frames
- Nested sorting
- The top n
Exercises
Tibbles
- Tibbles display better
- Subsets of tibbles are tibbles
- Tibbles can have complex entries
- Tibbles can be grouped
- Create a tibble using tibble instead of data.frame
- The dot operator
do
The purrr package
Tidyverse conditionals
- case_when
- between
Exercises

Importing data

Paths and the working directory
- The filesystem
- Relative and full paths
- The working directory
- Generating path names
- Copying files using paths
The readr and readxl packages
- readr
- readxl
Exercises
Downloading files
R-base importing functions
- scan
Text versus binary files
Unicode versus ASCII
Organising data with spreadsheets
Exercises

Introduction to data wrangling

Reshaping data

gather
spread
separate
unite
Exercises

Joining tables

Joins
- Left join
- Right join
- Inner join
- Full join
- Semi join
- Anti join
Binding
- Binding columns
- Binding by rows
Set operators
- Intersect
- Union
- setdiff
- setequal
Exercises

Web scraping

HTML
The rvest package
CSS selectors
JSON
Exercises

String processing

The stringr package
Case study 1: US murders data
Case study 2: self-reported heights
How to escape when defining strings
Regular expressions
- Strings are a regexp
- Special characters
- Character classes
- Anchors
- Quantifiers
- White space \s
- Quantifiers: *, ?, +
- Not
- Groups
Search and replace with regex
- Search and replace using groups
Testing and improving
Trimming
Changing lettercase
Case study 2: self-reported heights (continued)
- The extract function
- Putting it all together
  - String splitting
Case study 3: extracting tables from a PDF
Recoding
Exercises

Parsing dates and times

The date data type
The lubridate package
Exercises

Text mining

Case study: Trump tweets
Text as data
Sentiment analysis
Exercises

Schedule Dates and Booking

There are currently no scheduled dates.
