class: inverse, center, hide-logo, title
background-image: url(imgs/title.png)
background-size: cover

## <span style='font-family:Arial; color: black; font-weight:400;'>Right Start Programming with R language</span>

### Part II: Import, manipulate and analyse

.left[
<span style="font-weight: 600; font-size: 28px;">Bilikisu Aderinto</span>

<span style="font-weight: 600; font-size: 28px;">Stephen Balogun</span>

<span class="my_date">Sat, January 29 2022</span>
]

---

## Recap of Part I

.pull-left[
- Setting up your work environment is essential for a good project experience.
- Your interactions with R will be largely through functions, which are contained in packages.
- Most times, you will work with tabular data (made up of rows and columns).
]

.pull-right[
<img src="./imgs/recap.jpg" width="704" />
]

---

## Outline

1. Importing your data (web, computer, other sources)
2. Managing different file formats (*.csv*, *.xlsx*, *.txt* etc.)
3. Data cleaning and transformation
4. Basic analysis
5. Data visualization: plotting charts

---

## Our data at a glance: Africa COVID-19 dataset

- Country/Region: country of reporting
- Date: date of reporting
- Confirmed: total confirmed cases of COVID-19 from the country
- Deaths: total confirmed deaths due to COVID-19 from the country
- Recovered: cumulative number of persons who have recovered from COVID-19 from the country

---

## Organizing the project directory

.pull-left[
- Put your data (inputs) in a separate folder
- Keep your R scripts separate
- Keep your outputs in a separate folder
- Keep your work reproducible: __*avoid "point-and-click" as much as possible*__
]

.pull-right[
<img src="./imgs/organizing.jpg" width="3200" />
]

---

## Managing your R scripts

.pull-left[
- Will you need more than a single script?
- Divide your scripts into sections and label appropriately.
  Use **CTRL + SHIFT + R**
  - loading required packages
  - importing files
  - cleaning data
  - exploration
  - plotting charts
  - saving outputs
- Keep things simple
]

.pull-right[
<img src="./imgs/think.jpg" width="7680" />
]

---

## Loading required packages

.pull-left[
- **6 base** packages are loaded by default with their functions made available
- Generally, using any other function will require you to load its package first
- Consider all the packages you might need in addition to the base packages and load them
]

.pull-right[
<img src="./imgs/base_pkgs.png" width="381" />
]

---

## Importing your file

.pull-left[
- Is your file on the local computer or online?
- If online, do you want to import directly from the internet or download a local copy?
- What type of file extension are you working with? Different file extensions may require different packages
- Import your file to R
- Format your file appropriately (print and variable names)
- Assign your file a name
- Is your file tidy? What are the challenges?
]

.pull-right[
<img src="./imgs/import.jpg" width="3968" />
]

---

## Data cleaning - {dplyr}, {tidyr} and other tidyverse packages

A few things we identified in our data. We need to:

.pull-left[
- Format column names
- Format the **date** for the Excel and the text documents
- Convert **sex** to a factor (categorical variable)
- Format **state** as a factor
]

.pull-right[
- Change **married** to a logical variable
- Change **height** to metres
- Split **BP** into **bp_systole** and **bp_diastole**
- Calculate new **BMI**
]

---

## Practice

- Simple exploration
- Summary of your data
- Using the `pipe` operator
- Data wrangling

---

class: middle, center, hide-logo, question
background-size: contain

# Plotting charts

---

## Data visualization

.pull-left[
- R has several packages for plotting charts.
- Charts can be static, embedded with HTML widgets or interactive
- {ggplot2} is one of the most common.
  It is based on the grammar of graphics
- The type of chart that you plot will depend on the type of data that you have
]

.pull-right[
<img src="./imgs/grammar3.png" width="1770" />
]

---

## Common plots

- Histogram: for continuous variables
- Bar chart: for categorical variables
- Scatter plot: for two continuous variables
- Line plot: for trend analysis
- Box plot: for visualizing summaries
- Pie chart: avoid if possible

---

## Layer options

.pull-left[
<img src="./imgs/grammar.png" width="1566" />
]

.pull-right[
<img src="./imgs/grammar2.png" width="2342" />
]

---

## Themes

<img src="./imgs/themes.png" width="1176" height="500" />

---

## Practice - 2

- Using the `{ggplot2}` package
- Mapping variables
- Selecting your type of chart
- Plotting a bar chart

---

## Summary

- R can import several file types from both local and online sources
- You will often need to clean and transform your data before analysis
- The type of analysis will depend on the conclusion you want to draw
- There are several powerful packages for visualization.
- {ggplot2}, one of the most common, uses graphical layers.

---

class: middle, center, question
background-image: url(https://upload.wikimedia.org/wikipedia/commons/3/39/Naruto_Shiki_Fujin.svg)
background-size: contain

# Questions
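
---

## Appendix: data cleaning sketch

The steps listed on the "Data cleaning" slide could look roughly like the sketch below. It is not the code used in the practice session. The data frame `raw`, its values, the raw column names, the "Yes"/"No" coding of **married**, the centimetre unit for **height**, the kilogram unit for weight and the "systole/diastole" text format for **BP** are all assumptions made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical records for illustration only; not the workshop data
raw <- data.frame(
  Sex = c("Male", "Female", "Female"),
  Married = c("Yes", "No", "Yes"),
  `Height (cm)` = c(170, 160, 155),   # assumed to be recorded in centimetres
  `Weight (kg)` = c(70, 64, 50),      # assumed to be recorded in kilograms
  BP = c("120/80", "130/85", "110/70"),
  check.names = FALSE                 # keep the untidy names as they are
)

clean <- raw %>%
  # Format column names: drop everything after the first space, then lower-case
  rename_with(~ tolower(sub(" .*", "", .x))) %>%
  mutate(
    sex = factor(sex),                # categorical variable
    married = married == "Yes",       # logical variable
    height = height / 100,            # centimetres to metres
    bmi = weight / height^2           # BMI = weight (kg) / height (m) squared
  ) %>%
  # Split "120/80" into two integer columns
  separate(bp, into = c("bp_systole", "bp_diastole"), sep = "/", convert = TRUE)

str(clean)
```

Inside a single `mutate()` call, later expressions see columns changed earlier in the same call, so `bmi` is computed from the height in metres.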