class: center, middle, inverse, title-slide

# Putting it together
### Stephen Balogun
### October 12, 2021

---

## Recap of day one

.left-column[
- A bit about biostatistics
- Building blocks of R objects
- R functions and R packages
- R projects
- Assigning names in R
- Arithmetic operators
]

.right-column[
<img src="./imgs/recap.jpg" width="704" />
]

---

## Today's outline

1. Organizing our project directory and scripts
2. Importing from the web
3. Importing from the computer
4. Managing **.csv**, **.xlsx**, and **.txt** files
5. Data cleaning
6. Summary statistics
7. Saving our outputs

---

## Our data at a glance

.pull-left[
- ID: for patient/respondent identification
- Date: date of form completion
- Age: current age of respondent/patient
- Sex: biological sex of respondent/patient at birth (either male or female)
- State: one of 6 states (Rivers, FCT, Lagos, Sokoto, Abia and Kaduna)
]

.pull-right[
- Married: a dichotomous variable (YES/NO)
- Weight: weight of respondent/patient in kg
- height: height of respondent/patient in cm
- blood pressure: systolic (numerator) and diastolic (denominator) blood pressure of respondent/patient
- bmi: values obtained from the field by dividing the weight (in kg) by the square of the height (in metres)
]

---

## Organizing the project directory

.pull-left[
- Put your data (inputs) in its own folder
- Keep your R scripts separate
- Keep your outputs in a separate folder
- Keep your work reproducible: __*avoid "point-and-click" as much as possible*__
]

.pull-right[
<img src="./imgs/organizing.jpg" width="3200" />
]

---

## Managing your R scripts

.pull-left[
- Will you need more than a single script?
- Divide your scripts into sections and label them appropriately. Use **Ctrl + Shift + R** to insert a section in RStudio
  - loading required packages
  - importing files
  - cleaning data
  - exploration
  - plotting charts
  - saving outputs
- Keep things simple
]

.pull-right[
<img src="./imgs/think.jpg" width="7680" />
]

---

## Loading required packages

.pull-left[
- **6 base** packages are loaded by default, with their functions made available
- Generally, using any other function requires you to load its package first
- Consider all the packages you might need in addition to the base packages, and load them
]

.pull-right[
<img src="./imgs/base_pkgs.png" width="381" />
]

---

## Importing your file

.pull-left[
- Is your file on the local computer or online?
- If online, do you want to import directly from the internet or download a local copy?
- What type of file extension are you working with? Different file extensions may require different packages
- Import your file to R
- Format your file appropriately (print and variable names)
- Assign your file a name
- Is your file tidy? What are the challenges?
]

.pull-right[
<img src="./imgs/import.jpg" width="3968" />
]

---

## More functions

.left-column[
- The `$` operator: used to select a column in a data frame
- The `unique()` function: used to identify unique entries in a column
- The `%>%` (pipe) operator: pronounced "then"
]

.right-column[
<img src="./imgs/format.jpeg" width="640" />
]

---

## Data cleaning

- {dplyr}, {tidyr} and other tidyverse packages

A few things we identified in our data. We need to:

.pull-left[
- Format column names
- Format the **date** for the Excel and the text documents
- Convert **sex** to a factor (categorical variable)
- Format **state** as a factor
]

.pull-right[
- Change **married** to a logical variable
- Change **height** to metres
- Split **BP** into **bp_systole** and **bp_diastole**
- Calculate a new **BMI**
]

---

## Exploratory questions

1. How many persons are from each state?
2. Plot a graph of the gender distribution
3. Plot a graph to demonstrate the relationship between weight and height
4. How many persons have systolic hypertension?
5. For each gender in each state, what are the average age, average height, and average weight?

---

## What have we learnt today?

1. Organizing our project directory and scripts
2. Importing from the web
3. Importing from the computer
4. Managing **.csv**, **.xlsx**, and **.txt** files
5. Data cleaning
6. Summary statistics
7. Saving our outputs

---

class: inverse, middle, center
background-image: url(https://upload.wikimedia.org/wikipedia/commons/3/39/Naruto_Shiki_Fujin.svg)
background-size: contain

# Questions
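
---

## Appendix: a sketch of the cleaning steps

The snippet below is a minimal sketch of how the cleaning steps and some of the exploratory questions listed earlier might look with {dplyr} and {tidyr}. The `raw` data frame and its values are invented for illustration, the column names are guesses based on the data description, and the 140 mmHg cut-off for systolic hypertension is an assumption. Date formatting is left out because the date format of the real files is not shown here.

```r
library(dplyr)
library(tidyr)

# Toy data invented for illustration; the real files and column names may differ
raw <- data.frame(
  ID      = 1:4,
  Age     = c(34, 51, 28, 45),
  Sex     = c("male", "female", "female", "male"),
  State   = c("Lagos", "FCT", "Lagos", "Abia"),
  Married = c("YES", "NO", "NO", "YES"),
  Weight  = c(70, 58, 64, 82),        # kg
  height  = c(175, 160, 168, 180),    # cm
  BP      = c("120/80", "145/95", "118/76", "150/90"),
  stringsAsFactors = FALSE
)

clean <- raw %>%
  # Format column names: lower case throughout
  rename_with(tolower) %>%
  mutate(
    sex      = factor(sex),           # categorical variable
    state    = factor(state),         # categorical variable
    married  = married == "YES",      # logical TRUE/FALSE
    height_m = height / 100,          # cm to metres
    bmi      = weight / height_m^2    # recalculated BMI
  ) %>%
  # Split "systolic/diastolic" into two numeric columns
  separate(bp, into = c("bp_systole", "bp_diastole"), sep = "/", convert = TRUE)

# How many persons are from each state?
state_counts <- clean %>% count(state)

# How many persons have systolic hypertension?
# The 140 mmHg threshold is an assumption, not taken from the slides
n_hypertensive <- clean %>% filter(bp_systole >= 140) %>% nrow()

# Average age, height and weight for each gender in each state
group_means <- clean %>%
  group_by(state, sex) %>%
  summarise(
    mean_age    = mean(age),
    mean_height = mean(height_m),
    mean_weight = mean(weight),
    .groups     = "drop"
  )
```

Printing `state_counts`, `n_hypertensive` and `group_means` shows the answers for the toy data. With the real data, `raw` would be replaced by the imported file.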