class: center, middle, inverse, title-slide # Introduction to R and RStudio ## Data analysis and visualization in R
UC Merced ### Andrea Sánchez-Tapia
Rio de Janeiro Botanical Garden - ¡liibre! - RLadies+ ### 2021-03-23 --- # outline + Overview to R and RStudio + Introduction to R + Starting with Data + Exploratory data analysis and basic statistics with R + Manipulating Data Frames with __dplyr__ + Data visualisation with `ggplot2` --- class: inverse, middle, center # overview of R and RStudio --- ## why learn `R`? + the language of choice for academic statisticians -- + **_libre_ software**: free and free-to-be-used-and-modified for any means -> one of the pillars of open science -- + __script-based__: reproducibility, easier to scale up your analyses, transparency (track errors), great way to learn about methods. -- + __interdisciplinary and modular__: lots of code written by area specialists. At the core of its philosophy is a smooth transition from user to programmer. -- + __communication__ with other tools: manuscripts, presentations, apps and dashboards --- ## why learn `R`? + communication with __other programming languages__ (ex. __reticulate__ to run python scripts) -- + great __graphic capabilities__! -- + __official support__: help in documentation, mailing lists -- + __an active and welcoming community__: email lists, Stack Overflow, [RStudio community](community.rstudio.com/), useR groups, .purple[R-Ladies+] chapters, Slack communities, <svg viewBox="0 0 512 512" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> `#rstats` <center> <img src="https://raw.githubusercontent.com/rladies/starter-kit/master/stickers/rainbow-inclusive.png" width="150" /> --- ## `R` has a modular structure: __packages__ + `R` __base__ installation includes base packages developed and maintained by the __`R` Core Development Team__ -- + other packages are created by the community and hosted in __CRAN__ (The Comprehensive `R` Archive Network) or Bioconductor, GitHub, rOpenSci.org -- + to install packages from CRAN: `install.packages("tidyverse")` --- background-image:url("figs/00_rstudio.png") background-size: 50% # Running R in RStudio <svg viewBox="0 0 640 512" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M255.03 261.65c6.25 6.25 16.38 6.25 22.63 0l11.31-11.31c6.25-6.25 6.25-16.38 0-22.63L253.25 192l35.71-35.72c6.25-6.25 6.25-16.38 0-22.63l-11.31-11.31c-6.25-6.25-16.38-6.25-22.63 0l-58.34 58.34c-6.25 6.25-6.25 16.38 0 22.63l58.35 58.34zm96.01-11.3l11.31 11.31c6.25 6.25 16.38 6.25 22.63 0l58.34-58.34c6.25-6.25 6.25-16.38 0-22.63l-58.34-58.34c-6.25-6.25-16.38-6.25-22.63 0l-11.31 11.31c-6.25 6.25-6.25 16.38 0 22.63L386.75 192l-35.71 35.72c-6.25 6.25-6.25 16.38 0 22.63zM624 416H381.54c-.74 19.81-14.71 32-32.74 32H288c-18.69 0-33.02-17.47-32.77-32H16c-8.8 0-16 7.2-16 16v16c0 35.2 28.8 64 64 64h512c35.2 0 64-28.8 64-64v-16c0-8.8-7.2-16-16-16zM576 48c0-26.4-21.6-48-48-48H112C85.6 0 64 21.6 64 48v336h512V48zm-64 272H128V64h384v256z"></path></svg> --- class: inverse, middle, center # Setup and project organization --- ## working directory + you have to tell R where you will be working, so that it understands where to read tables, where to save outputs etc: __working directory__ -- + `getwd()` in the console -- + the default is __"home"__: check general options and the "File" tab -- + you can tell R and RStudio where you want to work with `setwd()` -- + even better: instead of opening RStudio open an __R script__ or a __RStudio project__ (just as you would in MS Word <svg viewBox="0 0 384 512" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M369.9 97.9L286 14C277 5 264.8-.1 252.1-.1H48C21.5 0 0 21.5 0 48v416c0 26.5 21.5 48 48 48h288c26.5 0 48-21.5 48-48V131.9c0-12.7-5.1-25-14.1-34zM332.1 128H256V51.9l76.1 76.1zM48 464V48h160v104c0 13.3 10.7 24 24 24h104v288H48zm220.1-208c-5.7 0-10.6 4-11.7 9.5-20.6 97.7-20.4 95.4-21 103.5-.2-1.2-.4-2.6-.7-4.3-.8-5.1.3.2-23.6-99.5-1.3-5.4-6.1-9.2-11.7-9.2h-13.3c-5.5 0-10.3 3.8-11.7 9.1-24.4 99-24 96.2-24.8 103.7-.1-1.1-.2-2.5-.5-4.2-.7-5.2-14.1-73.3-19.1-99-1.1-5.6-6-9.7-11.8-9.7h-16.8c-7.8 0-13.5 7.3-11.7 14.8 8 32.6 26.7 109.5 33.2 136 1.3 5.4 6.1 9.1 11.7 9.1h25.2c5.5 0 10.3-3.7 11.6-9.1l17.9-71.4c1.5-6.2 2.5-12 3-17.3l2.9 17.3c.1.4 12.6 50.5 17.9 71.4 1.3 5.3 6.1 9.1 11.6 9.1h24.7c5.5 0 10.3-3.7 11.6-9.1 20.8-81.9 30.2-119 34.5-136 1.9-7.6-3.8-14.9-11.6-14.9h-15.8z"></path></svg>) --- ## project organization and best practices + projects are better organized if we use __one folder per project__ and __subfolders__ within our working directory -- + we shouldn't modify __raw data files__ but __save processed data__ (and the corresponding scripts) -- + instead of __absolute paths__ we should use __relative paths__: + `.` "here" + `./figs` a subfolder called `figs` + the upper level `..` -- + avoid `C:users/your_name/your_file_structure/your_working_directory` --- ## In this and the following sessions ``` project/ * ├── data/ * │ ├── raw * │ └── processed ├── docs/ * ├── figs/ ├── R/ * ├── output/ * └── README.md ``` + unzip the .zip file into a folder of your preference <right> <svg viewBox="0 0 640 512" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M255.03 261.65c6.25 6.25 16.38 6.25 22.63 0l11.31-11.31c6.25-6.25 6.25-16.38 0-22.63L253.25 192l35.71-35.72c6.25-6.25 6.25-16.38 0-22.63l-11.31-11.31c-6.25-6.25-16.38-6.25-22.63 0l-58.34 58.34c-6.25 6.25-6.25 16.38 0 22.63l58.35 58.34zm96.01-11.3l11.31 11.31c6.25 6.25 16.38 6.25 22.63 0l58.34-58.34c6.25-6.25 6.25-16.38 0-22.63l-58.34-58.34c-6.25-6.25-16.38-6.25-22.63 0l-11.31 11.31c-6.25 6.25-6.25 16.38 0 22.63L386.75 192l-35.71 35.72c-6.25 6.25-6.25 16.38 0 22.63zM624 416H381.54c-.74 19.81-14.71 32-32.74 32H288c-18.69 0-33.02-17.47-32.77-32H16c-8.8 0-16 7.2-16 16v16c0 35.2 28.8 64 64 64h512c35.2 0 64-28.8 64-64v-16c0-8.8-7.2-16-16-16zM576 48c0-26.4-21.6-48-48-48H112C85.6 0 64 21.6 64 48v336h512V48zm-64 272H128V64h384v256z"></path></svg> </right> --- ## RStudio projects RStudio projects create a __.Rproj__ file in your folder that acts as a shortcut for your projects -- <img src="https://media.giphy.com/media/3oriO04qxVReM5rJEA/giphy.gif" width="300" style="display: block; margin: auto;" /> --- class:middle, inverse, center # introduction to R --- ## introduction to R + `<-` is the assignment operation in R and it does not return output it __creates objects__ that are saved in the workspace (`Alt + -`) -- + overwriting objects __does not affect other objects__ -- + __naming object tips__: + don't begin with a number or symbol. + there are forbidden words + be consistent with your __coding style__! + avoid dots + __name functions as verbs and objects as nouns__ -- + you can see which objects are saved in the workspace by using `ls()` --- ## about the workspace + R creates __objects__ that occupy RAM memory: the __workspace__ -- + the __workspace__ can be saved and loaded between sessions BUT -- + __you can lose track of how you created the objects in the workspace__ -- + `#goodpractices` don't save the workspace --- background-image:url("figs/0a_setup.png") background-size: 60% background-position: 100% 100% .pull-left[in the general options] --- class: inverse, middle, center # functions, arguments and understanding the help --- ## functions and arguments ``` weight_kg <- sqrt(10) round(3.14159) args(round) ``` --- ## if you know the name of the function ``` help(function) ?function args(function) ``` + select the name of the function and click __F1__ Check the structure of the help file: + Description + Usage + Arguments + Details --- ## if you don't know the name of the function ``` ??kruskal ``` (or search it - google it - duckduckgo it) --- ## the structure of a function help `args(function)` + The arguments of a function are coded: + in order + with or without default settings -- You can either + use the arguments in order, without naming them + use the first arguments without naming them and then some optional arguments, with name --- ## data types in R <small> ```r animals <- c("mouse", "rat", "dog") weight_g <- c(50, 60, 65, 82) ``` -- ```r class(animals) ``` ``` ## [1] "character" ``` -- ```r class(weight_g) ``` ``` ## [1] "numeric" ``` </small> -- `character` and `numeric` but also `logical` and `integer` ("whole" numbers, with no decimal component, in N), `complex`, and others. --- class: center, middle <svg viewBox="0 0 640 512" style="fill:currentColor;position:relative;display:inline-block;top:.1em;height:3em;" xmlns="http://www.w3.org/2000/svg"> <path d="M255.03 261.65c6.25 6.25 16.38 6.25 22.63 0l11.31-11.31c6.25-6.25 6.25-16.38 0-22.63L253.25 192l35.71-35.72c6.25-6.25 6.25-16.38 0-22.63l-11.31-11.31c-6.25-6.25-16.38-6.25-22.63 0l-58.34 58.34c-6.25 6.25-6.25 16.38 0 22.63l58.35 58.34zm96.01-11.3l11.31 11.31c6.25 6.25 16.38 6.25 22.63 0l58.34-58.34c6.25-6.25 6.25-16.38 0-22.63l-58.34-58.34c-6.25-6.25-16.38-6.25-22.63 0l-11.31 11.31c-6.25 6.25-6.25 16.38 0 22.63L386.75 192l-35.71 35.72c-6.25 6.25-6.25 16.38 0 22.63zM624 416H381.54c-.74 19.81-14.71 32-32.74 32H288c-18.69 0-33.02-17.47-32.77-32H16c-8.8 0-16 7.2-16 16v16c0 35.2 28.8 64 64 64h512c35.2 0 64-28.8 64-64v-16c0-8.8-7.2-16-16-16zM576 48c0-26.4-21.6-48-48-48H112C85.6 0 64 21.6 64 48v336h512V48zm-64 272H128V64h384v256z"></path></svg> --- ## subsetting vectors + R is __1-indexed__ and intervals are closed (not half-open) ```r animals <- c("mouse", "rat", "dog", "cat") animals[2] ``` ``` ## [1] "rat" ``` + Subsetting is done with brackets `[]` ```r animals[c(3, 2)] ``` ``` ## [1] "dog" "rat" ``` --- ## conditional subsetting ```r weight_g <- c(21, 34, 39, 54, 55) weight_g[c(TRUE, FALSE, FALSE, TRUE, TRUE)] ``` ``` ## [1] 21 54 55 ``` Nobody works like this, instead we use __logical clauses__ to __generate__ these logical vectors --- ## logical clauses + equality or not: `==`, `!=` + inequalities: `<`. `>`, `<=`, `>=` + union (OR) `|` + intersection (AND) `&` + belonging `%in%` + differences between sets: `setdiff()` + negation works `!`: "not in" `!a %in% b` --- ## comparing vectors <small> ```r animals <- c("mouse", "rat", "dog", "cat") more_animals <- c("rat", "cat", "dog", "duck", "goat") animals %in% more_animals ``` ``` ## [1] FALSE TRUE TRUE TRUE ``` --- ##comparing vectors <small> ```r animals <- c("mouse", "rat", "dog", "cat") more_animals <- c("rat", "cat", "dog", "duck", "goat") animals == more_animals ``` ``` ## Warning in animals == more_animals: longer object length is not a multiple of ## shorter object length ``` ``` ## [1] FALSE FALSE TRUE FALSE FALSE ``` + Vectors are compared __one by one AND recycled__ when one of them is shorter, so use `%in%` when you want to check __belonging to a set__ --- ## missing data <small> ```r heights <- c(2, 4, 4, NA, 6) mean(heights) ``` ``` ## [1] NA ``` ```r max(heights) ``` ``` ## [1] NA ``` ```r mean(heights, na.rm = TRUE) ``` ``` ## [1] 4 ``` ```r max(heights, na.rm = TRUE) ``` ``` ## [1] 6 ``` --- ## data structures + __vector__: lineal arrays (one dimension: only length) -- + __factors__: vectors (one-dimensional) representing __categorical variables__ and thus having __levels__ -- + __matrices__: arrays of vectors -> the same type (all numeric or all character, for instance) (two dimensions: width and length) -- + __data frames__: two-dimensional arrays but might be of combined types (i.e., column 1 with names, column 2 with numbers) -- + __arrays__ are similar to matrices and dataframes but may be three-dimensional ("layered" data frames) -- + __list__: literally a list of anything (a list of data frames, or different objects) --- class: inverse, middle, center # Getting help in R --- --- ## Other sources of help + Taskviews [https://cran.r-project.org/web/views/](https://cran.r-project.org/web/views/) --- class: center, middle # ¡Thanks! <center> <svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;fill:#A70000;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M476 3.2L12.5 270.6c-18.1 10.4-15.8 35.6 2.2 43.2L121 358.4l287.3-253.2c5.5-4.9 13.3 2.6 8.6 8.3L176 407v80.5c0 23.6 28.5 32.9 42.5 15.8L282 426l124.6 52.2c14.2 6 30.4-2.9 33-18.2l72-432C515 7.8 493.3-6.8 476 3.2z"></path></svg> [andreasancheztapia@gmail.com](mailto:andreasancheztapia@gmail.com) <svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;fill:#A70000;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> [@SanchezTapiaA](https://twitter.com/SanchezTapiaA) <svg viewBox="0 0 496 512" style="position:relative;display:inline-block;top:.1em;fill:#A70000;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg><svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;fill:#A70000;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M105.2 24.9c-3.1-8.9-15.7-8.9-18.9 0L29.8 199.7h132c-.1 0-56.6-174.8-56.6-174.8zM.9 287.7c-2.6 8 .3 16.9 7.1 22l247.9 184-226.2-294zm160.8-88l94.3 294 94.3-294zm349.4 88l-28.8-88-226.3 294 247.9-184c6.9-5.1 9.7-14 7.2-22zM425.7 24.9c-3.1-8.9-15.7-8.9-18.9 0l-56.6 174.8h132z"></path></svg> <svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;fill:#A70000;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M22.2 32A16 16 0 0 0 6 47.8a26.35 26.35 0 0 0 .2 2.8l67.9 412.1a21.77 21.77 0 0 0 21.3 18.2h325.7a16 16 0 0 0 16-13.4L505 50.7a16 16 0 0 0-13.2-18.3 24.58 24.58 0 0 0-2.8-.2L22.2 32zm285.9 297.8h-104l-28.1-147h157.3l-25.2 147z"></path></svg> [andreasancheztapia](http://github.com/andreasancheztapia)