data analysis

Summer camp: R Day3

Data analysis

Warm up! We can use describe() from the psych package to see the number of participants, the mean, and the standard deviation of a variable.

    library(psych)
    library(palmerpenguins)  # assumed source of the penguins data
    describe(penguins$body_mass_g)

Result:

       vars   n    mean     sd median trimmed    mad  min  max range skew kurtosis    se
    X1    1 342 4201.75 801.95   4050 4154.01 889.56 2700 6300  3600 0.47    -0.74 43.36

What is TIDY DATA

Every column is a variable
Every row is an observation
Every cell has one value

Working with tidy data pays off in several ways: the data are easy to share, the analysis is reproducible, and the steps are easy to automate…

Data cleaning

Remove data hierarchically!...
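To see where some of the describe() columns come from, here is a minimal base-R sketch. The `mass` vector is made-up illustration data, not the penguins data, and the variable names are my own. It relies on describe() dropping missing values by default, which is why n counts only the non-missing observations.

```r
# Hypothetical body-mass values in grams (NOT the penguins data).
mass <- c(3750, 3800, NA, 3450, 3650, 4675)

valid <- mass[!is.na(mass)]        # keep only non-missing values
n     <- length(valid)             # "n" column: non-missing count
avg   <- mean(valid)               # "mean" column
sdev  <- sd(valid)                 # "sd" column
se    <- sdev / sqrt(n)            # "se" column: standard error of the mean
rng   <- max(valid) - min(valid)   # "range" column

# Collect the summary in one named vector, rounded like describe() prints it.
round(c(n = n, mean = avg, sd = sdev, se = se, range = rng), 2)
```

The same relations hold in the output above: 801.95 / sqrt(342) is about 43.36, and 6300 - 2700 = 3600.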

Summer camp: R Day2

Create dataframe

Create variables

    ## i. name
    names <- c("Ada", "Robert", "Mia")
    ## ii. age
    ages <- c(20, 21, 22)
    ## iii. factor, so that you can declare levels that do not exist in the data
    year <- c("Freshman", "Sophomore", "Junior")
    year <- factor(year, levels = c("Freshman", "Sophomore", "Junior", "Senior"))

Create dataframe

    students <- data.frame(names, ages, year)

Query a dataframe

    students$names

Set working directory

Get the working directory:

    getwd()

Set the working directory:

    setwd("path/to/your/folder")  # placeholder path

or go to “Session” and set the working directory there, or create the R file in the directory you want to work in.
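As a small follow-up sketch, the code below rebuilds the same `students` dataframe and shows what the extra "Senior" level does in practice. The `counts` and `older` names and the age filter are my own additions for illustration.

```r
# Rebuild the dataframe from the notes above.
names <- c("Ada", "Robert", "Mia")
ages  <- c(20, 21, 22)
year  <- factor(c("Freshman", "Sophomore", "Junior"),
                levels = c("Freshman", "Sophomore", "Junior", "Senior"))
students <- data.frame(names, ages, year)

# The unused level is still part of the factor...
levels(students$year)

# ...so it shows up in a frequency table with a count of 0.
counts <- table(students$year)
counts

# Query rows with a condition: students older than 20.
older <- students[students$ages > 20, ]
older
```

Declaring the level up front means a "Senior" can be added later without R turning the new value into NA.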

Summer camp: R Day1

Data type in the datasheet

words: “California”
categorical data: a / b / c
logical: TRUE, FALSE
number: 10
missing data: NA

Hot key

run one chunk in the script: Ctrl + Enter
run all chunks in the script: Ctrl + Shift + Enter

Data type in R

vector

    vector <- c("Ada", "Emily", "Jack")

If you combine different data types into one vector, every element is converted to a string and you get a character vector:

    vector <- c(TRUE, "Ada", 10)

factor...
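The coercion rule for mixed vectors can be checked directly with class(). This is a short sketch; the `mixed` and `num_lgl` names are my own, and the second case (logical combined with number) goes one step beyond the example in the notes.

```r
# Mixing logical, string, and number: everything becomes a string.
mixed <- c(TRUE, "Ada", 10)
class(mixed)     # "character"
mixed            # "TRUE" "Ada"  "10"

# Mixing only logical and number: TRUE becomes 1, the result is numeric.
num_lgl <- c(TRUE, 10)
class(num_lgl)   # "numeric"
num_lgl          # 1 10
```

A vector holds a single type, so R picks the most general type that can represent every element.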