How to Tell if a Variable is Continuous or Discrete in R
- Obtain he
birthwt
dataset from theMASS
library - Description of the variables
- Glimpse the dataset
- Some variables should be categorical or factor in nature
- PLOT VARIABLES FOR A CONTINUOUS OUTCOME
- Drop a variable (i.e., low)
- Univariate Analysis: Analyze each continuous variable (i.e., bwt, age, lwt)
- BWT
- Histogram, normal density curve, kernel density lines and rug lines
- Q-Q normality plot
- Boxplot
- AGE: Histogram, normal density curve, kernel density lines, rug lines, q-q normal plot, boxplot
- LWT: Histogram, normal density curve, kernel density lines, rug lines, q-q normal plot, boxplot
- BWT
- Univariate Analysis: Analyze each categorical variable/factor (i.e., race, smoke, ptl, ht, ui, ftv )
- Unordered bar plots for categorical variables or factors
- Ordered bar plots for categorical variables or factors
- Ordered horizontal bar plots for categorical variables or factors (e.g., race and ftv)
- Bivariate analysis
- For two continuous variables
- Bivariate scatter plots
- Matrix of scatter plots (using "car" package)
- Matrix of scatter plots (using "pairs" function)
- For one continuous variable in relation to another categorical variable
- Boxplots using boxplot function and plot function
- Bar plots
- For two continuous variables
- PLOT VARIABLES FOR A CATEGORICAL OUTCOME
- Drop a variable (i.e., bwt)
- Barplots
- Barplots for two categorical variables
- Bar plots for three categorical variables (see few examples)
- low by (race and smoke)
- low by (race and ptl)
- low by (race and ht)
library(MASS) # Support Functions and Datasets for Venables and Ripley's MASS library(psych) # Procedures for Psychological, Psychometric, and Personality library(car) # Companion to Applied Regression library(magrittr) # A Forward-Pipe Operator for R library(RColorBrewer)
Obtain he birthwt
dataset from the MASS
library
data(birthwt) str(birthwt)
## 'data.frame': 189 obs. of 10 variables: ## $ low : int 0 0 0 0 0 0 0 0 0 0 ... ## $ age : int 19 33 20 21 18 21 22 17 29 26 ... ## $ lwt : int 182 155 105 108 107 124 118 103 123 113 ... ## $ race : int 2 3 1 1 1 3 1 3 1 1 ... ## $ smoke: int 0 0 1 1 1 0 0 0 1 1 ... ## $ ptl : int 0 0 0 0 0 0 0 0 0 0 ... ## $ ht : int 0 0 0 0 0 0 0 0 0 0 ... ## $ ui : int 1 0 0 1 1 0 0 0 0 0 ... ## $ ftv : int 0 3 1 2 0 0 1 1 1 0 ... ## $ bwt : int 2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...
Description of the variables
# | variable name | variable label | coded levels |
---|---|---|---|
1 | low | indicator of birth weight less than 2.5 kg | 0, 1 |
2 | age | mother's age in years | continuous variable |
3 | lwt | mother's weight in pounds at last menstrual period | continuous variable |
4 | race | mother's race (1 = white, 2 = black, 3 = other) | 1, 2, 3 |
5 | smoke | smoking status during pregnancy | 0, 1 |
6 | ptl | number of previous premature labours | 0, 1, 2, 3 |
7 | ht | history of hypertension | 0, 1 |
8 | ui | presence of uterine irritability | 0, 1 |
9 | ftv | number of physician visits during the first trimester | 0, 1, 2, 3, 4, 6 |
10 | bwt | birth weight in grams | continuous variable |
Glimpse the dataset
head(birthwt)
low | age | lwt | race | smoke | ptl | ht | ui | ftv | bwt | |
---|---|---|---|---|---|---|---|---|---|---|
85 | 0 | 19 | 182 | 2 | 0 | 0 | 0 | 1 | 0 | 2523 |
86 | 0 | 33 | 155 | 3 | 0 | 0 | 0 | 0 | 3 | 2551 |
87 | 0 | 20 | 105 | 1 | 1 | 0 | 0 | 0 | 1 | 2557 |
88 | 0 | 21 | 108 | 1 | 1 | 0 | 0 | 1 | 2 | 2594 |
89 | 0 | 18 | 107 | 1 | 1 | 0 | 0 | 1 | 0 | 2600 |
91 | 0 | 21 | 124 | 3 | 0 | 0 | 0 | 0 | 0 | 2622 |
Some variables should be categorical or factor in nature
Use factor
to recode
ibwt <- birthwt ibwt$low <- factor(ibwt$low) ibwt$race <- factor(ibwt$race) ibwt$smoke <- factor(ibwt$smoke) ibwt$ptl <- factor(ibwt$ptl) ibwt$ht <- factor(ibwt$ht) ibwt$ui <- factor(ibwt$ui) ibwt$ftv <- factor(ibwt$ftv) str(ibwt)
## 'data.frame': 189 obs. of 10 variables: ## $ low : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ... ## $ age : int 19 33 20 21 18 21 22 17 29 26 ... ## $ lwt : int 182 155 105 108 107 124 118 103 123 113 ... ## $ race : Factor w/ 3 levels "1","2","3": 2 3 1 1 1 3 1 3 1 1 ... ## $ smoke: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 1 1 2 2 ... ## $ ptl : Factor w/ 4 levels "0","1","2","3": 1 1 1 1 1 1 1 1 1 1 ... ## $ ht : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ... ## $ ui : Factor w/ 2 levels "0","1": 2 1 1 2 2 1 1 1 1 1 ... ## $ ftv : Factor w/ 6 levels "0","1","2","3",..: 1 4 2 3 1 1 2 2 2 1 ... ## $ bwt : int 2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...
PLOT VARIABLES FOR A CONTINUOUS OUTCOME
bwt
could be dependent on all other variables except low
Drop a variable (i.e., low)
ibwtc <- ibwt[-1] head(ibwtc)
age | lwt | race | smoke | ptl | ht | ui | ftv | bwt | |
---|---|---|---|---|---|---|---|---|---|
85 | 19 | 182 | 2 | 0 | 0 | 0 | 1 | 0 | 2523 |
86 | 33 | 155 | 3 | 0 | 0 | 0 | 0 | 3 | 2551 |
87 | 20 | 105 | 1 | 1 | 0 | 0 | 0 | 1 | 2557 |
88 | 21 | 108 | 1 | 1 | 0 | 0 | 1 | 2 | 2594 |
89 | 18 | 107 | 1 | 1 | 0 | 0 | 1 | 0 | 2600 |
91 | 21 | 124 | 3 | 0 | 0 | 0 | 0 | 0 | 2622 |
Univariate Analysis: Analyze each continuous variable (i.e., bwt, age, lwt)
BWT
Histogram, normal density curve, kernel density lines and rug lines
par(mfrow=c(1,1)) par(oma = c(1, 1, 1, 1)) # Sets outside margins: b, l, t, r par(mar = c(4, 1, 4, 1)) # Sets plot margins: b, l, t, r # Histogram hist(ibwtc$bwt, prob = TRUE, # else freq = FALSE ylim = c(0, 0.0006), #xlim = c(30, 100), breaks = 12, col = "lightgray", #E5E5E5 border = 0, xlab="bwt", main = "Infant's birth weight from the \"birthweight\" data set ") # Normal density curve curve(dnorm(x, mean = mean(ibwtc$bwt), sd = sd(ibwtc$bwt)), col = "darkred", lwd = 2, add = TRUE) # Histogram: Kernel density lines lines(density(ibwtc$bwt), col = "blue", lwd = 2) # Histogram: Rug rug(ibwtc$bwt, col = "red", lwd = 2)
Q-Q normality plot
par(mfrow=c(1,1)) par(oma = c(1, 1, 1, 1)) par(mar = c(4, 1, 4, 1)) qqnorm(ibwtc$bwt, xlab = "bwt", main="Normal Q-Q plot of infant's birth weight") qqline(ibwtc$bwt)
Boxplot
par(mfrow=c(1,1)) par(oma = c(1, 1, 1, 1)) par(mar = c(4, 1, 4, 1)) boxplot(ibwtc$bwt, horizontal = T, xlab = "bwt", main="Boxplot of infant's birth weight")
AGE: Histogram, normal density curve, kernel density lines, rug lines, q-q normal plot, boxplot
par(mfrow=c(1,3)) par(oma = c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) hist(ibwtc$age, prob = TRUE, ylim = c(0, 0.1), #xlim = c(30, 100), breaks = 12, col = "lightgray", border = 0, xlab="age", main = "Mother's age from\n the \"birthweight\" data set ") curve(dnorm(x, mean = mean(ibwtc$age), sd = sd(ibwtc$age)), col = "darkred", lwd = 2, add = TRUE) lines(density(ibwtc$age), col = "blue", lwd = 2) rug(ibwtc$age, col = "red", lwd = 2) qqnorm(ibwtc$age, xlab = "age", main="Normal Q-Q plot of\n mother's age") qqline(ibwtc$age) boxplot(ibwtc$age, horizontal = T, xlab = "age", main="Boxplot of mother's age")
LWT: Histogram, normal density curve, kernel density lines, rug lines, q-q normal plot, boxplot
par(mfrow=c(1,3)) par(oma = c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) hist(ibwtc$lwt, prob = TRUE, # else freq = FALSE ylim = c(0, 0.02), #xlim = c(30, 100), breaks = 12, col = "lightgray", #E5E5E5 border = 0, xlab="lwt", main = "Mother's weight in pounds\n at last menstrual period") curve(dnorm(x, mean = mean(ibwtc$lwt), sd = sd(ibwtc$lwt)), col = "darkred", lwd = 2, add = TRUE) lines(density(ibwtc$lwt), col = "blue", lwd = 2) rug(ibwtc$lwt, col = "red", lwd = 2) qqnorm(ibwtc$lwt, xlab = "lwt", main="Normal Q-Q plot of mother's weight\n in pounds at last menstrual period") qqline(ibwtc$lwt) boxplot(ibwtc$age, horizontal = T, xlab = "lwt", main="Boxplot of mother's weight in\n pounds at last menstrual period")
Univariate Analysis: Analyze each categorical variable/factor (i.e., race, smoke, ptl, ht, ui, ftv )
Unordered bar plots for categorical variables or factors
par(mfrow=c(2,3)) par(oma = c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) barplot(table(ibwtc$race), col= c("bisque1", "bisque2"), xlab="race", ylab="count", main="Number of participants\n by race", ylim=c(0, 170)) barplot(table(ibwtc$smoke), col= c("bisque1", "bisque2"), xlab="smoke", main="Number of participants\n by smoking status", ylim=c(0, 170)) barplot(table(ibwtc$ptl), col= c("bisque1", "bisque2"), xlab="ptl", main="Number of participants by\n premature delivery status", ylim=c(0, 170)) barplot(table(ibwtc$ht), col= c("bisque1", "bisque2"), xlab="ht", main=" Number of participants\n by hypertension status", ylim=c(0, 200)) barplot(table(ibwtc$ui), xlab="ui", col= c("bisque1", "bisque2"), main="Number of participants by\n uterine infection status", ylim=c(0, 200)) barplot(table(ibwtc$ftv), col= c("bisque1", "bisque2", "bisque3"), xlab="ftv", main="Number of participants by 1st\n trimester physician visit status", ylim=c(0, 200))
Ordered bar plots for categorical variables or factors
par(mfrow=c(2,3)) par(oma = c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) barplot(table(ibwtc$race)[order(table(ibwtc$race))], col= c("bisque1", "bisque2"), xlab="race", ylab="count", main="Number of participants\n by race", ylim=c(0, 170)) barplot(table(ibwtc$smoke)[order(table(ibwtc$smoke))], col= c("bisque1", "bisque2"), xlab="smoke", main="Number of participants\n by smoking status", ylim=c(0, 170)) barplot(table(ibwtc$ptl)[order(table(ibwtc$ptl))], col= c("bisque1", "bisque2"), xlab="ptl", main="Number of participants by\n premature delivery status", ylim=c(0, 170)) barplot(table(ibwtc$ht)[order(table(ibwtc$ht))], col= c("bisque1", "bisque2"), xlab="ht", main=" Number of participants\n by hypertension status", ylim=c(0, 200)) barplot(table(ibwtc$ui)[order(table(ibwtc$ui))], xlab="ui", col= c("bisque1", "bisque2"), main="Number of participants by\n uterine infection status", ylim=c(0, 200)) barplot(table(ibwtc$ftv)[order(table(ibwtc$ftv))], col= c("bisque1", "bisque2", "bisque3"), xlab="ftv", main="Number of participants by 1st\n trimester physician visit status", ylim=c(0, 200))
Ordered horizontal bar plots for categorical variables or factors (e.g., race and ftv)
par(mfrow=c(3,2)) par(oma=c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) barplot(table(ibwtc$race)[order(table(ibwtc$race))], horiz = TRUE, col= c("bisque1", "bisque2", "bisque3"), xlab="count", ylab="race", main="Frequency of mother's race") barplot(table(ibwtc$smoke)[order(table(ibwtc$smoke))], horiz = TRUE, col= c("bisque1", "bisque2", "bisque3"), xlab="count", ylab="ftv", main="Frequency of smoking status") barplot(table(ibwtc$ptl)[order(table(ibwtc$ptl))], horiz = T, col= c("bisque1", "bisque2"), xlab="count", ylab="ptl", main="Frequency of participants by\n premature delivery status") barplot(table(ibwtc$ht)[order(table(ibwtc$ht))], horiz = TRUE, col= c("bisque1", "bisque2", "bisque3"), xlab="count", ylab="ht", main="Number of participants\n by hypertension status") barplot(table(ibwtc$ui)[order(table(ibwtc$ui))], horiz=T, las=1, # las gives orientation of axis labels xlab="count", ylab="ui", col= c("bisque1", "bisque2"), main="Frequency by\n uterine infection status") barplot(table(ibwtc$ftv)[order(table(ibwtc$ftv))], horiz = TRUE, col= c("bisque1", "bisque2", "bisque3"), xlab="count", ylab="ftv", main="Frequency of 1st trimester\n physician visit status")
Bivariate analysis
For two continuous variables
Bivariate scatter plots
par(mfrow=c(1, 2)) par(oma=c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) # age vs bwt plot(ibwtc$age, ibwtc$bwt, col="grey", pch=16, main="Infant's birth weight\n by mother's age", xlab="Mother's age (years)", ylab="Birth weight (grams)") abline(lm(ibwtc$bwt ~ ibwtc$age), col="darkred", lwd=2) lines(lowess (ibwtc$age, ibwtc$bwt), col = "blue", lwd = 2) # lwt vs bwt plot(ibwtc$lwt, ibwtc$bwt, col="grey", pch=16, xlab="Mother's weight (pounds)\n at last menstrual period", ylab="Birth weight (grams)", main="Infant's birth weight\n by mother's weight in pounds\n at last menstrual period") abline(lm(ibwtc$bwt ~ ibwtc$lwt), col="darkred", lwd=2) lines(lowess (ibwtc$lwt, ibwtc$bwt), col = "blue", lwd = 2)
Matrix of scatter plots (using "car" package)
scatterplotMatrix(~ bwt + lwt + age, data = ibwtc, col = "lightgray", main="Scatterplot Matrix for \"birthwt\" Data")
Matrix of scatter plots (using "pairs" function)
panel.hist <- function(x, ...) { usr <- par("usr"); on.exit(par(usr)) par(usr = c(usr[1:2], 0, 1.5) ) h <- hist(x, plot = FALSE) breaks <- h$breaks; nB <- length(breaks) y <- h$counts; y <- y/max(y) rect(breaks[-nB], 0, breaks[-1], y, ...) } pairs(~ bwt + age + lwt, data=ibwtc, panel = panel.smooth, # Optional smoother main = "Scatterplot Matrix \"birthwt\" Data Using \"pairs\" Function", diag.panel = panel.hist, pch = 16, col = "lightgray")
For one continuous variable in relation to another categorical variable
Boxplots using boxplot function and plot function
par(mfrow=c(2,3)) par(oma=c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) boxplot(bwt ~ race, data=ibwtc, col = brewer.pal(4, "Set2"), boxwex = .5, # Width of box whisklty = 1, # Whisker line type; 1 = solid line outpch = 16, # Outlier symbol; 16 = filled circle outcol = brewer.pal(4, "Set2"), # Outlier color xlab = "Race", ylab = "Infant's body weight", main="Birth weight by\n mother's race") boxplot(ibwtc$bwt ~ ibwtc$smoke, col = brewer.pal(4, "Set2"), boxwex = .5, whisklty = 1, outpch = 16, outcol = brewer.pal(4, "Set2"), xlab = "Smoking status", ylab = "Infant body weight", main="Birth weight by mother's\n smoking status") boxplot(ibwtc$bwt ~ ibwtc$ht, col = brewer.pal(4, "Set2"), boxwex = 0.5, whisklty = 1, outpch = 16, outcol = brewer.pal(4, "Set2"), xlab = "Hypertension status", ylab = "Infant body weight", main="Birth weight by mother's\n hypertension status") boxplot(ibwtc$bwt ~ ibwtc$ui, boxwex = 0.5, outpch = 16, xlab = "Uterine infection status", ylab = "Infant body weight", main="Birth weight by mother's\n uterine infection status") boxplot(ibwtc$bwt ~ ibwtc$ptl, boxwex = 0.5, outpch = 16, xlab = "Previous premature labor status", ylab = "Infant body weight", main="Birth weight by mother's\n previous premature labor status") boxplot(ibwtc$bwt ~ ibwtc$ftv, boxwex = 0.5, outpch = 16, xlab = "Physician visit status during 1st trimester", ylab = "Infant body weight", main="Birth weight by mother's\n physician visit\n status during 1st trimester")
# Boxplots using plot function par(oma=c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) plot(bwt ~ race, data=ibwtc, col = brewer.pal(4, "Set2"), boxwex = .5, whisklty = 1, outpch = 16, outcol = brewer.pal(4, "Set2"), xlab = "Race", ylab = "Infant's body weight", main="Birth weight by mother's race")
Bar plots
Using subset function to determine group means
par(mfrow=c(1,3)) par(oma=c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) # ex1 using square brackets to subset and calculate mean mbwt.race1 <- mean(ibwtc[ibwtc$race==1, ]$bwt) mbwt.race2 <- mean(ibwtc[ibwtc$race==2, ]$bwt) mbwt.race3 <- mean(ibwtc[ibwtc$race==3, ]$bwt) barplot(c(mbwt.race1, mbwt.race2, mbwt.race3), legend.text = c("1", "2", "3"), col = c(1, 2, 3), main = "Mean birth weight by race", xlab = "Race", ylab = "Birth weight") # ex 2 using subset function mbwt.by.race <- c(mean(subset(ibwtc$bwt, ibwtc$race==1)), mean(subset(ibwtc$bwt, ibwtc$race==2)), mean(subset(ibwtc$bwt, ibwtc$race==3))) barplot(mbwt.by.race, legend.text = c("1", "2", "3"), col = c(1, 2, 3), main = "Mean birth weight by race", xlab = "Race", ylab = "Birth weight") # ex 3 using forward pipes with subset function mbwt_by_race <- c(ibwtc$bwt %>% subset(ibwtc$race==1) %>% mean(), ibwtc$bwt %>% subset(ibwtc$race==2) %>% mean(), ibwtc$bwt %>% subset(ibwtc$race==3) %>% mean()) barplot(mbwt_by_race, legend.text = c("1", "2", "3"), col = c(1, 2, 3), main = "Mean birth weight by race", xlab = "Race", ylab = "Birth weight")
Using aggregate function to determine group means
# Aggregate bwt.by.race.means <- aggregate(bwt ~ race, data=ibwt, FUN = mean) bwt.by.smoke.means <- aggregate(bwt ~ smoke, data=ibwtc, FUN = mean) bwt.by.ht.means <- aggregate(bwt ~ ht, data=ibwtc, FUN = mean) bwt.by.ui.means <- aggregate(bwt ~ ui, data=ibwtc, FUN = mean) bwt.by.ptl.means <- aggregate(bwt ~ ptl, data=ibwtc, FUN = mean) bwt.by.ftv.means <- aggregate(bwt ~ ftv, data=ibwtc, FUN = mean) bwt.by.race.means
race | bwt |
---|---|
1 | 3102.719 |
2 | 2719.692 |
3 | 2805.284 |
# Transpose and assign variable/column names bwt.by.race.means.t <- t(bwt.by.race.means[-1]) colnames(bwt.by.race.means.t) <- bwt.by.race.means[, 1] bwt.by.smoke.means.t <- t(bwt.by.smoke.means[-1]) colnames(bwt.by.smoke.means.t) <- bwt.by.smoke.means[, 1] bwt.by.ht.means.t <- t(bwt.by.ht.means[-1]) colnames(bwt.by.ht.means.t) <- bwt.by.ht.means[, 1] bwt.by.ui.means.t <- t(bwt.by.ui.means[-1]) colnames(bwt.by.ui.means.t) <- bwt.by.ui.means[, 1] bwt.by.ptl.means.t <- t(bwt.by.ptl.means[-1]) colnames(bwt.by.ptl.means.t) <- bwt.by.ptl.means[, 1] bwt.by.ftv.means.t <- t(bwt.by.ftv.means[-1]) colnames(bwt.by.ftv.means.t) <- bwt.by.ftv.means[, 1] bwt.by.race.means.t
## 1 2 3 ## bwt 3102.719 2719.692 2805.284
# Bar plots par(mfrow=c(1,6)) par(oma=c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) barplot(bwt.by.race.means.t, col = "lightgray", main = "bwt by race", xlab = "race", ylab = "mean bwt") barplot(bwt.by.smoke.means.t, col = "lightgray", main = "bwt by smoke", xlab = "smoke", ylab = "mean bwt") barplot(bwt.by.ptl.means.t, col = "lightgray", main = "bwt by ptl", xlab = "ptl", ylab = "mean bwt") barplot(bwt.by.ht.means.t, col = "lightgray", main = "bwt by ht", xlab = "ht", ylab = "mean bwt") barplot(bwt.by.ui.means.t, col = "lightgray", main = "bwt by ui", xlab = "ui", ylab = "mean bwt") barplot(bwt.by.ftv.means.t, col = "lightgray", main = "bwt by ftv", xlab = "ftv", ylab = "mean bwt")
PLOT VARIABLES FOR A CATEGORICAL OUTCOME
We will go back to the original birthwt
dataset and subset it for a categorical outcome (i.e., low
. Hence we will drop bwt
.
low
could be dependent on all other variables (i.e., "age", "lwt", "race", "smoke", "ptl", "ht", "ui" and "ftv").
Univariate plotting is done before. The relationship of the categorical outcome (i.e., low) wrt continuous variables could be explored using boxplots and barplots as shown above.
Here we will focus the relationship between categorical outcome and other categorical variables.
Drop a variable (i.e., bwt)
ibwtd <- ibwt[-10] head(ibwtd)
low | age | lwt | race | smoke | ptl | ht | ui | ftv | |
---|---|---|---|---|---|---|---|---|---|
85 | 0 | 19 | 182 | 2 | 0 | 0 | 0 | 1 | 0 |
86 | 0 | 33 | 155 | 3 | 0 | 0 | 0 | 0 | 3 |
87 | 0 | 20 | 105 | 1 | 1 | 0 | 0 | 0 | 1 |
88 | 0 | 21 | 108 | 1 | 1 | 0 | 0 | 1 | 2 |
89 | 0 | 18 | 107 | 1 | 1 | 0 | 0 | 1 | 0 |
91 | 0 | 21 | 124 | 3 | 0 | 0 | 0 | 0 | 0 |
colnames(ibwtd)
## [1] "low" "age" "lwt" "race" "smoke" "ptl" "ht" "ui" "ftv"
Barplots
Barplots for two categorical variables
par(mfrow=c(2,3)) par(oma=c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) plot(ibwtd$race, ibwtd$low, xlab="race", ylab="low", main="low by race") plot(ibwtd$smoke, ibwtd$low, xlab="smoke", ylab="low", main="low by smoke") plot(ibwtd$ptl, ibwtd$low, xlab="ptl", ylab="low", main="low by ptl") plot(ibwtd$ht, ibwtd$low, xlab="ht", ylab="low", main="low by ht") plot(ibwtd$ui, ibwtd$low, xlab="ui", ylab="low", main="low by ui") plot(ibwtd$ftv, ibwtd$low, xlab="ftv", ylab="low", main="low by ftv")
Bar plots for three categorical variables (see few examples)
low by (race and smoke)
# 3 levels of race (1, 2, 3), 2 levels of smoke (0, 1) and 2 levels of low(0, 1) mosaicplot(~ race + smoke + low, data = ibwtd, ylab="smoke", main="low by race and smoke", shade=TRUE)
low by (race and ptl)
mosaicplot(~ race + ptl + low, data = ibwtd, ylab="ptl", main="low by race and ptl", shade=TRUE)
low by (race and ht)
mosaicplot(~ race + ht + low, data = ibwtd, ylab="ht", main="low by race and ht", shade = TRUE)
### low by (race and ui) and (race and ftv)
par(mfrow=c(1, 2)) par(oma=c(1, 1, 1, 1)) par(mar=c(4, 1, 4, 1)) mosaicplot(~ race + ui + low, data = ibwtd, ylab="ui", main="low by race and ui", shade = TRUE) mosaicplot(~ race + ftv + low, data = ibwtd, ylab= "ftv", main="low by race and ftv", shade= TRUE)
Source: https://rstudio-pubs-static.s3.amazonaws.com/538706_ba31d218adcd4216b761f13ae3d9c901.html
0 Response to "How to Tell if a Variable is Continuous or Discrete in R"
Enregistrer un commentaire