Loading...

< Advanced Graphs Graphical Parameters Axes and Text Combining Plots Lattice Graphs ggplot2 Graphs Probability Plots

Probability Plots This section describes creating probability plots in R for both didactic purposes and for data analyses.

Probability Plots for Teaching and Demonstration When I was a college professor teaching statistics, I used to have to draw normal distributions by hand. They always came out looking like bunny rabbits. What can I say?

Mosaic Plots

R makes it easy to draw probability distributions and demonstrate statistical concepts.

Correlograms

Some of the more common probability distributions available in R are given below.

Interactive Graphs

distribution

R name distribution

Beta

beta

Lognormal

Binomial

binom

Negative Binomial nbinom

Cauchy

cauchy Normal

Chisquare

chisq

Poisson

pois

Exponential

exp

Student t

t

F

f

Uniform

unif

material. Use promo code

Gamma

gamma Tukey

ria38 for a 38% discount.

Geometric

geom

R in Action

R in Action (2nd ed) significantly expands upon this

R name lnorm

norm

tukey

Weibull

weib

Hypergeometric hyper

Wilcoxon

wilcox

Logistic

logis

For a comprehensive list, see Statistical Distributions on the R wiki. The functions available for each distribution follow this format:

name

description

dname( ) density or probability function pname( ) cumulative density function qname( ) quantile function Rname( ) random deviates

For example, pnorm(0) =0.5 (the area under the standard normal curve to the left of zero). qnorm(0.9) = 1.28 (1.28 is the 90th percentile of the standard normal distribution). rnorm(100) generates 100 random deviates from a standard normal distribution. Each function has parameters specific to that distribution. For example, rnorm(100, m=50, sd=10) generates 100 random deviates from a normal distribution with mean 50 and standard deviation 10. You can use these functions to demonstrate various aspects of probability distributions. Two common examples are given below.

# Display the Student's t distributions with various # degrees of freedom and compare to the normal distribution x <- seq(-4, 4, length=100) hx <- dnorm(x) degf <- c(1, 3, 8, 30) colors <- c("red", "blue", "darkgreen", "gold", "black") labels <- c("df=1", "df=3", "df=8", "df=30", "normal") plot(x, hx, type="l", lty=2, xlab="x value", ylab="Density", main="Comparison of t Distributions") for (i in 1:4){ lines(x, dt(x,degf[i]), lwd=2, col=colors[i]) } legend("topright", inset=.05, title="Distributions", labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)

click to view

# Children's IQ scores are normally distributed with a # mean of 100 and a standard deviation of 15. What # proportion of children are expected to have an IQ between # 80 and 120? mean=100; sd=15 lb=80; ub=120 x <- seq(-4,4,length=100)*sd + mean hx <- dnorm(x,mean,sd) plot(x, hx, type="n", xlab="IQ Values", ylab="", main="Normal Distribution", axes=FALSE) i <- x >= lb & x <= ub lines(x, hx) polygon(c(lb,x[i],ub), c(0,hx[i],0), col="red") area <- pnorm(ub, mean, sd) - pnorm(lb, mean, sd) result <- paste("P(",lb,"< IQ <",ub,") =", signif(area, digits=3)) mtext(result,3) axis(1, at=seq(40, 160, 20), pos=0)

click to view For a comprehensive view of probability plotting in R, see Vincent Zonekynd's Probability Distributions.

Fitting Distributions There are several methods of fitting distributions in R. Here are some options. You can use the qqnorm( ) function to create a Quantile-Quantile plot evaluating the fit of sample data to the normal distribution. More generally, the qqplot( ) function creates a Quantile-Quantile plot for any theoretical distribution.

# Q-Q plots par(mfrow=c(1,2)) # create sample data x <- rt(100, df=3) # normal fit qqnorm(x); qqline(x) # t(3Df) fit qqplot(rt(1000,df=3), x, main="t(3) Q-Q Plot", ylab="Sample Quantiles") abline(0,1)

click to view The fitdistr( ) function in the MASS package provides maximum-likelihood fitting of univariate distributions. The format is fitdistr(x, densityfunction) where x is the sample data and densityfunction is one of the following: "beta", "cauchy", "chi-squared", "exponential", "f", "gamma", "geometric", "log-normal", "lognormal", "logistic", "negative binomial", "normal", "Poisson", "t" or "weibull".

# Estimate parameters assuming log-Normal distribution # create some sample data x <- rlnorm(100) # estimate paramters library(MASS) fitdistr(x, "lognormal")

Finally R has a wide range of goodness of fit tests for evaluating if it is reasonable to assume that a random sample comes from a specified theoretical distribution. These include chi-square, Kolmogorov-Smirnov, and Anderson-Darling. For more details on fitting distributions, see Vito Ricci's Fitting Distributions with R. For general (non R) advice, see Bill Huber's Fitting Distributions to Data.

To Practice Try this interactive course on exploratory data analysis. Copyright © 2017 Robert I. Kabacoff, Ph.D. | Sitemap