A quick experiment in R can reveal the impact of sample size on the estimates we make from data. A small sample gives us less information about the process or system from which we’re collecting data, while a large sample lets us pin down quantities such as the mean far more precisely. See the earlier post on sample size, confidence intervals and related topics on R Explorations.
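To put rough numbers on this, here is a minimal sketch based on the standard result that the standard error of a sample mean is the population standard deviation divided by the square root of the sample size. The value `sigma = 1` and the list of sample sizes are assumptions chosen to match the settings used in the code later in this post.

```r
# Assumed population standard deviation (matches the value used later in the post)
sigma <- 1.0
# Sample sizes to compare
n <- c(10, 50, 100, 500, 1000, 10000)
# Standard error of the sample mean: sigma / sqrt(n)
se <- sigma / sqrt(n)
# Tabulate sample size against the expected spread of the sample mean
print(data.frame(n = n, standard_error = round(se, 4)))
```

Going from 100 to 10000 samples multiplies the sample size by 100 but shrinks the standard error only by a factor of 10, which is why the gains from more data taper off.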

Using the “animation” package once again, I’ve put together a simple animation to describe this.

#package containing the saveGIF function (saveGIF also needs ImageMagick or GraphicsMagick installed)
library(animation)
#setting GIF options
ani.options(interval = 0.12, ani.width = 480, ani.height = 320)
#a function that draws one histogram per iteration; saveGIF records each plot as a frame
plo <- function(samplesize, mu, sigma, iter = 100){
  for (i in seq_len(iter)){
    #generating a sample from the normal distribution
    x <- rnorm(samplesize, mu, sigma)
    #histogram of the sample, titled with its size, mean and standard deviation
    hist(x, main = paste0("N = ", samplesize, ", xbar = ", round(mean(x), digits = 2),
                          ", s = ", round(sd(x), digits = 2)), xlim = c(5,15),
         ylim = c(0,floor(samplesize/3)), breaks = seq(4,16,0.5), col = rgb(0.1,0.9,0.1,0.2),
         border = "grey", xlab = "x (Gaussian sample)")
    #adding the sample mean as a vertical red line on the histogram
    abline(v = mean(x), col = "red", lwd = 2)
  }
}
#setting the parameters for the distribution (named sigma so the sd function is not masked)
mu <- 10.0
sigma <- 1.0
#one GIF per sample size
for (n in c(10,50,100,500,1000,10000)){
  saveGIF({plo(n, mu, sigma)}, movie.name = paste0("N=", n, "_mu=", mu, "_sd=", sigma, ".gif"))
}

## Animated Results

Very small sample size of 5. Observe how the sample mean line hunts wildly.

A small sample size of 10. The mean once again moves around quite a bit.

Moderate sample size of 50. Far less variation in the estimate (red line).

A larger sample size, showing little deviation in the sample mean across different samples.

A large sample size, showing only minor variations in the sample mean.

Very large sample size (however, still smaller than many real-world data sets!). The sample mean estimate barely changes across samples.
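The movement of the red line can also be summarised numerically. The sketch below is not part of the original animation code: it redraws 100 samples per sample size (one per animation frame) and reports how much the sample mean varies. The seed, the names `sizes`, `reps` and `spread`, and the choice of summary statistics are assumptions made for illustration.

```r
set.seed(42)   # arbitrary seed, used only so the run is reproducible
mu <- 10.0
sigma <- 1.0
sizes <- c(10, 50, 100, 500, 1000, 10000)
reps <- 100    # one draw per animation frame

# For each sample size, draw `reps` samples and keep every sample mean
means_by_size <- lapply(sizes, function(n) {
  replicate(reps, mean(rnorm(n, mu, sigma)))
})

# Spread of the sample mean: standard deviation and full range across draws
spread <- sapply(means_by_size, sd)
width <- sapply(means_by_size, function(m) diff(range(m)))

print(data.frame(n = sizes,
                 sd_of_means = round(spread, 4),
                 range_of_means = round(width, 4)))
```

The `sd_of_means` column should shrink steadily as `n` grows, mirroring the way the red line settles down in the animations.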
