Stats on Rails

November 18th, 2007 cpetersen

The Stats Module

Have you ever had an array of numbers and needed to know whether one of them was a statistical outlier? Or needed the standard deviation of those numbers? I was recently working on an app where I needed both, so I packaged the functionality into the following module.
module Stats
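  # Returns [mean, stddev, n] for the non-nil entries of numbers.
  # stddev is the population standard deviation (it divides by n).
  def mean_stdev_n(numbers)
    sum = 0.0   # floats, so integer input is not truncated by the divisions below
    sumsq = 0.0
    n = 0
    numbers.each do |num|
      if num
        sum += num
        sumsq += (num*num)
        n = n+1
      end
    end

    if n==0
      return nil,nil,0
    else
      mean = sum/n
      # Clamp at zero: rounding error can push the variance slightly negative.
      variance = [sumsq/n - mean * mean, 0.0].max
      stddev = Math::sqrt(variance)
      return mean, stddev, n
    end
  end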
 
  def outlier?(value, mean, stddev, n)
    grubbs = grubbs_outlier_test(value, mean, stddev)
    z = critical_z(n)
    return grubbs > z
  end
 
  def grubbs_outlier_test(value, mean, stddev)
    ((mean - value)/stddev).abs
  end
 
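  # Critical Z value for the Grubbs test, looked up by sample size n.
  # Sizes of 40 and above are bucketed in steps of 10.
  def critical_z(n)
    case(n)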
    when 3
      1.15
    when 4
      1.48
    when 5
      1.71
    when 6
      1.89
    when 7
      2.02
    when 8
      2.13
    when 9
      2.21
    when 10
      2.29
    when 11
      2.34
    when 12
      2.41
    when 13
      2.46
    when 14
      2.51
    when 15
      2.55
    when 16
      2.59
    when 17
      2.62
    when 18
      2.65
    when 19
      2.68
    when 20
      2.71
    when 21
      2.73
    when 22
      2.76
    when 23
      2.78
    when 24
      2.80
    when 25
      2.82
    when 26
      2.84
    when 27
      2.86
    when 28
      2.88
    when 29
      2.89
    when 30
      2.91
    when 31
      2.92
    when 32
      2.94
    when 33
      2.95
    when 34
      2.97
    when 35
      2.98
    when 36
      2.99
    when 37
      3.00
    when 38
      3.01
    when 39
      3.03
    when 40..49
      3.04
    when 50..59
      3.13
    when 60..69
      3.20
    when 70..79
      3.26
    when 80..89
      3.31
    when 90..99
      3.35
    when 100..109
      3.38
    when 110..119
      3.42
    when 120..129
      3.44
    when 130..139
      3.47
    else
      3.49
    end
  end
end
If you are unfamiliar with the Grubbs test, you're not alone; so was I. I found this page to be especially helpful.
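
For a rough feel of what the test measures: the Grubbs statistic is the distance of a value from the mean, expressed in standard deviations, which is what grubbs_outlier_test computes above. The sketch below is a standalone illustration, not part of the module. The helper name grubbs_statistic is made up for this example, it uses the sample standard deviation (dividing by n - 1) where the module divides by n, and it assumes Ruby 2.4 or later for Array#sum.

```ruby
# Hypothetical helper, not part of the Stats module above.
# Returns the value farthest from the mean together with its Grubbs
# statistic G = |value - mean| / s, where s is the sample standard deviation.
def grubbs_statistic(numbers)
  values = numbers.compact.map(&:to_f)
  n = values.size
  return nil if n < 3 # the critical value table starts at n = 3

  mean = values.sum / n
  variance = values.sum { |v| (v - mean)**2 } / (n - 1)
  stddev = Math.sqrt(variance)
  return nil if stddev.zero? # all values identical, nothing to test

  suspect = values.max_by { |v| (v - mean).abs }
  [suspect, (suspect - mean).abs / stddev]
end

suspect, g = grubbs_statistic([2, 3, 4, 5, 100])
puts "most extreme value: #{suspect}, G = #{g.round(3)}"
```

The resulting G is then compared against the critical value for the sample size; a larger G means the value is flagged as an outlier.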

Usage

To use the module, simply include it, calculate mean, stddev and n, then call the outlier? method.
include(Stats)
numbers = [2, 3, 4, 5, 100] # example data; substitute your own array of numbers
mean, stddev, n = mean_stdev_n(numbers)
if outlier?(5, mean, stddev, n)
  puts "5 is an outlier"
else
  puts "5 is not an outlier"
end
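
A common follow-up is to check every value in the array rather than a single one. The sketch below shows one way that might look. So that it runs on its own, it uses MiniStats, a hypothetical cut-down stand-in for the module that only carries the critical values for n = 3 to 10; in a real app you would include the full Stats module instead. The readings array is made-up example data, and Array#sum assumes Ruby 2.4 or later.

```ruby
# Abbreviated, hypothetical stand-in for the Stats module so this sketch
# is self-contained; swap in the full module in a real app.
module MiniStats
  # Critical values copied from the table above, for n = 3..10 only.
  CRITICAL_Z = { 3 => 1.15, 4 => 1.48, 5 => 1.71, 6 => 1.89,
                 7 => 2.02, 8 => 2.13, 9 => 2.21, 10 => 2.29 }.freeze

  def mean_stdev_n(numbers)
    values = numbers.compact.map(&:to_f)
    n = values.size
    return nil, nil, 0 if n.zero?

    mean = values.sum / n
    variance = values.sum { |v| v * v } / n - mean * mean
    [mean, Math.sqrt([variance, 0.0].max), n]
  end

  def outlier?(value, mean, stddev, n)
    ((mean - value) / stddev).abs > CRITICAL_Z.fetch(n)
  end
end

include MiniStats

readings = [10, 11, 9, 10, 12, 10, 11, 30] # made-up example data
mean, stddev, n = mean_stdev_n(readings)

# Split the readings into flagged outliers and everything else.
outliers, typical = readings.partition { |r| outlier?(r, mean, stddev, n) }
puts "outliers: #{outliers.inspect}"
puts "typical:  #{typical.inspect}"
```

Note that the mean and standard deviation here are computed once over the whole array, suspect values included, which matches how the usage example above calls the module.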
