November 18th, 2007
The Stats Module
Have you ever had an array of numbers and needed to know whether one of them was a statistical outlier? Ever needed the standard deviation of those numbers? I recently did while working on an app, so I packaged up the functionality as the following module.
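module Stats
  # Returns [mean, stddev, n] for the non-nil entries of numbers,
  # or [nil, nil, 0] when there are none.
  def mean_stdev_n(numbers)
    sum = 0
    sumsq = 0
    n = 0
    numbers.each do |num|
      if num
        sum += num
        sumsq += num * num
        n += 1
      end
    end
    if n == 0
      return nil, nil, 0
    else
      # Convert to Float so that integer input does not truncate the results.
      mean = sum.to_f / n
      # Population variance; clamp at 0 in case rounding makes it slightly negative.
      variance = [sumsq.to_f / n - mean * mean, 0.0].max
      stddev = Math.sqrt(variance)
      return mean, stddev, n
    end
  end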
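  # True when the Grubbs statistic for value exceeds the critical value for n.
  def outlier?(value, mean, stddev, n)
    grubbs = grubbs_outlier_test(value, mean, stddev)
    z = critical_z(n)
    return grubbs > z
  end

  # Distance of value from the mean, measured in standard deviations.
  # to_f avoids integer division when all arguments are integers.
  def grubbs_outlier_test(value, mean, stddev)
    ((mean - value) / stddev.to_f).abs
  end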
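  # Critical value for a sample of size n, looked up from a fixed table.
  # Any n not listed (including n < 3) falls through to 3.49.
  def critical_z(n)
    case n
    when 3 then 1.15
    when 4 then 1.48
    when 5 then 1.71
    when 6 then 1.89
    when 7 then 2.02
    when 8 then 2.13
    when 9 then 2.21
    when 10 then 2.29
    when 11 then 2.34
    when 12 then 2.41
    when 13 then 2.46
    when 14 then 2.51
    when 15 then 2.55
    when 16 then 2.59
    when 17 then 2.62
    when 18 then 2.65
    when 19 then 2.68
    when 20 then 2.71
    when 21 then 2.73
    when 22 then 2.76
    when 23 then 2.78
    when 24 then 2.80
    when 25 then 2.82
    when 26 then 2.84
    when 27 then 2.86
    when 28 then 2.88
    when 29 then 2.89
    when 30 then 2.91
    when 31 then 2.92
    when 32 then 2.94
    when 33 then 2.95
    when 34 then 2.97
    when 35 then 2.98
    when 36 then 2.99
    when 37 then 3.00
    when 38 then 3.01
    when 39 then 3.03
    when 40..49 then 3.04
    when 50..59 then 3.13
    when 60..69 then 3.20
    when 70..79 then 3.26
    when 80..89 then 3.31
    when 90..99 then 3.35
    when 100..109 then 3.38
    when 110..119 then 3.42
    when 120..129 then 3.44
    when 130..139 then 3.47
    else 3.49
    end
  end
end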
If you are unfamiliar with the Grubbs test, you're not alone; I was too. I found this page to be especially helpful.
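In short, the test measures how far a value sits from the mean, in units of standard deviation, and compares that distance against a critical value that depends on the sample size. Here is a minimal standalone sketch of the statistic. Note that `grubbs_statistic` is a hypothetical helper written for illustration, not part of the module, and that it assumes Ruby 2.4 or later for `Array#sum`. It divides by n when computing the standard deviation, as the module does.

```ruby
# Hypothetical helper for illustration; not part of the Stats module.
# Returns |value - mean| / stddev, using the population standard deviation.
def grubbs_statistic(value, numbers)
  n = numbers.size.to_f
  mean = numbers.sum / n
  variance = numbers.sum { |x| (x - mean)**2 } / n
  (value - mean).abs / Math.sqrt(variance)
end

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5.0, standard deviation 2.0
puts grubbs_statistic(9, sample)    # prints 2.0
```

With eight values the table above gives a critical value of 2.13, so a statistic of 2.0 would not flag 9 as an outlier.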
Usage
To use the module, simply include it, calculate mean, stddev and n, then call the outlier? method.
include(Stats)
numbers = [2, 4, 4, 4, 5, 5, 7, 9] # example data; substitute your own array of numbers
mean, stddev, n = mean_stdev_n(numbers)
if outlier?(5, mean, stddev, n)
  puts "5 is an outlier"
else
  puts "5 is not an outlier"
end
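The same check can be run over every element of an array rather than a single value. The sketch below is one way to do that and makes a few assumptions: `flag_outliers` is a hypothetical name, the critical value is passed in by the caller instead of being looked up, and the mean and standard deviation are recomputed inline so that the snippet runs on its own (Ruby 2.4 or later for `Array#sum`).

```ruby
# Hypothetical helper for illustration; not part of the Stats module.
# Returns every element whose distance from the mean, in standard
# deviations, exceeds the given critical value.
def flag_outliers(numbers, critical)
  n = numbers.size.to_f
  mean = numbers.sum / n
  stddev = Math.sqrt(numbers.sum { |x| (x - mean)**2 } / n)
  return [] if stddev.zero?   # all values identical: nothing to flag
  numbers.select { |x| ((x - mean) / stddev).abs > critical }
end

readings = [10, 10, 10, 10, 10, 10, 10, 50]
p flag_outliers(readings, 2.13)   # 2.13 is the table value for n = 8; prints [50]
```

The early return covers the case where the standard deviation is zero, which would otherwise lead to a division by zero.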