Message-ID: <IA1PR11MB61710CDB2B6B47118832770E89B79@IA1PR11MB6171.namprd11.prod.outlook.com>
Date: Tue, 7 Mar 2023 07:49:49 +0000
From: "Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>
To: "paulmck@...nel.org" <paulmck@...nel.org>
CC: "Joel Fernandes (Google)" <joel@...lfernandes.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Frederic Weisbecker" <frederic@...nel.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
"urezki@...il.com" <urezki@...il.com>
Subject: RE: [PATCH v3] rcu: Add a minimum time for marking boot as completed
> From: Paul E. McKenney <paulmck@...nel.org>
> [...]
> >
> > Thank you so much Paul for the detailed comments on the measured data.
> >
> > I'm curious how you arrived at the number 24 as the *minimum* we need.
> > This can guide me on whether the number of samples is enough for
> > future testing ;-).
>
> It is a rough rule of thumb. For more details and accuracy, study up on the
> Student's t-test and related statistical tests.
>
> Of course, this all assumes that the data fits a normal distribution.
Thanks for the extra information. Good to know about the Student's t-test.
> > I did another 48 measurements (2x of 24) for each case (w/o and w/
> > Joel's v2 patch) as below.
> > All the testing configurations for the new testing are the same as
> > before.
> >
> > a) Measured 48 times w/o v2 patch (in seconds):
> > 8.4, 8.8, 9.2, 9.0, 8.3, 9.6, 8.8, 9.4,
> > 8.7, 9.2, 8.3, 9.4, 8.4, 9.6, 8.5, 8.8,
> > 8.8, 8.9, 9.3, 9.2, 8.6, 9.7, 9.2, 8.8,
> > 8.7, 9.0, 9.1, 9.5, 8.6, 8.9, 9.1, 8.6,
> > 8.2, 9.1, 8.8, 9.2, 9.1, 8.9, 8.4, 9.0,
> > 9.8, 9.8, 8.7, 8.8, 9.1, 9.5, 9.5, 8.7
> > The average OS boot time was: ~9.0s
>
> The range is 8.2 through 9.8.
>
> > b) Measured 48 times w/ v2 patch (in seconds):
> > 7.7, 8.6, 8.1, 7.8, 8.2, 8.2, 8.8, 8.2,
> > 9.8, 8.0, 9.2, 8.8, 9.2, 8.5, 8.4, 9.2,
> > 8.5, 8.3, 8.1, 8.3, 8.6, 7.9, 8.3, 8.3,
> > 8.6, 8.9, 8.0, 8.5, 8.4, 8.6, 8.7, 8.0,
> > 8.8, 8.8, 9.1, 7.9, 9.7, 7.9, 8.2, 7.8,
> > 8.1, 8.5, 8.6, 8.4, 9.2, 8.6, 9.6, 8.3,
> > The average OS boot time was: ~8.5s
>
> The range is 7.7 through 9.8.
>
> There is again significant overlap, so it is again unclear that you have a
> statistically significant difference. So could you please calculate the standard
> deviations?
a's standard deviation is ~0.4.
b's standard deviation is ~0.5.
a's average of 9.0 sits at the upper bound of b's one-standard-deviation
interval [8.0, 9.0] (b's average 8.5 +/- 0.5).
So, the difference should be statistically significant to some degree.
The standard deviations were calculated via:
https://www.gigacalculator.com/calculators/standard-deviation-calculator.php
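For anyone who prefers to reproduce these figures locally rather than with the online calculator, here is a minimal Python sketch using only the standard library. It assumes the sample (n-1) standard deviation is the intended one, and it also computes Welch's t statistic as one possible way to follow up on the t-test suggestion; the variable names are illustrative and not from this thread.

```python
import math
import statistics

# Boot times in seconds, copied from the two 48-sample runs in this mail.
a = [8.4, 8.8, 9.2, 9.0, 8.3, 9.6, 8.8, 9.4,
     8.7, 9.2, 8.3, 9.4, 8.4, 9.6, 8.5, 8.8,
     8.8, 8.9, 9.3, 9.2, 8.6, 9.7, 9.2, 8.8,
     8.7, 9.0, 9.1, 9.5, 8.6, 8.9, 9.1, 8.6,
     8.2, 9.1, 8.8, 9.2, 9.1, 8.9, 8.4, 9.0,
     9.8, 9.8, 8.7, 8.8, 9.1, 9.5, 9.5, 8.7]   # w/o v2 patch
b = [7.7, 8.6, 8.1, 7.8, 8.2, 8.2, 8.8, 8.2,
     9.8, 8.0, 9.2, 8.8, 9.2, 8.5, 8.4, 9.2,
     8.5, 8.3, 8.1, 8.3, 8.6, 7.9, 8.3, 8.3,
     8.6, 8.9, 8.0, 8.5, 8.4, 8.6, 8.7, 8.0,
     8.8, 8.8, 9.1, 7.9, 9.7, 7.9, 8.2, 7.8,
     8.1, 8.5, 8.6, 8.4, 9.2, 8.6, 9.6, 8.3]   # w/ v2 patch

mean_a, mean_b = statistics.mean(a), statistics.mean(b)
# statistics.stdev() is the sample standard deviation (divides by n - 1).
sd_a, sd_b = statistics.stdev(a), statistics.stdev(b)

# Welch's t statistic: difference of means over its estimated standard error.
# Like any t-test, this assumes the data is roughly normally distributed.
t_welch = (mean_a - mean_b) / math.sqrt(sd_a**2 / len(a) + sd_b**2 / len(b))

print(f"a: mean={mean_a:.2f} sd={sd_a:.2f}")
print(f"b: mean={mean_b:.2f} sd={sd_b:.2f}")
print(f"Welch t statistic: {t_welch:.2f}")
```

The t statistic still has to be compared against the t distribution to get a p-value, which this sketch does not do.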
> > @Joel Fernandes (Google), you may replace my old data with the above
> > new data in your commit message.
> >
> > > But we can apply the binomial distribution instead of the usual
> > > normal distribution. First, let's sort and take the medians:
> > >
> > > a: 8.2 8.3 8.4 8.6 8.7 8.7 8.8 8.8 9.0 9.3 Median: 8.7
> > > b: 7.6 7.8 8.2 8.2 8.2 8.2 8.4 8.5 8.7 9.3 Median: 8.2
> > >
> > > 8/10 of a's data points are greater than 0.1 more than b's median
> > > and 8/10 of b's data points are less than 0.1 less than a's median.
> > > What are the odds that this happens by random chance?
> > >
> > > This is given by sum_0^2 (0.5^10 * binomial(10,i)), which is about 0.055.
> >
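As a sanity check on the quoted calculation, here is a small Python sketch that recounts the 8/10 figures from the old 10-sample data and evaluates the binomial sum. Converting to integer tenths of a second is my own choice, made only to keep the comparisons free of floating-point surprises; the names are illustrative.

```python
from math import comb
from statistics import median

# Old 10-sample data quoted above, in seconds.
a_old = [8.2, 8.3, 8.4, 8.6, 8.7, 8.7, 8.8, 8.8, 9.0, 9.3]
b_old = [7.6, 7.8, 8.2, 8.2, 8.2, 8.2, 8.4, 8.5, 8.7, 9.3]

# Work in integer tenths of a second so that "0.1 more/less" is exact.
a_t = [round(x * 10) for x in a_old]
b_t = [round(x * 10) for x in b_old]

# a's points greater than (b's median + 0.1), b's points less than (a's median - 0.1).
above = sum(x > median(b_t) + 1 for x in a_t)
below = sum(x < median(a_t) - 1 for x in b_t)

# sum_{i=0}^{2} 0.5^10 * binomial(10, i): the chance of at most 2 of 10
# points landing on the "wrong" side when each side has probability 0.5.
tail = sum(comb(10, i) for i in range(3)) / 2**10

print(above, below, tail)
```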
> > What's the meaning of 0.5 here? Was it the probability (we assume?)
> > that each time b's data point failed (or didn't satisfy) "less than
> > 0.1 less than a's median"?
>
> The meaning of 0.5 is the probability of a given data point being on one side
> or the other of the corresponding distribution's median. This of course
> assumes that the median of the measured data matches that of the
> corresponding distribution, though the fact that the median is also a mode of
> both of the old data sets gives some hope.
Thanks for the detailed comments on the meaning of 0.5 here. :-)
> The meaning of the 0.1 is the smallest difference that the data could measure.
> I could have instead chosen 0.0 and asked if there was likely some (perhaps
> tiny) difference, but instead, I chose to ask if there was likely some small but
> meaningful difference. It is better to choose the desired difference before
> measuring the data.
Thanks for the detailed comments on the meaning of 0.1 here. :-)
> Why don't you try applying this approach to the new data? You will need the
> general binomial formula.
Thank you Paul for the suggestion.
I just tried it, but I'm not sure whether my analysis is correct ...
Analysis 1:
a's median is 8.9.
35/48 of b's data points are less than 0.1 less than a's median (i.e. below 8.8).
For a binomial distribution with n=48 and p=0.5, P(X >= 35) = 0.1%.
So, we have strong confidence that b is 100ms faster than a.
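Analysis 1 can be reproduced with a short Python sketch. Two assumptions to flag: the values are converted to integer tenths of a second so the threshold comparison is exact, and the median of 8.9 is taken as the lower median (`statistics.median_low`), since with 48 samples the conventional median averages the 24th and 25th values and comes out as 8.95. The helper `binom_tail` is a hypothetical name for the general binomial formula.

```python
from math import comb
from statistics import median, median_low

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), using the general binomial formula."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

a = [8.4, 8.8, 9.2, 9.0, 8.3, 9.6, 8.8, 9.4,
     8.7, 9.2, 8.3, 9.4, 8.4, 9.6, 8.5, 8.8,
     8.8, 8.9, 9.3, 9.2, 8.6, 9.7, 9.2, 8.8,
     8.7, 9.0, 9.1, 9.5, 8.6, 8.9, 9.1, 8.6,
     8.2, 9.1, 8.8, 9.2, 9.1, 8.9, 8.4, 9.0,
     9.8, 9.8, 8.7, 8.8, 9.1, 9.5, 9.5, 8.7]   # w/o v2 patch
b = [7.7, 8.6, 8.1, 7.8, 8.2, 8.2, 8.8, 8.2,
     9.8, 8.0, 9.2, 8.8, 9.2, 8.5, 8.4, 9.2,
     8.5, 8.3, 8.1, 8.3, 8.6, 7.9, 8.3, 8.3,
     8.6, 8.9, 8.0, 8.5, 8.4, 8.6, 8.7, 8.0,
     8.8, 8.8, 9.1, 7.9, 9.7, 7.9, 8.2, 7.8,
     8.1, 8.5, 8.6, 8.4, 9.2, 8.6, 9.6, 8.3]   # w/ v2 patch

# Integer tenths of a second avoid float round-off in "8.9 - 0.1".
a_t = [round(x * 10) for x in a]
b_t = [round(x * 10) for x in b]

med_a = median_low(a_t)          # lower median, in tenths of a second
threshold = med_a - 1            # "0.1 less than a's median"
hits = sum(x < threshold for x in b_t)

# Chance of seeing at least this many points below the threshold if each
# point were equally likely to fall on either side (p = 0.5).
p_value = binom_tail(len(b_t), hits, 0.5)

print(f"median_low(a)={med_a / 10}, median(a)={median(a_t) / 10}")
print(f"{hits}/{len(b_t)} of b below {threshold / 10}, P(X >= {hits}) = {p_value:.4%}")
```

Using the conventional median of 8.95 instead would move the threshold and change the count, so the choice of median is worth stating alongside the result.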
Analysis 2:
a's median - 0.4 = 8.9 - 0.4 = 8.5.
24/48 of b's data points are less than 0.4 less than a's median (i.e. below 8.5).
The probability that one of a's data points is at or below 8.5 is p = 7/48 = 0.1458.
For a binomial distribution with n=48 and p=0.1458, P(X >= 24) = 0.0%.
So, it looks like we have confidence that b is 400ms faster than a.
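Analysis 2 can be sketched the same way in Python. The assumptions here are mine: values are converted to integer tenths of a second, `binom_tail` is a hypothetical helper for the general binomial formula, and p is estimated from a's own sample. Note that the count of a's points depends on whether 8.5 itself is included, so the sketch prints both variants.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), using the general binomial formula."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

a = [8.4, 8.8, 9.2, 9.0, 8.3, 9.6, 8.8, 9.4,
     8.7, 9.2, 8.3, 9.4, 8.4, 9.6, 8.5, 8.8,
     8.8, 8.9, 9.3, 9.2, 8.6, 9.7, 9.2, 8.8,
     8.7, 9.0, 9.1, 9.5, 8.6, 8.9, 9.1, 8.6,
     8.2, 9.1, 8.8, 9.2, 9.1, 8.9, 8.4, 9.0,
     9.8, 9.8, 8.7, 8.8, 9.1, 9.5, 9.5, 8.7]   # w/o v2 patch
b = [7.7, 8.6, 8.1, 7.8, 8.2, 8.2, 8.8, 8.2,
     9.8, 8.0, 9.2, 8.8, 9.2, 8.5, 8.4, 9.2,
     8.5, 8.3, 8.1, 8.3, 8.6, 7.9, 8.3, 8.3,
     8.6, 8.9, 8.0, 8.5, 8.4, 8.6, 8.7, 8.0,
     8.8, 8.8, 9.1, 7.9, 9.7, 7.9, 8.2, 7.8,
     8.1, 8.5, 8.6, 8.4, 9.2, 8.6, 9.6, 8.3]   # w/ v2 patch

a_t = [round(x * 10) for x in a]   # integer tenths of a second
b_t = [round(x * 10) for x in b]

threshold = 89 - 4                 # a's median (8.9) minus 0.4, in tenths

hits_b = sum(x < threshold for x in b_t)    # b's points strictly below 8.5
a_le = sum(x <= threshold for x in a_t)     # a's points at or below 8.5
a_lt = sum(x < threshold for x in a_t)      # a's points strictly below 8.5

# p is estimated from a's sample; the "at or below" count is used here.
p = a_le / len(a_t)
p_tail = binom_tail(len(b_t), hits_b, p)

print(f"b below 8.5: {hits_b}/48; a at or below 8.5: {a_le}/48; a below 8.5: {a_lt}/48")
print(f"P(X >= {hits_b}) with p={p:.4f}: {p_tail:.3e}")
```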
The cumulative binomial probabilities P(X) were calculated via:
https://www.gigacalculator.com/calculators/binomial-probability-calculator.php
I apologize if this analysis/discussion bored some of you. ;-)
-Qiuxu
> [...]