Message-ID: <20090317164051.GA32245@elte.hu>
Date: Tue, 17 Mar 2009 17:40:51 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Jesper Krogh <jesper@...gh.cc>,
john stultz <johnstul@...ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Len Brown <len.brown@...el.com>
Subject: Re: Linux 2.6.29-rc6

* Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Tue, 17 Mar 2009, Ingo Molnar wrote:
> >
> > That's the idea of my patch: to use not two endpoints but thousands
> > of measurement points.
>
> Umm. Except you don't.
>
> > By measuring more we can get a more precise result, and we also do
> > not assume anything about how much time passes between two
> > measurement points.
>
> That's fine, but your actual code doesn't _do_ that.
>
> > My 'delta' algorithm does not assume anything about how much time
> > passes between two measurement points - it calculates the slope and
> > keeps a rolling average of that slope.
>
> No, you keep a very bad measure of "some kind of random average of the
> last few points", which - if I read things right:
>
> - lacks precision (you really need to use 'double' floating point to do
> it well, otherwise the rounding errors will kill you). You seem to be
> aiming for a 10-bit fixed point thing, which may or may not work if
> done cleverly, but:
>
> - seems to be based on a rather weak averaging function which certainly
> will lose data over time.
>
> The thing is, the only _accurate_ average is the one done over
> long time distances. It's very true that your slope thing works
> very well over such long times, and you'd get accurate measurement
> if you did it that way, BUT THAT IS NOT WHAT YOU DO. You have a
> very tight loop, so you get very bad slopes, and then you use a
> weak averaging function to try to make them better, but it never
> does.
Hm, the intention there was to have a memory of ~1000 entries via a
decaying average with 1:1000 weighting.

In parallel to that there's also a noise estimator (which also decays
over time). So basically, when the observed noise is very low, we
essentially use the data from the last ~1000 measurements. (Well, not
exactly - the 'memory' of more recent data will be stronger than that
of older ones.)
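Roughly the scheme I was aiming for, as a minimal sketch - not the
actual patch code; the names, the decay factor and the fixed-point
scale are all illustrative:

/*
 * Sketch of a decaying-average slope estimator - illustrative only.
 * Each sample is merged with a 1:1000 weight, giving an effective
 * memory of ~1000 samples, with recent samples weighted more heavily
 * than older ones.
 */
#include <stdint.h>

#define FP_SHIFT	10		/* 10-bit fixed point */
#define DECAY		1000		/* ~1000-sample memory */

struct slope_est {
	int64_t avg;	/* decaying average of the slope, fixed point */
	int64_t noise;	/* decaying average of |sample - avg| */
};

/* one slope sample: TSC cycles per PIT tick, in fixed point */
static int64_t slope_sample(int64_t dtsc, int64_t dpit)
{
	return (dtsc << FP_SHIFT) / dpit;
}

static void slope_update(struct slope_est *e, int64_t sample_fp)
{
	int64_t delta = sample_fp - e->avg;
	int64_t adev = delta < 0 ? -delta : delta;

	/*
	 * avg += delta/DECAY: old data decays with a 1:1000 weight.
	 * The integer division truncates - which is admittedly also
	 * where the rounding errors you mention creep in.
	 */
	e->avg += delta / DECAY;

	/* noise estimator: decaying average of the absolute deviation */
	e->noise += (adev - e->noise) / DECAY;
}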
Again ... it's clearly a non-working patch, so it's not really a
defensible concept :-)
> Also, there seems to be a fundamental bug in your PIT reading
> routine. My fast-TSC calibration only looks at the MSB of the PIT
> read for a very good reason: if you don't use the explicit LATCH
> command, you may be getting the MSB of one counter value, and then
> the LSB of another. So your PIT read can easily be off by ~256 PIT
> cycles. Only by caring only for the MSB can you do an unlatched
> read!
>
> That is why pit_expect_msb() looks for the "edge" where the MSB
> changes, and never actually looks at the LSB.
>
> This issue may be an additional reason for your problems, although
> maybe your noise correction will be able to avoid those cases.
Indeed. I did check the trace results via gnuplot yesterday though
(suspecting PIT readout outliers), and there were no outliers.

For any final patch it's still a showstopper issue - but the source of
error and miscalibration is elsewhere.
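Still, for reference, the read pattern you describe - consume the LSB
only to keep the two-byte read sequence in sync, and act only on MSB
changes - would look roughly like this (a sketch, not the actual
pit_expect_msb()):

/*
 * Sketch of an MSB-only unlatched PIT read - illustrative, not the
 * kernel code. Without a latch command the LSB and MSB can come from
 * two different counter values, so the LSB is read and discarded
 * purely to keep the byte sequence in sync; only the edge where the
 * MSB changes is treated as meaningful.
 */
#include <sys/io.h>	/* inb(); needs port access, illustrative */

#define PIT_CH2		0x42	/* counter 2 data port */

static int wait_for_msb_edge(unsigned char expected_msb)
{
	int loops;

	for (loops = 0; loops < 50000; loops++) {	/* arbitrary cap */
		(void)inb(PIT_CH2);			/* LSB: read, ignore */
		if (inb(PIT_CH2) != expected_msb)
			return loops;			/* MSB edge seen */
	}
	return -1;					/* no edge - PIT stuck? */
}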
Ingo