Message-ID: <20070711231912.GA32263@elte.hu>
Date: Thu, 12 Jul 2007 01:19:12 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andi Kleen <andi@...stfloor.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Arjan van de Ven <arjan@...radead.org>,
Chris Wright <chrisw@...s-sol.org>
Subject: Re: x86 status was Re: -mm merge plans for 2.6.23
* Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> That was *exactly* the same thing you talked about when I refused to
> take the original timer changes into 2.6.20. You were talking about
> how lots of people had worked really hard, and how it was really
> tested.
yes - i was (way too!) upset about it, and your reasoning for the
rejection was hard (on us) but fair: you wanted a quiet 2.6.20, and you
felt fundamentally uneasy about the patches.
> And it damn well was NOT really tested, and 2.6.21 ended up being a
> horribly painful experience (one of the more painful kernel releases
> in recent times), and we ended up having to fix a *lot* of stuff.
yes. We had 12 -hrt/dynticks merge related regressions between
2.6.21-rc1 and -final, and 4 after final. Here's a quick post-mortem:
12 fixes after -rc1:

  [PATCH] i386: Fix bogus return value in hpet_next_event()
  [PATCH] clockevents: remove bad designed sysfs support for now
  [PATCH] clocksource: Fix thinko in watchdog selection
  [PATCH] dynticks: fix hrtimer rounding error in next_timer_interrupt
  [PATCH] i386: add command line option "local_apic_timer_c2_ok"
  [PATCH] i386: disable local apic timer via command line or dmi quirk
  [PATCH] i386: clockevents fix breakage on Geode/Cyrix PIT
  [PATCH] i386: trust the PM-Timer calibration of the local APIC timer
  [PATCH] clockevents: Fix suspend/resume to disk hangs
  [PATCH] highres: do not run the TIMER_SOFTIRQ after switching to highres mode
  [PATCH] hrtimer: prevent overrun DoS in hrtimer_forward()
  [PATCH] Save/restore periodic tick information over suspend/resume implementations

4 fixes after -final:

  2.6.21.1: -
  2.6.21.2:
    [PATCH] clocksource: fix resume logic
  2.6.21.3: -
  2.6.21.4: -
  2.6.21.5:
    [PATCH] NOHZ: Rate limit the local softirq pending warning output
    [PATCH] Ignore bogus ACPI info for offline CPUs
    [PATCH] i386: HPET, check if the counter works
  2.6.21.6: -

it's all pretty quiet today on the dynticks regressions front. (there
are no open regressions in either the upstream i386 code or in the devel
patches we are aware of. Forced-HPET in -mm, which is not part of this
queue in question [but which is done for dynticks], has one open
regression.)
The majority of the above bugs were in the infrastructure code. (the
worst was the generic resume/suspend one fixed in 2.6.21.2) And sadly, a
fair number of the infrastructure bugs were introduced during the frantic
clockevents/dynticks rewrites/redesigns we did between .20 and .21. That
was a royally stupid mistake for us to make - instead of patiently waiting
for the bugs to be shaken out we destabilized the infrastructure. (it
was a "let's make this thing so nice that it's impossible to reject"
instinctive gut reaction.)
In the 'weird arch bugs' category, out of the 6 i386 breakages listed
above, 'i386 legacy systems' was/is by far the worst offender: 4-5 were
on such old (not 64-bit-capable) systems. (this is not really a
surprise) While x86_64 certainly has weird crap hardware too, there is
probably an order of magnitude less of it than on i386 - just due to the
sheer volume, time and diversity difference. (On the other hand if
there's crap then it will be debugged/tested slower than on 32-bit,
which offsets that advantage.)
The most prominent bugs were the ones that were in the infrastructure -
they affected many machines. (But i'd expect the infrastructure to be
pretty robust by now.)
The x86_64 hrt/dynticks code makes the x86_64 PIT driver (and hpet too)
shared between the two architectures - which is perhaps another
difference to the original i386 clockevents merge.
We also integrated _all_ feedback we got, and we had the capacity and
capability to address whatever other feedback came back - it just never
came ... until today.
But i fully agree with you that the cleanups should be done separately -
it's just so hard to actually hack on the old hpet code (and to
understand it to begin with) without first cleaning it up a bit so that
it does not cause permanent brain damage ;)
Ingo