linux-kernel - Re: x86 status was Re: -mm merge plans for 2.6.23

Message-ID: <20070711231912.GA32263@elte.hu>
Date:	Thu, 12 Jul 2007 01:19:12 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	Arjan van de Ven <arjan@...radead.org>,
	Chris Wright <chrisw@...s-sol.org>
Subject: Re: x86 status was Re: -mm merge plans for 2.6.23

* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> That was *exactly* the same thing you talked about when I refused to 
> take the original timer changes into 2.6.20. You were talking about 
> how lots of people had worked really hard, and how it was really 
> tested.

yes - i was (way too!) upset about it, and your reasoning for the 
rejection was hard (on us) but fair: you wanted a quiet 2.6.20, and you 
felt fundamentally uneasy about the patches.

> And it damn well was NOT really tested, and 2.6.21 ended up being a 
> horribly painful experience (one of the more painful kernel releases 
> in recent times), and we ended up having to fix a *lot* of stuff.

yes. We had 12 -hrt/dynticks merge related regressions between 
2.6.21-rc1 and -final, and 4 after final. Here's a quick post-mortem:

12 fixes after -rc1:

    [PATCH] i386: Fix bogus return value in hpet_next_event()
    [PATCH] clockevents: remove bad designed sysfs support for now
    [PATCH] clocksource: Fix thinko in watchdog selection
    [PATCH] dynticks: fix hrtimer rounding error in next_timer_interrupt
    [PATCH] i386: add command line option "local_apic_timer_c2_ok"
    [PATCH] i386: disable local apic timer via command line or dmi quirk
    [PATCH] i386: clockevents fix breakage on Geode/Cyrix PIT     
    [PATCH] i386: trust the PM-Timer calibration of the local APIC timer
    [PATCH] clockevents: Fix suspend/resume to disk hangs
    [PATCH] highres: do not run the TIMER_SOFTIRQ after switching to highres mode
    [PATCH] hrtimer: prevent overrun DoS in hrtimer_forward()
    [PATCH] Save/restore periodic tick information over suspend/resume implementations

4 fixes after -final:

 2.6.21.1: -
 2.6.21.2:
    [PATCH] clocksource: fix resume logic
 2.6.21.3: -
 2.6.21.4: -
 2.6.21.5:
    [PATCH] NOHZ: Rate limit the local softirq pending warning output
    [PATCH] Ignore bogus ACPI info for offline CPUs
    [PATCH] i386: HPET, check if the counter works
 2.6.21.6: -

it's all pretty quiet today on the dynticks regressions front. (there 
are no open regressions in either the upstream i386 code or in the devel 
patches we are aware of. Forced-HPET in -mm, which is not part of this 
queue in question [but which is done for dynticks], has one open 
regression.)

The majority of the above bugs were in the infrastructure code. (the 
worst was the generic resume/suspend one fixed in 2.6.21.2) And sadly, a 
fair number of the infrastructure bugs we introduced during the frantic 
clockevents/dynticks rewrites/redesigns we did between .20 and .21. That 
was a royally stupid mistake for us to do - instead of patiently waiting 
for the bugs to be shaken out we destabilized the infrastructure. (it 
was a "let's make this thing so nice that it's impossible to reject" 
instinctive gut reaction.)

In the 'weird arch bugs' category, out of the 6 i386 breakages listed 
above, 'i386 legacy systems' was/is by far the worst offender: 4-5 were 
on such old (not 64-bit-capable) systems. (this is not really a 
surprise) While x86_64 certainly has weird crap hardware too, it 
probably is an order of magnitude fewer than i386 - just due to the 
sheer volume, time and diversity difference. (On the other hand if 
there's crap then it will be debugged/tested slower than on 32-bit, 
which offsets that advantage.)

The most prominent bugs were the ones that were in the infrastructure - 
they affected many machines. (But i'd expect the infrastructure to be 
pretty robust by now.)

The x86_64 hrt/dynticks code makes the x86_64 PIT driver (and hpet too) 
shared between the two architectures - which is perhaps another 
difference to the original i386 clockevents merge.

We also integrated _all_ feedback we got, and we had the capacity and 
capability to address whatever other feedback came back - it just never 
came ... until today.

But i fully agree with you that the cleanups should be done separately - 
it's just so hard to actually hack on the old hpet code (and to 
understand it to begin with) without first cleaning it up a bit so that 
it does not cause permanent brain damage ;)

	Ingo