linux-kernel - Re: RFC: starting a kernel-testers group for newbies

Message-ID: <4824B696.3070309@rtr.ca>
Date:	Fri, 09 May 2008 16:39:50 -0400
From:	Mark Lord <lkml@....ca>
To:	Mark Lord <lkml@....ca>,
	"Pallipadi, Venkatesh" <venkatesh.pallipadi@...el.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Adrian Bunk <bunk@...nel.org>,
	Paul Mackerras <paulus@...ba.org>,
	Josh Boyer <jwboyer@...il.com>,
	Arjan van de Ven <arjan@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>, davem@...emloft.net,
	linux-kernel@...r.kernel.org, jirislaby@...il.com,
	Steven Rostedt <rostedt@...dmis.org>, tglx@...utronix.de,
	Len Brown <lenb@...nel.org>
Subject: Re: RFC: starting a kernel-testers group for newbies

Carlos R. Mafra wrote:
> On Fri  9.May'08 at 12:32:51 -0400, Mark Lord wrote:
>> Pallipadi, Venkatesh wrote:
>>>  
>>>> -----Original Message-----
>>>> From: linux-kernel-owner@...r.kernel.org 
>>>> [mailto:linux-kernel-owner@...r.kernel.org] On Behalf Of Carlos R. Mafra
>>>> Sent: Friday, May 02, 2008 10:16 AM
>>>> To: Linus Torvalds
>>>> Cc: Adrian Bunk; Paul Mackerras; Josh Boyer; Arjan van de Ven; Andrew 
>>>> Morton; Rafael J. Wysocki; davem@...emloft.net; 
>>>> linux-kernel@...r.kernel.org; jirislaby@...il.com; Steven Rostedt; 
>>>> Pallipadi, Venkatesh
>>>> Subject: Re: RFC: starting a kernel-testers group for newbies
>>>>
>>>> On Fri  2.May'08 at  9:28:08 -0700, Linus Torvalds wrote:
>>>>
>>>>> Quite frankly, it does sound like the hang happens somewhere around the
>>>>> 	hpet_init
>>>>> 	hpet_acpi_add
>>>>> 	hpet_resources
>>>>> 	hpet_resources: 0xfed00000 is busy
>>>>>
>>>>> printk's you added (correct?) and we've had tons of issues with NO_HZ, so
>>>>> at a guess it is timer-related.
>>>> It happens a bit before that because when it hangs it doesn't print the 
>>>> above lines, and when it does not hang these lines are
>>>> the ones right after the point where it hangs. 
>>>>> (And I assume it's stable if/once it gets past that boot hang issue? 
>>>> Yes you are right. When I have luck and the boot succeeds my Sony laptop
>>>> is rock solid and the kernel is wonderful (even the card reader works!).
>>>>
>>>>> That tends to mean that it's not some hardware instability, it's
>>>>> literally our init code).
>>>> A few days ago I found this message in lkml in reply to a hpet patch
>>>> http://lkml.org/lkml/2007/5/7/361 in which the reporter also had a 
>>>> similar hang, which was cured by hpet=disable. 
>>>> So it is in my TODO list to try to check out if that patch is in the 
>>>> current -git and whether it can be reverted somehow (I added Venki to the 
>>>> Cc: now)
>>>>
>>>> Thanks a lot for the answer!
>>> It depends on whether HPET is being force detected based on the
>>> chipset or whether it was exported by the BIOS in the ACPI table.
>>>
>>> If it was force enabled and above patch is having any effect, then you
>>> should see a message like
>>>> Force enabled HPET at base address 0xfed00000
>>> In any case, of late there seem to be quite a few breakages that are
>>> related to HPET/timer interrupts. One of them was on a system which has
>>> HPET exported by the BIOS
>>> http://bugzilla.kernel.org/show_bug.cgi?id=10409
>>> And the other one where we are force enabling based on chipset
>>> http://bugzilla.kernel.org/show_bug.cgi?id=10561
>>>
>>> And then we have the once-in-a-while hangs reported by you, Roman and
>>> Mark here
>>> http://bugzilla.kernel.org/show_bug.cgi?id=10377
>>> http://bugzilla.kernel.org/show_bug.cgi?id=10117
>> ..
>>
>> Yeah.  This particular bug first appeared when NOHZ & HPET were added.
>> Somebody once suggested it had something to do with an SMI interrupt
>> happening in the midst of HPET calibration or some such thing.
>>
> 
> I said I was waiting for -rc1 to be released to send another email
> about my HPET problem, but curiously with v2.6.26-rc1-6-gafa26be 
> my laptop did not hang after 30+ boots and counting. 
> 
> Somewhere between 2.6.25-07000-(something) and the above kernel
> something happened which changed significantly the probability
> of hanging during boot. 
> 
> I could not boot more than 3 times in
> a row without hanging with kernels up to 2.6.25-07000 (approximately),
> and now I am still booting v2.6.26-rc1-6-gafa26be a few times a day
> and no hangs yet.
> 
> Yesterday I started a "reverse" bisection, trying to find which
> commit "fixed" it, but I still didn't finish (but it is past
> -7200).
> 
> Of course I am not sure if after the 100th boot the latest -git
> won't hang but it definitely improved.
> 
>> But nobody who works on the HPET code has ever shown more than a casual
>> interest in helping to track down and fix whatever the problem is.
> 
> Well, I would like to thank Venki for his effort because he even
> answered some private emails from me about this issue and is 
> tracking the bugzillas about it.
..

My experience with this bug, since 2.6.20 or so, has been that it comes
and goes with even the most innocent change in the .config file,
like turning frame pointers on/off.
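
Since the hang is intermittent, a run of clean boots only counts as
evidence of a fix if it would have been unlikely on an unfixed kernel.
A minimal sketch of that arithmetic follows, assuming each boot hangs
independently with a fixed probability. Both the independence and the
example hang rates are assumptions, not measurements from this thread;
the 0.25 figure is only a rough reading of "could not boot more than 3
times in a row". The function names are hypothetical.

```python
import math


def prob_all_clean(hang_rate: float, boots: int) -> float:
    """Probability that `boots` independent boots all succeed when each
    boot hangs with probability `hang_rate` (assumed constant)."""
    if not 0.0 <= hang_rate <= 1.0 or boots < 0:
        raise ValueError("hang_rate must be in [0, 1] and boots >= 0")
    return (1.0 - hang_rate) ** boots


def boots_needed(hang_rate: float, confidence: float) -> int:
    """Smallest number of consecutive clean boots after which an unfixed
    kernel would have hung at least once with probability >= confidence."""
    if not 0.0 < hang_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("hang_rate and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hang_rate))


if __name__ == "__main__":
    # Hypothetical per-boot hang rates, from "hangs often" to "hangs rarely".
    for rate in (0.25, 0.05, 0.01):
        print(
            f"hang rate {rate:.2f}: "
            f"P(30 clean boots) = {prob_all_clean(rate, 30):.4f}, "
            f"boots for 99% confidence = {boots_needed(rate, 0.99)}"
        )
```

The same count applies to every step of a "reverse" bisection: a commit
can only be marked as fixed after that many clean boots, while a single
hang marks it as still broken. If the hang rate itself shifts with
unrelated .config changes, the fixed-rate assumption fails and the
numbers are optimistic.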

Cheers