Message-ID: <4B310879.9050701@compro.net>
Date:	Tue, 22 Dec 2009 12:57:13 -0500
From:	Mark Hounschell <markh@...pro.net>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Mark Hounschell <dmarkh@....rr.com>, Alain Knaff <alain@...ff.lu>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	fdutils@...tils.linux.lu,
	Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>,
	Shaohua Li <shaohua.li@...el.com>, Ingo Molnar <mingo@...e.hu>
Subject: Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was:
 Re: Cannot format floppies under kernel 2.6.*?)

On 12/22/2009 12:38 PM, Linus Torvalds wrote:
> 
> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for 
>   details, but Mark is basically chasing down a situation where the floppy 
>   driver seems to have trouble formatting floppies, and it happened 
>   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a 
>   memory block transfers the wrong value for the first byte of the block.
> 
>   Which should be impossible, but whatever. Some part of the system has a 
>   cached buffer that isn't flushed.
> 
>   What gets _you_ guys involved is that Mark cannot reproduce the bug if 
>   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by 
>   pure luck while bisecting, because some time during his bisect, his 
>   machine wouldn't even boot with HPET.
> 
>   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But 
>   2.6.28 (and current -git) does not.  Any ideas? ]
> 
> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>
>> Ok, I may have something that might help.
>>
>> # git bisect bad
>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>> Author: venkatesh.pallipadi@...el.com <venkatesh.pallipadi@...el.com>
>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>
>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>
>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>     is non-functional LAPIC timer across CPU deep C-states.
>>
>>     If there are more CPUs than number of available timers, CPUs that do not
>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>
>>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>
>>     Signed-off-by: Shaohua Li <shaohua.li@...el.com>
>>     Signed-off-by: Ingo Molnar <mingo@...e.hu>
>>
>> And of course this was the first commit that I could not boot if I had HPET
>> enabled. To get this one to boot (single-user mode only) I had to add the
>> quiet cmdline option and the following patch, from the commit below, to arch/x86/kernel/hpet.c:
>>
>> commit  5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>
>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>  {
>>
>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>                 return -1;
>>
>>         disable_irq(dev->irq);
>>
>> AND add the quiet cmdline option.
> 
> Ok, so we know why HPET didn't boot for you, and that was fixed later (by 
> that 5ceb1a04). But is this also when the floppy started mis-behaving?
> 

Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
working, and also when I could no longer boot with HPET enabled. Commit
5ceb1a04 is where I found I could boot again with HPET enabled. It was a
simple patch, so I backported it onto the commit I was testing in order to
be able to boot with HPET on. I did two different bisects: the first to
find out when I could boot again with HPET on, then a second to find which
commit caused the floppy problem, applying the patch from the first bisect
(5ceb1a04) throughout the second one.

> IOW, _if_ you boot with that fix from commit 5ceb1a04 (and the quiet 
> option - I wonder what that is about: do you have any ideas?), is the 
> per-CPU HPET timer commit also the commit that causes floppy problems, or 
> is this purely a "bisect when HPET became a boot-up problem"?
> 

The quiet option was only needed because, with the 5ceb1a04 patch applied
to the kernels I was interested in, kernel messages of some kind scrolled
for hours and I could not get a login prompt. They went by too fast to
read, and I didn't have a serial console available to capture them.
They must not have been too important or critical, because the machine
behaved as normally as any machine in single-user mode.

But once I got to a single-user login prompt, it was definitely the same
floppy problem.

> 
> ---
>> Also, on none of the machines where it does work with HPET enabled do I see
>> an hpet2 line in /proc/interrupts like the one below.
>>
>>
>> cat /proc/interrupts
>>            CPU0       CPU1       CPU2       CPU3
>>   0:         82          0          3          0   IO-APIC-edge      timer
>>   1:          0          0       1712          6   IO-APIC-edge      i8042
>>   3:          0          0          6          0   IO-APIC-edge
>>   4:          0          0          6          0   IO-APIC-edge
>>   6:          0          0          4          0   IO-APIC-edge      floppy
>>   8:          0          0         60          0   IO-APIC-edge      rtc0
>>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
>>  12:          0          0      37798        179   IO-APIC-edge      i8042
>>  14:          0          0      16462         71   IO-APIC-edge      pata_atiixp
>>  15:          0          0       5713         17   IO-APIC-edge      pata_atiixp
>>  16:          0          0        904          2   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel, ni-pci-gpib
>>  17:          0          0          2          0   IO-APIC-fasteoi   ehci_hcd:usb1, parport0, ni-pci-gpib
>>  18:          0          0      49940         90   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, nvidia
>>  19:          0          0        703          2   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>>  22:          0          0       1303         15   IO-APIC-fasteoi   ahci
>>
>>  24:     261763          0          0          0  HPET_MSI-edge      hpet2
>>
>>  29:          0          0        220          5   PCI-MSI-edge      sky2@pci:0000:04:00.0
>> NMI:          0          0          0          0   Non-maskable interrupts
>> LOC:        138     271356     264446     261050   Local timer interrupts
>> SPU:          0          0          0          0   Spurious interrupts
>> PMI:          0          0          0          0   Performance monitoring interrupts
>> PND:          0          0          0          0   Performance pending work
>> RES:       4511       9275       8470       8086   Rescheduling interrupts
>> CAL:       3624       8666        523       4543   Function call interrupts
>> TLB:        981       1111       1065       1058   TLB shootdowns
>> ERR:          0
>> MIS:          0
> 


Regards
Mark