Date:	Wed, 23 Dec 2009 10:34:03 -0500
From:	Mark Hounschell <markh@...pro.net>
To:	"Pallipadi, Venkatesh" <venkatesh.pallipadi@...el.com>
CC:	"dmarkh@....rr.com" <dmarkh@....rr.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Alain Knaff <alain@...ff.lu>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"fdutils@...tils.linux.lu" <fdutils@...tils.linux.lu>,
	"Li, Shaohua" <shaohua.li@...el.com>, Ingo Molnar <mingo@...e.hu>
Subject: Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was:
 Re: Cannot format floppies under kernel 2.6.*?)

On 12/23/2009 10:10 AM, Pallipadi, Venkatesh wrote:
>  
> 
>> -----Original Message-----
>> From: Mark Hounschell [mailto:markh@...pro.net] 
>> Sent: Wednesday, December 23, 2009 5:03 AM
>> To: Pallipadi, Venkatesh
>> Cc: dmarkh@....rr.com; Linus Torvalds; Alain Knaff; Linux Kernel Mailing List; fdutils@...tils.linux.lu; Li, Shaohua; Ingo Molnar
>> Subject: Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
>>
>> On 12/22/2009 07:22 PM, Mark Hounschell wrote:
>>> On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
>>>> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>>>>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>>>>
>>>>>> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
>>>>>>   details, but Mark is basically chasing down a situation where the floppy
>>>>>>   driver seems to have trouble formatting floppies, and it happened
>>>>>>   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
>>>>>>   memory block transfers the wrong value for the first byte of the block.
>>>>>>
>>>>>>   Which should be impossible, but whatever. Some part of the system has a
>>>>>>   cached buffer that isn't flushed.
>>>>>>
>>>>>>   What gets _you_ guys involved is that Mark cannot reproduce the bug if
>>>>>>   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
>>>>>>   pure luck while bisecting, because some time during his bisect, his
>>>>>>   machine wouldn't even boot with HPET.
>>>>>>
>>>>>>   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
>>>>>>   2.6.28 (and current -git) does not.  Any ideas? ]
>>>>>>
>>>>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>>>>
>>>>>>> Ok, I may have something that might help.
>>>>>>>
>>>>>>> # git bisect bad
>>>>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>>>>> Author: venkatesh.pallipadi@...el.com <venkatesh.pallipadi@...el.com>
>>>>>>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>>>>>>
>>>>>>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>>>>>>
>>>>>>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>>>>>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>>>>>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>>>>>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>>>>>>     is non-functional LAPIC timer across CPU deep C-states.
>>>>>>>
>>>>>>>     If there are more CPUs than number of available timers, CPUs that do not
>>>>>>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>>>>>>
>>>>>>>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>
>>>>>>>     Signed-off-by: Shaohua Li <shaohua.li@...el.com>
>>>>>>>     Signed-off-by: Ingo Molnar <mingo@...e.hu>
>>>>>>>
>>>>>>> And of course this was the first commit that I could not boot if I had hpet
>>>>>>> enabled. To get this one to boot (single user mode only) I had to add the
>>>>>>> quiet cmdline option and apply the following patch to arch/x86/kernel/hpet.c,
>>>>>>> taken from
>>>>>>>
>>>>>>> commit  5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>>>>>>
>>>>>>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>>>>>>  {
>>>>>>>
>>>>>>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>>>>>>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>>>>>>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>>>>>>                 return -1;
>>>>>>>
>>>>>>>         disable_irq(dev->irq);
>>>>>>>
>>>>>>> AND add the quiet cmdline option.
>>>>>>
>>>>>> Ok, so we know why HPET didn't boot for you, and that was fixed later (by
>>>>>> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>>>>>>
>>>>>
>>>>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
>>>>> working and also when I could no longer boot with hpet enabled.
>>>>
>>>>
>>>> I am missing something here. Commit 26afe5f2 is where system does not
>>>> boot with HPET or is it where the floppy stops working when you boot
>>>> with HPET enabled.
>>>>
>>>
>>> As it happens, both happen there. Commit 5ceb1a04 is where it starts
>>> booting _again_ with hpet enabled. So I took that patch (5ceb1a04) and
>>> applied it to (26afe5f2f) to be able to boot with hpet enabled.  I had to
>>> use the quiet option to get to a login prompt, but there is where the
>>> floppy format first fails, just as it does in 2.6.28 and up.
>>>
>>>> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
>>>> output in each case. With that option, we should be using local APIC
>>>> timer and PIT, HPET or HPET with MSI should not really matter. Does it
>>>> still fail with .28 with that option?
>>>>
>>>>
>>
>> 2.6.28 still fails with that option.
>>
>> 2.6.27.41 /proc/interrupts with idle=halt
>>
>>           CPU0       CPU1       CPU2       CPU3
>>  0:        126          0          0          1   IO-APIC-edge      timer
>>  1:          0          0          1        157   IO-APIC-edge      i8042
>>  3:          0          0          0          6   IO-APIC-edge
>>  4:          0          0          0          6   IO-APIC-edge
>>  6:          0          0          0          4   IO-APIC-edge      floppy
>>  8:          0          0          0          1   IO-APIC-edge      rtc0
>>  9:          0          0          0          0   IO-APIC-fasteoi   acpi
>> 12:          0          0          1        128   IO-APIC-edge      i8042
>> 14:          0          0         34       4457   IO-APIC-edge      pata_atiixp
>> 15:          0          0          4        480   IO-APIC-edge      pata_atiixp
>> 16:          0          0          0        397   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
>> 17:          0          0          0          2   IO-APIC-fasteoi   ehci_hcd:usb1
>> 18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>> 19:          0          0          0        142   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
>> 22:          0          0          4       1154   IO-APIC-fasteoi   ahci
>> 219:          0          0          3         63   PCI-MSI-edge      eth0
>> NMI:          0          0          0          0   Non-maskable interrupts
>> LOC:      91539      91964      92525      91181   Local timer interrupts
>> RES:       2888       3873       2434       2721   Rescheduling interrupts
>> CAL:        240        245        247         84   function call interrupts
>> TLB:        768        628        526        512   TLB shootdowns
>> SPU:          0          0          0          0   Spurious interrupts
>> ERR:          0
>> MIS:          0
>>
>> 2.6.28 /proc/interrupts with idle=halt
>>
>>           CPU0       CPU1       CPU2       CPU3
>>  0:        126          0          2          0   IO-APIC-edge      timer
>>  1:          0          0        192          0   IO-APIC-edge      i8042
>>  3:          0          0          6          0   IO-APIC-edge
>>  4:          0          0          6          0   IO-APIC-edge
>>  6:          0          0          4          0   IO-APIC-edge      floppy
>>  8:          0          0          1          0   IO-APIC-edge      rtc0
>>  9:          0          0          0          0   IO-APIC-fasteoi   acpi
>> 12:          0          0        128          1   IO-APIC-edge      i8042
>> 14:          0          1     147114        396   IO-APIC-edge      pata_atiixp
>> 15:          0          0        646          2   IO-APIC-edge      pata_atiixp
>> 16:          0          0        396          0   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel
>> 17:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
>> 18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>> 19:          0          0        362          1   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>> 22:          0          0        874          1   IO-APIC-fasteoi   ahci
>> 1274:          0          0        193          4   PCI-MSI-edge      eth0
>> 1279:     513207          0          0          0   HPET_MSI-edge      hpet2
>> NMI:          0          0          0          0   Non-maskable interrupts
>> LOC:        268     513395     513138     522088   Local timer interrupts
>> RES:       3262       3679       2573       3746   Rescheduling interrupts
>> CAL:        131        166         57        147   Function call interrupts
>> TLB:        680        438        450        639   TLB shootdowns
>> SPU:          0          0          0          0   Spurious interrupts
>> ERR:          0
>> MIS:          0
>>
> 
> Hmm. Looks like hpet2 is still getting used instead of the local APIC timer in the .28 case.
> 
> I was expecting some low number in hpet2 and the local timer counts on all CPUs to be around the same value. The above shows CPU 0 is depending on hpet2 for some reason even with idle=halt. Can you send the output of the two below for the .28 case?
> /proc/timer_list

Attached.
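
For anyone who wants to repeat the comparison made above (the hpet2 count against the per-CPU LOC counts) on their own dumps, here is a rough Python sketch. It is not part of anything posted in this thread: the helper names `parse_interrupts`, `timer_summary` and `cpus_relying_on_hpet` are made up for illustration, the sample rows are copied from the 2.6.28 dump above, and the rule "HPET_MSI count larger than the LOC count" is only an assumed stand-in for "this CPU is depending on hpet".

```python
# Sample rows copied from the 2.6.28 /proc/interrupts dump quoted above.
SAMPLE_2_6_28 = """\
           CPU0       CPU1       CPU2       CPU3
  0:        126          0          2          0   IO-APIC-edge      timer
1279:     513207          0          0          0   HPET_MSI-edge      hpet2
LOC:        268     513395     513138     522088   Local timer interrupts
ERR:          0
"""


def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())            # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")  # first ':' ends the IRQ label
        fields = rest.split()
        counts = []
        for tok in fields[:ncpu]:
            if not tok.isdigit():
                break                       # rows like ERR/MIS have fewer counts
            counts.append(int(tok))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def timer_summary(table):
    """Return (per-CPU LOC counts, per-CPU sum of all HPET_MSI rows)."""
    loc = table.get("LOC", ([], ""))[0]
    hpet = [0] * len(loc)
    for counts, desc in table.values():
        if "HPET_MSI" in desc and len(counts) == len(loc):
            hpet = [a + b for a, b in zip(hpet, counts)]
    return loc, hpet


def cpus_relying_on_hpet(table):
    """Assumed heuristic: a CPU 'relies on hpet' if HPET_MSI count > LOC count."""
    loc, hpet = timer_summary(table)
    return [cpu for cpu, (l, h) in enumerate(zip(loc, hpet)) if h > l]


print(cpus_relying_on_hpet(parse_interrupts(SAMPLE_2_6_28)))  # prints [0]
```

To run it against a live system, pass the contents of /proc/interrupts to `parse_interrupts` instead of the sample string.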

> grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

I have no /sys/devices/system/cpu/cpu0/cpuidle on this machine.
Maybe because of these config settings:

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
# CONFIG_CPU_IDLE is not set
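
The same check can be scripted. The Python sketch below is only an illustration under assumptions: `config_option_state` and `cpuidle_available` are hypothetical helper names, and where the kernel config text comes from (for example a saved .config from the build tree) is left to the reader. It reports whether an option is marked "is not set" in config text and whether the cpuidle sysfs directory exists.

```python
import os

# Path taken from the grep command quoted above.
CPUIDLE_DIR = "/sys/devices/system/cpu/cpu0/cpuidle"

# The config fragment shown above, used here as sample input.
SAMPLE_CONFIG = """\
#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
# CONFIG_CPU_IDLE is not set
"""


def config_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...), 'not set', or None if absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "not set"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def cpuidle_available(path=CPUIDLE_DIR):
    """True if the cpuidle sysfs directory exists on the running kernel."""
    return os.path.isdir(path)


print(config_option_state(SAMPLE_CONFIG, "CONFIG_CPU_IDLE"))  # prints not set
```

If `config_option_state` returns "not set" for CONFIG_CPU_IDLE and `cpuidle_available()` is false, that would be consistent with the guess above, though it does not prove the config is the only reason the directory is missing.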

Would it be OK if, when you ask for 2.6.28 info, I use a 2.6.32.2 kernel instead?
That kernel also fails fdformat with hpet enabled on these machines.

Thanks
Mark

View attachment "timer_list.txt" of type "text/plain" (7902 bytes)
