Date:	Wed, 23 Dec 2009 10:57:54 -0500
From:	Mark Hounschell <markh@...pro.net>
To:	markh@...pro.net
CC:	"Pallipadi, Venkatesh" <venkatesh.pallipadi@...el.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"fdutils@...tils.linux.lu" <fdutils@...tils.linux.lu>,
	"Li, Shaohua" <shaohua.li@...el.com>, Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was:
 Re: Cannot format floppies under kernel 2.6.*?)

On 12/23/2009 10:34 AM, Mark Hounschell wrote:
> On 12/23/2009 10:10 AM, Pallipadi, Venkatesh wrote:
>>  
>>
>>> -----Original Message-----
>>> From: Mark Hounschell [mailto:markh@...pro.net] 
>>> Sent: Wednesday, December 23, 2009 5:03 AM
>>> To: Pallipadi, Venkatesh
>>> Cc: dmarkh@....rr.com; Linus Torvalds; Alain Knaff; Linux 
>>> Kernel Mailing List; fdutils@...tils.linux.lu; Li, Shaohua; Ingo Molnar
>>> Subject: Re: [Fdutils] DMA cache consistency bug introduced in 
>>> 2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
>>>
>>> On 12/22/2009 07:22 PM, Mark Hounschell wrote:
>>>> On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
>>>>> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>>>>>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>>>>>
>>>>>>> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
>>>>>>>   details, but Mark is basically chasing down a situation where the floppy
>>>>>>>   driver seems to have trouble formatting floppies, and it happened
>>>>>>>   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
>>>>>>>   memory block transfers the wrong value for the first byte of the block.
>>>>>>>
>>>>>>>   Which should be impossible, but whatever. Some part of the system has a
>>>>>>>   cached buffer that isn't flushed.
>>>>>>>
>>>>>>>   What gets _you_ guys involved is that Mark cannot reproduce the bug if
>>>>>>>   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
>>>>>>>   pure luck while bisecting, because some time during his bisect, his
>>>>>>>   machine wouldn't even boot with HPET.
>>>>>>>
>>>>>>>   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
>>>>>>>   2.6.28 (and current -git) does not.  Any ideas? ]
>>>>>>>
>>>>>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>>>>>
>>>>>>>> Ok, I may have something that might help.
>>>>>>>>
>>>>>>>> # git bisect bad
>>>>>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>>>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>>>>>> Author: venkatesh.pallipadi@...el.com <venkatesh.pallipadi@...el.com>
>>>>>>>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>>>>>>>
>>>>>>>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>>>>>>>
>>>>>>>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>>>>>>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>>>>>>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>>>>>>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>>>>>>>     is non-functional LAPIC timer across CPU deep C-states.
>>>>>>>>
>>>>>>>>     If there are more CPUs than number of available timers, CPUs that do not
>>>>>>>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>>>>>>>
>>>>>>>>     Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>
>>>>>>>>     Signed-off-by: Shaohua Li <shaohua.li@...el.com>
>>>>>>>>     Signed-off-by: Ingo Molnar <mingo@...e.hu>
>>>>>>>>
>>>>>>>> And of course this was the first commit that I could not boot if I had hpet
>>>>>>>> enabled. To get this one to boot (single user mode only) I had to add
>>>>>>>> the quiet cmdline option and the following patch to arch/x86/kernel/hpet.c
>>>>>>>>
>>>>>>>> commit  5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>>>>>>>
>>>>>>>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>>>>>>>  {
>>>>>>>>
>>>>>>>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>>>>>>>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>>>>>>>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>>>>>>>                 return -1;
>>>>>>>>
>>>>>>>>         disable_irq(dev->irq);
>>>>>>>>
>>>>>>>> AND add the quiet cmdline option.
>>>>>>>
>>>>>>> Ok, so we know why HPET didn't boot for you, and that was fixed later (by
>>>>>>> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>>>>>>>
>>>>>>
>>>>>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
>>>>>> working and also when I could no longer boot with hpet enabled.
>>>>>
>>>>>
>>>>> I am missing something here. Commit 26afe5f2 is where system does not
>>>>> boot with HPET or is it where the floppy stops working when you boot
>>>>> with HPET enabled.
>>>>>
>>>>
>>>> As it happens, both happen there. Commit 5ceb1a04 is where it starts
>>>> booting _again_ with hpet enabled. So I took that patch (5ceb1a04) and
>>>> applied it to (26afe5f2f) to be able to boot with hpet enabled.  I had to
>>>> use the quiet option to get to a login prompt, but there is where the
>>>> floppy format first fails, just as it does in 2.6.28 and up.
>>>>
>>>>> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
>>>>> output in each case. With that option, we should be using local APIC
>>>>> timer and PIT, HPET or HPET with MSI should not really matter. Does it
>>>>> still fail with .28 with that option?
>>>>>
>>>
>>> 2.6.28 still fails with that option.
>>>
>>> 2.6.27.41 /proc/interrupts with idle=halt
>>>
>>>            CPU0       CPU1       CPU2       CPU3
>>>   0:        126          0          0          1   IO-APIC-edge      timer
>>>   1:          0          0          1        157   IO-APIC-edge      i8042
>>>   3:          0          0          0          6   IO-APIC-edge
>>>   4:          0          0          0          6   IO-APIC-edge
>>>   6:          0          0          0          4   IO-APIC-edge      floppy
>>>   8:          0          0          0          1   IO-APIC-edge      rtc0
>>>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
>>>  12:          0          0          1        128   IO-APIC-edge      i8042
>>>  14:          0          0         34       4457   IO-APIC-edge      pata_atiixp
>>>  15:          0          0          4        480   IO-APIC-edge      pata_atiixp
>>>  16:          0          0          0        397   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
>>>  17:          0          0          0          2   IO-APIC-fasteoi   ehci_hcd:usb1
>>>  18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>>>  19:          0          0          0        142   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
>>>  22:          0          0          4       1154   IO-APIC-fasteoi   ahci
>>> 219:          0          0          3         63   PCI-MSI-edge      eth0
>>> NMI:          0          0          0          0   Non-maskable interrupts
>>> LOC:      91539      91964      92525      91181   Local timer interrupts
>>> RES:       2888       3873       2434       2721   Rescheduling interrupts
>>> CAL:        240        245        247         84   function call interrupts
>>> TLB:        768        628        526        512   TLB shootdowns
>>> SPU:          0          0          0          0   Spurious interrupts
>>> ERR:          0
>>> MIS:          0
>>>
>>> 2.6.28 /proc/interrupts with idle=halt
>>>
>>>             CPU0       CPU1       CPU2       CPU3
>>>    0:        126          0          2          0   IO-APIC-edge      timer
>>>    1:          0          0        192          0   IO-APIC-edge      i8042
>>>    3:          0          0          6          0   IO-APIC-edge
>>>    4:          0          0          6          0   IO-APIC-edge
>>>    6:          0          0          4          0   IO-APIC-edge      floppy
>>>    8:          0          0          1          0   IO-APIC-edge      rtc0
>>>    9:          0          0          0          0   IO-APIC-fasteoi   acpi
>>>   12:          0          0        128          1   IO-APIC-edge      i8042
>>>   14:          0          1     147114        396   IO-APIC-edge      pata_atiixp
>>>   15:          0          0        646          2   IO-APIC-edge      pata_atiixp
>>>   16:          0          0        396          0   IO-APIC-fasteoi   aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel
>>>   17:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
>>>   18:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>>>   19:          0          0        362          1   IO-APIC-fasteoi   aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>>>   22:          0          0        874          1   IO-APIC-fasteoi   ahci
>>> 1274:          0          0        193          4   PCI-MSI-edge      eth0
>>> 1279:     513207          0          0          0   HPET_MSI-edge     hpet2
>>>  NMI:          0          0          0          0   Non-maskable interrupts
>>>  LOC:        268     513395     513138     522088   Local timer interrupts
>>>  RES:       3262       3679       2573       3746   Rescheduling interrupts
>>>  CAL:        131        166         57        147   Function call interrupts
>>>  TLB:        680        438        450        639   TLB shootdowns
>>>  SPU:          0          0          0          0   Spurious interrupts
>>>  ERR:          0
>>>  MIS:          0
>>>
>>
>> Hmm. Looks like hpet2 is still getting used instead of the local APIC timer in the .28 case.
>>
>> I was expecting a low count for hpet2 and the local timer counts on all CPUs to be around the same value. The above shows CPU 0 depending on hpet2 for some reason, even with idle=halt. Can you send the output of the two below for the .28 case?
>> /proc/timer_list
> 
> Attached.
> 
>> grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
> 
> I have no /sys/devices/system/cpu/cpu0/cpuidle on this machine.
> Maybe because of
> 
> #
> # CPU Frequency scaling
> #
> # CONFIG_CPU_FREQ is not set
> # CONFIG_CPU_IDLE is not set
> 
> Would it be OK if when you ask for 2.6.28 info, I use a 2.6.32.2 kernel?
> That kernel also fails fdformat with hpet enabled on these machines.
> 

I do have this on 2.6.32.2 though.

# grep . /sys/devices/system/cpu/cpuidle/current_*
/sys/devices/system/cpu/cpuidle/current_driver:acpi_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:ladder

Want me to go back to 2.6.28 and show this?

Mark
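
For anyone repeating the comparison on their own machine: the check Venkatesh describes (a low hpet2 count, and LOC counts that are roughly equal across CPUs) can be sketched as a small script. This is only a rough sketch and was not posted in the thread; the function names parse_interrupts and cpus_leaning_on_hpet are hypothetical, and it assumes the /proc/interrupts layout shown in the 2.6.28 output quoted above (a header row of CPU names, then "label: counts... description").

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (per_cpu_counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        # Leading numeric fields are the per-CPU counts; rows such as
        # ERR/MIS carry fewer counts than there are CPUs.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def cpus_leaning_on_hpet(text):
    """Return CPU indices whose HPET MSI count exceeds their local timer (LOC) count."""
    table = parse_interrupts(text)
    loc, _ = table["LOC"]
    flagged = set()
    for counts, desc in table.values():
        if desc.startswith("HPET_MSI"):
            flagged.update(cpu for cpu, n in enumerate(counts) if n > loc[cpu])
    return sorted(flagged)


# Two rows copied from the 2.6.28 idle=halt output quoted in this message.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
1279:     513207          0          0          0   HPET_MSI-edge      hpet2
LOC:        268     513395     513138     522088   Local timer interrupts
"""

print(cpus_leaning_on_hpet(SAMPLE))  # prints [0]
```

On a live system the same function could be fed open("/proc/interrupts").read(); an empty list would correspond to what Venkatesh expected to see with idle=halt, while [0] matches the 2.6.28 numbers above.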

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
