Message-ID: <4B3214EC.6020308@compro.net>
Date: Wed, 23 Dec 2009 08:02:36 -0500
From: Mark Hounschell <markh@...pro.net>
To: "Pallipadi, Venkatesh" <venkatesh.pallipadi@...el.com>
CC: dmarkh@....rr.com, Linus Torvalds <torvalds@...ux-foundation.org>,
Alain Knaff <alain@...ff.lu>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"fdutils@...tils.linux.lu" <fdutils@...tils.linux.lu>,
"Li, Shaohua" <shaohua.li@...el.com>, Ingo Molnar <mingo@...e.hu>
Subject: Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was:
Re: Cannot format floppies under kernel 2.6.*?)
On 12/22/2009 07:22 PM, Mark Hounschell wrote:
> On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
>> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>>
>>>> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
>>>> details, but Mark is basically chasing down a situation where the floppy
>>>> driver seems to have trouble formatting floppies, and it happened
>>>> between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
>>>> memory block transfers the wrong value for the first byte of the block.
>>>>
>>>> Which should be impossible, but whatever. Some part of the system has a
>>>> cached buffer that isn't flushed.
>>>>
>>>> What gets _you_ guys involved is that Mark cannot reproduce the bug if
>>>> HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
>>>> pure luck while bisecting, because some time during his bisect, his
>>>> machine wouldn't even boot with HPET.
>>>>
>>>> So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
>>>> 2.6.28 (and current -git) does not. Any ideas? ]
>>>>
>>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>>
>>>>> Ok, I may have something that might help.
>>>>>
>>>>> # git bisect bad
>>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>>> Author: venkatesh.pallipadi@...el.com <venkatesh.pallipadi@...el.com>
>>>>> Date: Fri Sep 5 18:02:18 2008 -0700
>>>>>
>>>>> x86: HPET_MSI Initialise per-cpu HPET timers
>>>>>
>>>>> Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>>>> timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>>>> setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>>>> timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>>>> is non-functional LAPIC timer across CPU deep C-states.
>>>>>
>>>>> If there are more CPUs than number of available timers, CPUs that do not
>>>>> find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>>>>
>>>>> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>
>>>>> Signed-off-by: Shaohua Li <shaohua.li@...el.com>
>>>>> Signed-off-by: Ingo Molnar <mingo@...e.hu>
>>>>>
>>>>> And of course this was the first commit that I could not boot if I had hpet
>>>>> enabled. To get this one to boot (single user mode only) I had to add the
>>>>> quiet cmdline option and apply the following patch, from commit
>>>>> 5ceb1a04187553e08c6ab60d30cee7c454ee139a, to arch/x86/kernel/hpet.c:
>>>>>
>>>>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>>>>  {
>>>>>
>>>>>      if (request_irq(dev->irq, hpet_interrupt_handler,
>>>>> -                    IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>>>>> +                    IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>>>>          return -1;
>>>>>
>>>>>      disable_irq(dev->irq);
>>>>>
>>>>> AND add the quiet cmdline option.
>>>>
>>>> Ok, so we know why HPET didn't boot for you, and that was fixed later (by
>>>> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>>>>
>>>
>>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
>>> working and also when I could no longer boot with hpet enabled.
>>
>>
>> I am missing something here. Is commit 26afe5f2 where the system does not
>> boot with HPET, or is it where the floppy stops working when you boot
>> with HPET enabled?
>>
>
> As it happens, both happen there. Commit 5ceb1a04 is where it starts
> booting _again_ with hpet enabled. So I took that patch (5ceb1a04) and
> applied it to (26afe5f2f) to be able to boot with hpet enabled. I had to
> use the quiet option to get to a login prompt, but that is where the
> floppy format first fails, just as it does in 2.6.28 and up.
>
>> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
>> output in each case. With that option, we should be using local APIC
>> timer and PIT, HPET or HPET with MSI should not really matter. Does it
>> still fail with .28 with that option?
>>

2.6.28 still fails with that option.

2.6.27.41 /proc/interrupts with idle=halt

         CPU0    CPU1    CPU2    CPU3
   0:     126       0       0       1  IO-APIC-edge     timer
   1:       0       0       1     157  IO-APIC-edge     i8042
   3:       0       0       0       6  IO-APIC-edge
   4:       0       0       0       6  IO-APIC-edge
   6:       0       0       0       4  IO-APIC-edge     floppy
   8:       0       0       0       1  IO-APIC-edge     rtc0
   9:       0       0       0       0  IO-APIC-fasteoi  acpi
  12:       0       0       1     128  IO-APIC-edge     i8042
  14:       0       0      34    4457  IO-APIC-edge     pata_atiixp
  15:       0       0       4     480  IO-APIC-edge     pata_atiixp
  16:       0       0       0     397  IO-APIC-fasteoi  aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
  17:       0       0       0       2  IO-APIC-fasteoi  ehci_hcd:usb1
  18:       0       0       0       0  IO-APIC-fasteoi  ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
  19:       0       0       0     142  IO-APIC-fasteoi  aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
  22:       0       0       4    1154  IO-APIC-fasteoi  ahci
 219:       0       0       3      63  PCI-MSI-edge     eth0
 NMI:       0       0       0       0  Non-maskable interrupts
 LOC:   91539   91964   92525   91181  Local timer interrupts
 RES:    2888    3873    2434    2721  Rescheduling interrupts
 CAL:     240     245     247      84  function call interrupts
 TLB:     768     628     526     512  TLB shootdowns
 SPU:       0       0       0       0  Spurious interrupts
 ERR:       0
 MIS:       0

2.6.28 /proc/interrupts with idle=halt

         CPU0    CPU1    CPU2    CPU3
   0:     126       0       2       0  IO-APIC-edge     timer
   1:       0       0     192       0  IO-APIC-edge     i8042
   3:       0       0       6       0  IO-APIC-edge
   4:       0       0       6       0  IO-APIC-edge
   6:       0       0       4       0  IO-APIC-edge     floppy
   8:       0       0       1       0  IO-APIC-edge     rtc0
   9:       0       0       0       0  IO-APIC-fasteoi  acpi
  12:       0       0     128       1  IO-APIC-edge     i8042
  14:       0       1  147114     396  IO-APIC-edge     pata_atiixp
  15:       0       0     646       2  IO-APIC-edge     pata_atiixp
  16:       0       0     396       0  IO-APIC-fasteoi  aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel
  17:       0       0       0       0  IO-APIC-fasteoi  ehci_hcd:usb1
  18:       0       0       0       0  IO-APIC-fasteoi  ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
  19:       0       0     362       1  IO-APIC-fasteoi  aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
  22:       0       0     874       1  IO-APIC-fasteoi  ahci
1274:       0       0     193       4  PCI-MSI-edge     eth0
1279:  513207       0       0       0  HPET_MSI-edge    hpet2
 NMI:       0       0       0       0  Non-maskable interrupts
 LOC:     268  513395  513138  522088  Local timer interrupts
 RES:    3262    3679    2573    3746  Rescheduling interrupts
 CAL:     131     166      57     147  Function call interrupts
 TLB:     680     438     450     639  TLB shootdowns
 SPU:       0       0       0       0  Spurious interrupts
 ERR:       0
 MIS:       0
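
For anyone who wants to compare two such snapshots mechanically rather
than by eye, here is a minimal sketch in Python. It is only an
illustration: parse_interrupts is a hypothetical helper, not an existing
tool, and the sample strings are a few rows copied from the two tables
above. It lists the interrupt lines that exist only in the newer
snapshot and shows the per-CPU local timer counts.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: [per-CPU counts]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, sep, rest = ln.partition(":")
        if not sep:
            continue  # no "label:" prefix, e.g. a wrapped device-name line
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the interrupt type / description
            counts.append(int(field))
        if counts:  # skip wrapped lines such as "ohci_hcd:usb3, ..."
            table[label.strip()] = counts
    return table


# A few rows copied from the 2.6.27.41 and 2.6.28 tables in this message.
SAMPLE_27 = """\
 CPU0 CPU1 CPU2 CPU3
 0: 126 0 0 1 IO-APIC-edge timer
 6: 0 0 0 4 IO-APIC-edge floppy
LOC: 91539 91964 92525 91181 Local timer interrupts
"""

SAMPLE_28 = """\
 CPU0 CPU1 CPU2 CPU3
 0: 126 0 2 0 IO-APIC-edge timer
 6: 0 0 4 0 IO-APIC-edge floppy
1279: 513207 0 0 0 HPET_MSI-edge hpet2
LOC: 268 513395 513138 522088 Local timer interrupts
"""

old = parse_interrupts(SAMPLE_27)
new = parse_interrupts(SAMPLE_28)

# Interrupt lines present only in the newer snapshot.
only_new = sorted(set(new) - set(old))
print("only in 2.6.28:", only_new)
print("2.6.28 LOC per CPU:", new["LOC"])
```

On the sample rows this reports 1279 (hpet2) as the only new line. In the
2.6.28 numbers CPU0 has just 268 LOC interrupts next to 513207 on hpet2,
so CPU0 still appears to be taking its ticks from the HPET MSI timer even
with idle=halt.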
Mark