Message-ID: <4B310879.9050701@compro.net>
Date: Tue, 22 Dec 2009 12:57:13 -0500
From: Mark Hounschell <markh@...pro.net>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Mark Hounschell <dmarkh@....rr.com>, Alain Knaff <alain@...ff.lu>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
fdutils@...tils.linux.lu,
Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>,
Shaohua Li <shaohua.li@...el.com>, Ingo Molnar <mingo@...e.hu>
Subject: Re: [Fdutils] DMA cache consistency bug introduced in 2.6.28 (Was:
Re: Cannot format floppies under kernel 2.6.*?)
On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>
> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
> details, but Mark is basically chasing down a situation where the floppy
> driver seems to have trouble formatting floppies, and it happened
> between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
> memory block transfers the wrong value for the first byte of the block.
>
> Which should be impossible, but whatever. Some part of the system has a
> cached buffer that isn't flushed.
>
> What gets _you_ guys involved is that Mark cannot reproduce the bug if
> HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
> pure luck while bisecting, because some time during his bisect, his
> machine wouldn't even boot with HPET.
>
> So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
> 2.6.28 (and current -git) does not. Any ideas? ]
>
> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>
>> Ok, I may have something that might help.
>>
>> # git bisect bad
>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>> Author: venkatesh.pallipadi@...el.com <venkatesh.pallipadi@...el.com>
>> Date: Fri Sep 5 18:02:18 2008 -0700
>>
>> x86: HPET_MSI Initialise per-cpu HPET timers
>>
>> Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>> timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>> setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>> timer will eliminate the need for timer broadcasting with IRQ 0 when there
>> is non-functional LAPIC timer across CPU deep C-states.
>>
>> If there are more CPUs than number of available timers, CPUs that do not
>> find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>
>> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>
>> Signed-off-by: Shaohua Li <shaohua.li@...el.com>
>> Signed-off-by: Ingo Molnar <mingo@...e.hu>
>>
>> And of course this was the first commit that I could not boot if I had hpet
>> enabled. To get this one to boot (single user mode only) I had to add the
>> quiet cmdline option and apply the following patch, taken from commit
>> 5ceb1a04187553e08c6ab60d30cee7c454ee139a, to arch/x86/kernel/hpet.c:
>>
>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>> {
>>
>> if (request_irq(dev->irq, hpet_interrupt_handler,
>> - IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>> + IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>> return -1;
>>
>> disable_irq(dev->irq);
>>
>> AND add the quiet cmdline option.
>
> Ok, so we know why HPET didn't boot for you, and that was fixed later (by
> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>
Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
working and also when I could no longer boot with hpet enabled. Commit
5ceb1a04 is where I found I could boot again with hpet enabled. It was a
simple patch, so I backported it to the commits I was testing in order to
be able to boot with hpet on. I did 2 different bisects: the first to find
out when I could boot again with hpet on, the second to find which commit
caused the floppy problem. I applied the patch from the first bisect
(5ceb1a04) while doing the second.

> IOW, _if_ you boot with that fix from commit 5ceb1a04 (and the quiet
> option - I wonder what that is about: do you have any ideas?), is the
> per-CPU HPET timer commit also the commit that causes floppy problems, or
> is this purely a "bisect when HPET became a boot-up problem"?
>
The quiet option was only needed because, with that 5ceb1a04 commit applied
to the kernels I was interested in, kernel messages of some kind went on
for hours and I could not get a login prompt. They went by too fast to read
and I didn't have a serial console available to capture them.
They must not have been too important or critical, because the machine
acted as normal as any machine in single user mode.
But once I got to a single user login prompt, it was for sure the same
floppy problem.
>
> ---
>> Also, on all the machines where it does work with hpet enabled, I don't
>> see the hpet2 entry in /proc/interrupts that shows up below.
>>
>>
>> cat /proc/interrupts
>>         CPU0    CPU1    CPU2    CPU3
>>   0:      82       0       3       0  IO-APIC-edge     timer
>>   1:       0       0    1712       6  IO-APIC-edge     i8042
>>   3:       0       0       6       0  IO-APIC-edge
>>   4:       0       0       6       0  IO-APIC-edge
>>   6:       0       0       4       0  IO-APIC-edge     floppy
>>   8:       0       0      60       0  IO-APIC-edge     rtc0
>>   9:       0       0       0       0  IO-APIC-fasteoi  acpi
>>  12:       0       0   37798     179  IO-APIC-edge     i8042
>>  14:       0       0   16462      71  IO-APIC-edge     pata_atiixp
>>  15:       0       0    5713      17  IO-APIC-edge     pata_atiixp
>>  16:       0       0     904       2  IO-APIC-fasteoi  aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel, ni-pci-gpib
>>  17:       0       0       2       0  IO-APIC-fasteoi  ehci_hcd:usb1, parport0, ni-pci-gpib
>>  18:       0       0   49940      90  IO-APIC-fasteoi  ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, nvidia
>>  19:       0       0     703       2  IO-APIC-fasteoi  aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>>  22:       0       0    1303      15  IO-APIC-fasteoi  ahci
>>
>>  24:  261763       0       0       0  HPET_MSI-edge    hpet2
>>
>>  29:       0       0     220       5  PCI-MSI-edge     sky2@pci:0000:04:00.0
>> NMI:       0       0       0       0  Non-maskable interrupts
>> LOC:     138  271356  264446  261050  Local timer interrupts
>> SPU:       0       0       0       0  Spurious interrupts
>> PMI:       0       0       0       0  Performance monitoring interrupts
>> PND:       0       0       0       0  Performance pending work
>> RES:    4511    9275    8470    8086  Rescheduling interrupts
>> CAL:    3624    8666     523    4543  Function call interrupts
>> TLB:     981    1111    1065    1058  TLB shootdowns
>> ERR:       0
>> MIS:       0
>
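For anyone who wants to compare their own machine against the output
quoted above, here is a minimal sketch that picks the HPET MSI entries out
of /proc/interrupts. It assumes the layout shown in the quoted output (a
CPU header row, then one row per IRQ with the chip name right after the
per-CPU counts) and matches on the "HPET_MSI" chip-name prefix seen there.
The function name hpet_msi_lines is made up for illustration; it is not
part of any kernel tool.

```python
def hpet_msi_lines(text):
    """Return (irq, per_cpu_counts, name) for each HPET_MSI row.

    `text` is the full contents of /proc/interrupts. The column layout
    is assumed to match the output quoted in this thread.
    """
    lines = text.splitlines()
    if not lines:
        return []
    # The header row lists one column per CPU: CPU0 CPU1 ...
    ncpus = len(lines[0].split())
    found = []
    for line in lines[1:]:
        fields = line.split()
        # Skip short rows such as "ERR: 0" that carry no chip name.
        if len(fields) < ncpus + 2 or not fields[0].endswith(":"):
            continue
        chip = fields[ncpus + 1]
        if chip.startswith("HPET_MSI"):
            counts = [int(c) for c in fields[1:ncpus + 1]]
            found.append((fields[0].rstrip(":"), counts, fields[-1]))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            entries = hpet_msi_lines(f.read())
    except OSError:
        entries = []
    for irq, counts, name in entries:
        print(irq, name, counts)
```

If the script prints nothing, no HPET MSI timer is registered, which is
what I see on the machines where the floppy works.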
Regards
Mark