Message-ID: <4C1912D2.8000408@athenacr.com>
Date: Wed, 16 Jun 2010 14:07:14 -0400
From: Brian Bloniarz <bmb@...enacr.com>
To: Bjorn Helgaas <bjorn.helgaas@...com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: 2.6.35-rc3 BUG: unable to handle kernel paging request (ahci_stop_engine)
On 06/16/2010 12:57 PM, Bjorn Helgaas wrote:
> On Tuesday, June 15, 2010 04:24:38 pm Brian Bloniarz wrote:
>> On 06/15/2010 03:11 PM, Brian Bloniarz wrote:
>>> I'm seeing the following BUG booting a Dell Precision T3500
>>> with 2.6.35-rc3 -- does this ring any bells for anyone?
>>>
>>> Looks like -rc1 has the same behavior, I haven't gotten any
>>> farther than that yet.
>>
>> 2.6.34 does not boot for me on this machine either, it times
>> out waiting for the boot device. However, it doesn't BUG.
>> I'm wondering if there are two issues, some issue which
>> showed up pre 2.6.34 causing this:
>>
>> [ 5.854464] ahci 0000:00:1f.2: controller reset failed (0xffffffff)
>>
>> and then something post-2.6.34 which triggers the BUG.
>
> Yes, it sounds like this may be two separate issues, but both
> could be regressions, and we definitely want to resolve them.
> Thanks for giving me a heads-up!
>
> I assume there is *some* older kernel that works. If so, can
> you open a report at http://bugzilla.kernel.org that mentions
> the working older revision and the broken new one, and attach
> the dmesg logs for both?
I submitted https://bugzilla.kernel.org/show_bug.cgi?id=16228
and attached the boot logs.

2.6.33 works fine, and so does 2.6.35-rc3 with pci=nocrs. The
logs for both are attached to the bug. Unfortunately, I don't
have Windows on this machine.

Thanks for the help!
>
>> Googling for "controller reset failed" gives this:
>> https://bugzilla.kernel.org/show_bug.cgi?id=15744
>> on a similar machine, but that was fixed before 2.6.34.
>> Bjorn, could you tell me if this boot log shows anything
>> similar to the behavior you describe in that bug link?
>
> The symptoms are similar to 15744, but I think you're seeing something
> a bit different. Here's what you see:
>
> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
> pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000effff]
> pci_root PNP0A03:00: host bridge window [mem 0x000f0000-0x000fffff]
> pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
> pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xfc000000]
> pci_root PNP0A03:00: host bridge window [mem 0xff980000-0xff980fff]
> pci_root PNP0A03:00: host bridge window [mem 0xff97c000-0xff97ffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfed20000-0xfed9ffff]
> pci 0000:00:1f.2: no compatible bridge window for [mem 0xff970000-0xff9707ff]
>
> The BIOS left the device set to an address that isn't within any of
> the host bridge windows, so we moved it:
>
> pci 0000:00:1f.2: BAR 5: assigned [mem 0xbff00000-0xbff007ff]
> pci 0000:00:1f.2: BAR 5: set to [mem 0xbff00000-0xbff007ff] (PCI address [0xbff00000-0xbff007ff]
>
> The new address (0xbff00000) is inside one of the windows and looks
> reasonable. If you booted Windows on this system, I think it would
> also move the device, though it would probably pick a different
> place to put it.
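
For anyone following along, here is a rough sketch of the containment
check described above: a BAR range is acceptable only if it fits
entirely inside one host bridge window. The window list and both BAR 5
ranges are copied from the quoted dmesg; the function name and the
data layout are my own illustration, not the kernel's actual code.

```python
# Host bridge windows from the quoted dmesg, as inclusive (start, end) pairs.
HOST_BRIDGE_WINDOWS = [
    (0x000a0000, 0x000bffff),
    (0x000c0000, 0x000effff),
    (0x000f0000, 0x000fffff),
    (0xbff00000, 0xdfffffff),
    (0xf0000000, 0xfc000000),
    (0xff980000, 0xff980fff),
    (0xff97c000, 0xff97ffff),
    (0xfed20000, 0xfed9ffff),
]


def containing_window(start, end, windows=HOST_BRIDGE_WINDOWS):
    """Return the first window that fully contains [start, end], else None.

    Hypothetical helper for illustration only.
    """
    for w_start, w_end in windows:
        if w_start <= start and end <= w_end:
            return (w_start, w_end)
    return None


# BAR 5 of 0000:00:1f.2 as the BIOS left it, and after the kernel moved it.
bios_bar5 = (0xff970000, 0xff9707ff)
moved_bar5 = (0xbff00000, 0xbff007ff)

for label, (start, end) in (("BIOS", bios_bar5), ("moved", moved_bar5)):
    window = containing_window(start, end)
    if window is None:
        print(f"{label}: [{start:#010x}-{end:#010x}] is in no window")
    else:
        print(f"{label}: [{start:#010x}-{end:#010x}] fits in "
              f"[{window[0]:#010x}-{window[1]:#010x}]")
```

The BIOS address sits just below the 0xff97c000 window, so it matches
nothing, while the reassigned address lands at the very start of the
0xbff00000 window.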
>
> ahci 0000:00:1f.2: PCI INT C -> GSI 20 (level, low) -> IRQ 20
> ahci 0000:00:1f.2: controller can't do SNTF, turning off CAP_SNTF
> ahci 0000:00:1f.2: controller reset failed (0xffffffff)
>
> The device seems to be responding there (we read the IRQ information,
> for example), so I don't see a problem from the PCI side yet, but
> something is still wrong.
>
> It's conceivable that booting with "pci=nocrs" would make a difference.
> If so, please collect the dmesg log so I can see where we went wrong.
>
> The BUG:
>
> ahci 0000:00:1f.2: failed to stop engine (-5)
> BUG: unable to handle kernel paging request at ffffc90012621018
> IP: [<ffffffffa002c77c>] ahci_stop_engine+0x2c/0x70 [libahci]
>
> looks very strange to me. ahci_stop_engine() does a read from the
> device, then a write, and it looks like the page fault was on the
> write to the same address we just read. I don't know enough about
> x86 to go any farther yet.
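
A small arithmetic sketch on the faulting address, in case it helps.
It assumes 4 KiB pages, and that each AHCI port has a 0x80-byte
register block with the command register (PORT_CMD in libahci) at
offset 0x18; those constants are my assumptions from the AHCI register
layout, not something taken from the log. It only shows that the low
bits of the address are consistent with a PORT_CMD access.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB pages
PORT_REG_SIZE = 0x80    # assumption: size of one AHCI port register block
PORT_CMD = 0x18         # assumption: command register offset within a port

fault_addr = 0xffffc90012621018  # from the BUG line in the quoted log

page_base = fault_addr & ~(PAGE_SIZE - 1)       # start of the faulting page
page_offset = fault_addr & (PAGE_SIZE - 1)      # offset inside that page
offset_in_port = fault_addr % PORT_REG_SIZE     # offset inside a port block

print(f"page base:      {page_base:#x}")
print(f"page offset:    {page_offset:#x}")
print(f"offset in port: {offset_in_port:#x}")
print("looks like PORT_CMD:", offset_in_port == PORT_CMD)
```

This says nothing about why the mapping faulted, only which register
the access was most likely aimed at.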
>
> Bjorn