Message-ID: <CADLC3L1BAgQCbi1V=N0CRkJxbaQVK36q4pzbdnanO60exFem4w@mail.gmail.com>
Date:	Thu, 29 Nov 2012 20:39:41 -0600
From:	Robert Hancock <hancockrwd@...il.com>
To:	Bjorn Helgaas <bhelgaas@...gle.com>
Cc:	Justin Piszcz <jpiszcz@...idpixels.com>,
	Bruno Prémont <bonbons@...ux-vserver.org>,
	support@...ermicro.com, linux-kernel@...r.kernel.org,
	Dan Williams <djbw@...com>
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
 bug question

On Thu, Nov 29, 2012 at 12:16 PM, Bjorn Helgaas <bhelgaas@...gle.com> wrote:
> On Thu, Nov 29, 2012 at 1:55 AM, Justin Piszcz <jpiszcz@...idpixels.com> wrote:
>>
>>
>> -----Original Message-----
>> From: Robert Hancock [mailto:hancockrwd@...il.com]
>> Sent: Wednesday, November 28, 2012 7:55 PM
>> To: Justin Piszcz
>> Cc: Bjorn Helgaas; Bruno Prémont; support@...ermicro.com;
>> linux-kernel@...r.kernel.org; Dan Williams
>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>> bug question
>>
>> On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz <jpiszcz@...idpixels.com>
>> wrote:
>>>
>>>
>>> -----Original Message-----
>>> From: Robert Hancock [mailto:hancockrwd@...il.com]
>>> Sent: Wednesday, November 28, 2012 7:35 PM
>>> To: Justin Piszcz
>>> Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; support@...ermicro.com;
>>> linux-kernel@...r.kernel.org; 'Dan Williams'
>>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>>> bug question
>>>
>>>
>>> What does lspci -vv show on that controller? Not sure what actual
>>> chipset that controller is, but there's a known issue with some Marvell
>>> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
>>> memory read/write requests from the wrong PCI function ID and the IOMMU
>>> rightly denies access as the function listed in the requests doesn't
>>> have any mapping to that memory. I don't think there's presently a
>>> workaround other than disabling DMAR. We could (and likely should) be
>>> detecting that device and adding some kind of quirk for it.
>>>
>>> That sounds likely...
>>> It is shown below:
>>>
>>> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host Adapter
>>>
>>> lspci -vv output:
>>>
>>> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
>>>   Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller
>>
>> Yeah, that's one of those controllers I think. But I can't tell from
>> the bit of the dmesg you posted exactly what's going on. Can you post
>> a full boot log from having the card installed and some drive attached
>> (by putting the boot drive on another controller for example)?
>>
>>>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
>>>> this a Linux/ASPM implementation issue?
>>>> [    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
>>>> [    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)
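
The "_OSC support mask: 0x08" value in that log line can be decoded bit by bit. A minimal sketch, assuming the Support-field layout from the PCI Firmware Specification (bit 0 extended config space, bit 1 ASPM, bit 2 Clock PM, bit 3 PCI segment groups, bit 4 MSI); the table and function names are illustrative labels, not kernel identifiers:

```python
# _OSC Support field bits for PCI host bridges, following the PCI Firmware
# Specification layout. The labels are descriptive, not kernel identifiers.
OSC_SUPPORT_BITS = {
    0x01: "extended PCI config operation regions",
    0x02: "ASPM",
    0x04: "Clock PM capability",
    0x08: "PCI segment groups",
    0x10: "MSI",
}


def decode_osc_support(mask: int) -> list:
    """Return the labels of every Support-field bit set in `mask`."""
    return [label for bit, label in OSC_SUPPORT_BITS.items() if mask & bit]


if __name__ == "__main__":
    mask = 0x08  # from "_OSC support mask: 0x08" in the quoted log
    print(decode_osc_support(mask))  # ['PCI segment groups']
    print(bool(mask & 0x02))         # False: the ASPM bit is clear
```

Read this way, 0x08 has only the segment-groups bit set, so the ASPM bit (0x02) is clear in the mask printed for pci0000:ff. That alone does not say whether the board's firmware would have granted ASPM control.
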
>>>
>>> What's the full dmesg from this machine (or is it already posted somewhere)?
>>>
>>> It is now available here:
>>> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
>>
>>> Is that the same boot log? It doesn't have this error in it.
>>
>> Yes, the error is here (it's towards the bottom):
>>
>> [    7.973015] ata14.00: qc timeout (cmd 0xa1)
>> [    8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [    9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> [   19.260667] ata14.00: qc timeout (cmd 0xa1)
>> [   19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [   19.760451] ata14: limiting SATA link speed to 1.5 Gbps
>> [   20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> [   50.521078] ata14.00: qc timeout (cmd 0xa1)
>> [   51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [   51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> [   51.824682] dmar: DRHD: handling fault status reg 502
>> [   51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
>> [   51.824686] DMAR:[fault reason 06] PTE Read access is not set
>
> You have these devices:
>
>     pci 0000:04:00.0: [10de:01d3] type 00 class 0x030000 nVidia G72
>     pci 0000:84:00.0: [1b4b:9123] type 00 class 0x010601 Marvell 88SE9123 SATA
>     pci 0000:84:00.1: [1b4b:91a4] type 00 class 0x01018f Marvell 88SE9128 IDE
>
> I think the 04:00.0 DMAR errors are symptoms of nouveau driver issues,
> and if you get rid of that driver, they'll probably go away.
>
> But this 84:00.1 DMAR error:
>
>     dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000
>     DMAR:[fault reason 02] Present bit in context entry is clear
>
> looks like the probable cause of the Marvell issue.  It looks similar
> to https://bugzilla.kernel.org/show_bug.cgi?id=42679, although the
> reports there show a bb:dd.0 device (but no bb:dd.1 device), and the
> DMAR rejects DMA that appears to be from bb:dd.1.
>
> Another report that's even more similar is
> https://bugzilla.redhat.com/show_bug.cgi?id=757166 .  In that case,
> both bb:dd.0 and bb:dd.1 exist (as in your system), and the DMAR fault
> is exactly like what you're seeing.
>
> So you're not alone, but unfortunately, nobody seems to be working on
> either bug report.  I took the liberty of adding you to the cc: list of
> both.
>
> I don't really know what else to do at this point.  Maybe a SATA
> expert with some Marvell docs could figure out why we're seeing DMA
> from the IDE controller, but I'm not that person :)
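
The two DMAR reports quoted above differ in an informative way. Fault reason 06 for 04:00.0 ("PTE Read access is not set") is a page-table permission fault for a requester that does have a context entry, while reason 02 for 84:00.1 ("Present bit in context entry is clear") means the IOMMU has no context entry at all for that requester. A minimal sketch for pulling the requester and reason out of such lines; the regular expressions are an assumption written only against the lines quoted in this thread, and other kernel versions may format these messages differently:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns for the two-line DMAR fault report quoted in this thread.
REQUEST_RE = re.compile(
    r"DMAR:\[(DMA Read|DMA Write)\] Request device "
    r"\[([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])\] fault addr ([0-9a-f]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (\d+)\] (.+)")


@dataclass
class DmarFault:
    access: str
    bus: int
    device: int
    function: int
    addr: int
    reason: Optional[int] = None
    reason_text: str = ""


def parse_dmar_faults(lines):
    """Pair each 'Request device' line with the 'fault reason' line after it."""
    faults = []
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            access, bus, dev, fn, addr = m.groups()
            faults.append(DmarFault(access, int(bus, 16), int(dev, 16),
                                    int(fn), int(addr, 16)))
            continue
        m = REASON_RE.search(line)
        if m and faults and faults[-1].reason is None:
            faults[-1].reason = int(m.group(1))  # printed in decimal
            faults[-1].reason_text = m.group(2)
    return faults


if __name__ == "__main__":
    log = [
        "dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000",
        "DMAR:[fault reason 02] Present bit in context entry is clear",
    ]
    for f in parse_dmar_faults(log):
        print(f"{f.bus:02x}:{f.device:02x}.{f.function} "
              f"reason {f.reason}: {f.reason_text}")
    # 84:00.1 reason 2: Present bit in context entry is clear
```
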

I doubt any Marvell docs would really be very helpful (except maybe for
an errata list, but that would likely just tell us what we can already
figure out). The SATA controller part of the device seems to just be
issuing accesses with the wrong PCI function ID.
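
To make "wrong PCI function ID" concrete: the IOMMU identifies the source of each DMA by its 16-bit PCIe Requester ID, with the bus in bits 15:8, the device in bits 7:3 and the function in bits 2:0. The two Marvell functions differ only in the low bits, so a request tagged with function 1 is looked up under a different context entry than the one set up for function 0. A minimal sketch of that encoding; the helper names are illustrative:

```python
def requester_id(bus: int, device: int, function: int) -> int:
    """Pack a bus:device.function triple into a 16-bit PCIe Requester ID."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (device << 3) | function


def format_bdf(rid: int) -> str:
    """Inverse of requester_id(): render a Requester ID as bb:dd.f."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7}"


if __name__ == "__main__":
    sata = requester_id(0x84, 0x00, 0)  # 84:00.0, the AHCI function
    ide = requester_id(0x84, 0x00, 1)   # 84:00.1, the function in the fault
    print(hex(sata), hex(ide), format_bdf(ide))  # 0x8400 0x8401 84:00.1
```
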

The only solution I can think of would be at the PCI/DMAR layer -
basically functions 0 and 1 on this device should be allowed to access
each other's DMA regions.
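
A toy model of that access rule follows. It is hypothetical throughout: plain Python dictionaries stand in for the IOMMU's per-requester mappings, and DMA_ALIAS_QUIRKS is an invented table keyed by the function 0 vendor/device ID (1b4b:9123 from the lspci listing quoted above); none of it is a kernel interface. A request that misses in its own domain is accepted only if another function in the same quirked slot has the page mapped:

```python
# Hypothetical model of "functions 0 and 1 may use each other's DMA
# mappings". Nothing here is a kernel API: dicts stand in for the IOMMU's
# per-requester domains, and DMA_ALIAS_QUIRKS is an invented quirk table.

# (vendor, device) of function 0 -> functions allowed to share mappings.
DMA_ALIAS_QUIRKS = {(0x1B4B, 0x9123): {0, 1}}  # Marvell 88SE9123


def bdf(bus: int, dev: int, fn: int) -> int:
    """Pack bus:device.function into a 16-bit requester ID."""
    return (bus << 8) | (dev << 3) | fn


class IommuModel:
    def __init__(self) -> None:
        self.ids = {}      # requester ID -> (vendor, device) of that function
        self.domains = {}  # requester ID -> set of mapped IOVA page numbers

    def add_function(self, rid, vendor, device, pages=()):
        self.ids[rid] = (vendor, device)
        self.domains[rid] = set(pages)

    def allowed(self, rid: int, page: int) -> bool:
        """Would a DMA to `page` tagged with requester `rid` be translated?"""
        if page in self.domains.get(rid, set()):
            return True
        slot = rid & ~0x7  # same bus and device, function 0
        group = DMA_ALIAS_QUIRKS.get(self.ids.get(slot), set())
        if (rid & 0x7) not in group:
            return False   # no quirk: behave like the unmodified IOMMU
        # Quirked slot: accept if another function in the group has the page.
        return any(page in self.domains.get(slot | fn, set())
                   for fn in group if (slot | fn) != rid)


if __name__ == "__main__":
    iommu = IommuModel()
    # AHCI function 0 owns the mapping; IDE function 1 has none of its own.
    iommu.add_function(bdf(0x84, 0, 0), 0x1B4B, 0x9123, pages={0xFFF00})
    iommu.add_function(bdf(0x84, 0, 1), 0x1B4B, 0x91A4)
    print(iommu.allowed(bdf(0x84, 0, 1), 0xFFF00))  # True: aliased to fn 0
    print(iommu.allowed(bdf(0x84, 0, 1), 0x12345))  # False: mapped nowhere
```

A real fix would presumably have both functions share one IOMMU domain up front rather than consult a sibling at lookup time; the model only illustrates the access rule being proposed.
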