Date:	Fri, 8 Mar 2013 10:01:03 -0700
From:	Bjorn Helgaas <bhelgaas@...gle.com>
To:	Xiangliang Yu <yuxiangl@...vell.com>
Cc:	yxlraid <yxlraid@...il.com>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] PCI: fix system hang issue of Marvell SATA host controller

On Thu, Mar 7, 2013 at 11:51 PM, Xiangliang Yu <yuxiangl@...vell.com> wrote:
> Hi, Bjorn
>
>> >> > Fix system hang issue: if first accessed resource file of BAR0 ~
>> >> > BAR4, system will hang after executing lspci command
>> >>
>> >> This needs more explanation.  We've already read the BARs by the time
>> >> header quirks are run, so apparently it's not just the mere act of
>> >> accessing a BAR that causes a hang.
>> >>
>> >> We need to know exactly what's going on here.  For example, do BARs
>> >> 0-4 exist?  Does the device decode accesses to the regions described
>> >> by the BARs?  The PCI core has to know what resources the device uses,
>> >> so if the device decodes accesses, we can't just throw away the
>> >> start/end information.
>> > The BARs 0-4 is exist and the PCI device is enable IO space, but user access
>> > the regions file by udevadm command with info parameter, the system will hang.
>> > Like this: udevadmin info --attribut-walk
>> > --path=/sys/device/pci-device/000:*.
>> > Because the device is just AHCI host controller, don't need the BAR0 ~ 4 region
>> > file.
>> > Is my explanation ok for the patch?
>>
>> No, I still don't know what causes the hang; I only know that udevadm
>> can trigger it.  I don't want to just paper over the problem until we
>> know what the root cause is.
>>
>> Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s<dev>
>> BASE_ADDRESS_0"?  "setpci -H1 -s<dev> BASE_ADDRESS_0"?
> The commands are ok because the commands can't find the device after accessing IO port.
> The root cause is that accessing of IO port will make the chip go bad. So, the point of the patch is don't export capability of the IO accessing.

Ah, so the problem is not with accessing the BAR in config space.  The
problem is with accessing the I/O port space mapped by the BAR.  Is
that right?
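
To make the config-space vs. I/O-space distinction concrete, here is a
minimal user-space sketch that classifies BARs from the text of the sysfs
"resource" file without ever touching the regions those BARs map.  It
assumes the usual "start end flags" line format of that file and the
IORESOURCE_IO / IORESOURCE_MEM bit values from include/linux/ioport.h.
The names bar_info, parse_resource_line and classify_bar are made up for
this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Flag bits as defined in the kernel's include/linux/ioport.h. */
#define IORESOURCE_IO  0x00000100ULL
#define IORESOURCE_MEM 0x00000200ULL

enum bar_kind { BAR_UNUSED, BAR_IO, BAR_MEM, BAR_OTHER };

/* Hypothetical holder for one line of the sysfs "resource" file. */
struct bar_info {
	unsigned long long start;
	unsigned long long end;
	unsigned long long flags;
};

/*
 * Parse one "start end flags" line (three hex numbers).
 * Returns 0 on success, -1 if the line does not have three hex fields.
 */
static int parse_resource_line(const char *line, struct bar_info *out)
{
	assert(line != NULL && out != NULL);

	if (sscanf(line, "%llx %llx %llx",
		   &out->start, &out->end, &out->flags) != 3)
		return -1;
	return 0;
}

/* Decide from the flags alone whether a BAR maps I/O ports or memory. */
static enum bar_kind classify_bar(const struct bar_info *bar)
{
	if (bar->flags == 0)
		return BAR_UNUSED;	/* unimplemented BARs read as all zeros */
	if (bar->flags & IORESOURCE_IO)
		return BAR_IO;		/* region lives in I/O port space */
	if (bar->flags & IORESOURCE_MEM)
		return BAR_MEM;		/* region lives in memory space */
	return BAR_OTHER;
}
```

Feeding each line of /sys/bus/pci/devices/<dev>/resource through
parse_resource_line() and classify_bar() shows which BARs are I/O port
BARs.  This only reads what the PCI core already recorded, so it says
nothing about whether an access to the ports themselves is safe.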

Does "udevadm info --attribute-walk" really access the device address
space mapped by the BARs?  That seems surprising to me, and I don't
see any indication of it when I try it on an AHCI device on my system:

# udevadm info --attribute-walk --path=/sys/devices/pci0000:00/0000:00:1f.2

Udevadm info starts with the device specified by the devpath and then
walks up the chain of parent devices. It prints for every device
found, all possible attributes in the udev rules key format.
A rule to match, can be composed by the attributes of the device
and the attributes from one single parent device.

  looking at device '/devices/pci0000:00/0000:00:1f.2':
    KERNEL=="0000:00:1f.2"
    SUBSYSTEM=="pci"
    DRIVER=="ahci"
    ATTR{irq}=="40"
    ATTR{subsystem_vendor}=="0x17aa"
    ATTR{broken_parity_status}=="0"
    ATTR{class}=="0x010601"
    ATTR{consistent_dma_mask_bits}=="64"
    ATTR{dma_mask_bits}=="64"
    ATTR{local_cpus}=="00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f"
    ATTR{device}=="0x3b2f"
    ATTR{enable}=="1"
    ATTR{msi_bus}==""
    ATTR{local_cpulist}=="0-3"
    ATTR{vendor}=="0x8086"
    ATTR{subsystem_device}=="0x2168"
    ATTR{numa_node}=="-1"

  looking at parent device '/devices/pci0000:00':
    KERNELS=="pci0000:00"
    SUBSYSTEMS==""
    DRIVERS==""
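
The output above lists only text attributes.  PCI sysfs also exposes
per-BAR files named resource0 ... resourceN in the same directory, and
those are the user-space path to the regions the BARs describe.  Whether
udevadm opens them is not established in this thread.  As a rough sketch
for checking which directory entries are such per-BAR files, assuming
that naming convention (the helper name bar_index_from_name is
hypothetical):

```c
#include <ctype.h>
#include <string.h>

/*
 * Return the BAR index encoded in a sysfs file name of the form
 * "resourceN" (N is a decimal number), or -1 for any other name.
 * The plain "resource" summary file and variants with a suffix such as
 * "resource0_wc" are deliberately not matched by this sketch.
 */
static int bar_index_from_name(const char *name)
{
	static const char prefix[] = "resource";
	const size_t plen = sizeof(prefix) - 1;
	const char *p;
	int index = 0;

	if (strncmp(name, prefix, plen) != 0)
		return -1;

	p = name + plen;
	if (*p == '\0')
		return -1;	/* "resource" alone is the text summary file */

	for (; *p != '\0'; p++) {
		if (!isdigit((unsigned char)*p))
			return -1;
		index = index * 10 + (*p - '0');
	}
	return index;
}
```

Running this over the names returned by readdir() on the device
directory, combined with strace on udevadm, would show whether any
resourceN file is actually opened during the attribute walk.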

>> >> > ---
>> >> >  drivers/pci/quirks.c |   15 +++++++++++++++
>> >> >  1 files changed, 15 insertions(+), 0 deletions(-)
>> >> >
>> >> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> >> > index 0369fb6..d49f8dc 100644
>> >> > --- a/drivers/pci/quirks.c
>> >> > +++ b/drivers/pci/quirks.c
>> >> > @@ -44,6 +44,21 @@ static void quirk_mmio_always_on(struct pci_dev *dev)
>> >> >  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_ANY_ID, PCI_ANY_ID,
>> >> >                                 PCI_CLASS_BRIDGE_HOST, 8, quirk_mmio_always_on);
>> >> >
>> >> > +/* The BAR0 ~ BAR4 of Marvell 9125 device can't be accessed
>> >> > +*  by IO resource file, and need to skip the files
>> >> > +*/
>> >> > +static void quirk_marvell_mask_bar(struct pci_dev *dev)
>> >> > +{
>> >> > +       int i;
>> >> > +
>> >> > +       for (i = 0; i < 5; i++)
>> >> > +               if (dev->resource[i].start)
>> >> > +                       dev->resource[i].start =
>> >> > +                               dev->resource[i].end = 0;
>> >> > +}
>> >> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9125,
>> >> > +                               quirk_marvell_mask_bar);
>> >> > +
>> >> >  /* The Mellanox Tavor device gives false positive parity errors
>> >> >   * Mark this device with a broken_parity_status, to allow
>> >> >   * PCI scanning code to "skip" this now blacklisted device.
>> >> > --
>> >> > 1.7.5.4
>> >> >