lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <AANLkTi=F19SdRNfhQe_R9sqwqCbFtAUCFu+09KFgzdAK@mail.gmail.com>
Date:	Fri, 15 Oct 2010 11:26:08 -0700
From:	Emil S Tantilov <emils.tantilov@...il.com>
To:	Ben Hutchings <bhutchings@...arflare.com>
Cc:	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Michael Chan <mchan@...adcom.com>,
	Matthew Wilcox <matthew@....cx>, linux-pci@...r.kernel.org,
	NetDev <netdev@...r.kernel.org>,
	"Tantilov, Emil S" <emil.s.tantilov@...el.com>,
	Jesse Brandeburg <jesse.brandeburg@...el.com>,
	"Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>
Subject: Re: [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access

On Thu, Jun 17, 2010 at 12:16 PM, Ben Hutchings
<bhutchings@...arflare.com> wrote:
> During suspend on an SMP system, {read,write}_msi_msg_desc() may be
> called to mask and unmask interrupts on a device that is already in a
> reduced power state.  At this point memory-mapped registers including
> MSI-X tables are not accessible, and config space may not be fully
> functional either.
>
> While a device is in a reduced power state its interrupts are
> effectively masked and its MSI(-X) state will be restored when it is
> brought back to D0.  Therefore these functions can simply read and
> write msi_desc::msg for devices not in D0.
>
> Further, read_msi_msg_desc() should only ever be used to update a
> previously written message, so it can always read msi_desc::msg
> and never needs to touch the hardware.
>
> Signed-off-by: Ben Hutchings <bhutchings@...arflare.com>
> ---
> On Mon, 2010-06-14 at 18:13 -0700, Michael Chan wrote:
>> I'm debugging the bnx2 driver which doesn't work after suspend/resume if
>> it is running in MSI-X mode.  The problem is that during suspend, the
>> MSI-X vectors are disabled by the following sequence on x86:
>>
>> take_cpu_down() -> cpu_disable_common() -> fixup_irqs()
>>
>> The MSI-X address/data used to disable the vectors are remembered in the
>> above sequence. During resume, these address/data are then programmed
>> back to the device during pci_restore_state(), causing all the vectors
>> to remain disabled.
>
> That's not quite what I see.  What I see is that the message is read
> back from the table *after* the driver's suspend method has been called.
> At this point the device is already in D3 and memory-mapped registers
> are not accessible, so we get random bits as the message.  At least,
> that's what I see happening with the sfc driver.
>
>> Some drivers call free_irq() during suspend and request_irq() during
>> resume, and that should avoid the problem.  bnx2 and some other drivers
>> do not do that.  These drivers rely on pci_restore_state() to restore
>> the MSI-X vectors to the same working state before suspend.
>>
>> What's the right way to fix this?  Thanks.
>
> This is my attempt, which works for sfc.  See if it works for bnx2.
>
> Ben.
>
>  drivers/pci/msi.c |   34 +++++++++++-----------------------
>  1 files changed, 11 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 77b68ea..03f04dc 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -196,30 +196,15 @@ void unmask_msi_irq(unsigned int irq)
>  void read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
>  {
>        struct msi_desc *entry = get_irq_desc_msi(desc);
> -       if (entry->msi_attrib.is_msix) {
> -               void __iomem *base = entry->mask_base +
> -                       entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
>
> -               msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR);
> -               msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR);
> -               msg->data = readl(base + PCI_MSIX_ENTRY_DATA);
> -       } else {
> -               struct pci_dev *dev = entry->dev;
> -               int pos = entry->msi_attrib.pos;
> -               u16 data;
> +       /* We do not touch the hardware (which may not even be
> +        * accessible at the moment) but return the last message
> +        * written.  Assert that this is valid, assuming that
> +        * valid messages are not all-zeroes. */
> +       BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
> +                entry->msg.data));
>
> -               pci_read_config_dword(dev, msi_lower_address_reg(pos),
> -                                       &msg->address_lo);
> -               if (entry->msi_attrib.is_64) {
> -                       pci_read_config_dword(dev, msi_upper_address_reg(pos),
> -                                               &msg->address_hi);
> -                       pci_read_config_word(dev, msi_data_reg(pos, 1), &data);
> -               } else {
> -                       msg->address_hi = 0;
> -                       pci_read_config_word(dev, msi_data_reg(pos, 0), &data);
> -               }
> -               msg->data = data;
> -       }
> +       *msg = entry->msg;
>  }
>
>  void read_msi_msg(unsigned int irq, struct msi_msg *msg)
> @@ -232,7 +217,10 @@ void read_msi_msg(unsigned int irq, struct msi_msg *msg)
>  void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
>  {
>        struct msi_desc *entry = get_irq_desc_msi(desc);
> -       if (entry->msi_attrib.is_msix) {
> +
> +       if (entry->dev->current_state != PCI_D0) {

This check exposed a problem in ixgb (patch is on the way) where
pci_disable_device() was not being called in ixgb_remove(). As a
result the current_state was set to PCI_UNKNOWN and the interface
failed to work on subsequent load of the driver.

Even though the problem was in ixgb, it made me wonder about this
check as the presumption here (low power state) may not always be
true. Like in the case of unloading a driver, which sets
dev->current_state to PCI_UNKNOWN which is not a representation of the
_real_ state of the device (actual state could be D0).

BTW - quick search shows other drivers that could potentially suffer
the faith of ixgb due to lack of pci_disable_device() call on removal.

Thanks,
Emil
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ