Message-Id: <20161003.232911.145888579502087608.davem@davemloft.net>
Date:   Mon, 03 Oct 2016 23:29:11 -0400 (EDT)
From:   David Miller <davem@...emloft.net>
To:     jeffrey.t.kirsher@...el.com
Cc:     gpiccoli@...ux.vnet.ibm.com, netdev@...r.kernel.org,
        nhorman@...hat.com, sassmann@...hat.com, jogreene@...hat.com,
        guru.anbalagane@...cle.com, stable@...r.kernel.org
Subject: Re: [net-next] i40e: avoid NULL pointer dereference and recursive
 errors on early PCI error

From: Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Date: Mon,  3 Oct 2016 00:31:12 -0700

> From: Guilherme G Piccoli <gpiccoli@...ux.vnet.ibm.com>
> 
> Although rare, it's possible to hit a PCI error early in device
> probe, when some structs are not yet fully initialized and others
> may be completely uninitialized, leading to a NULL pointer
> dereference.
> 
> The i40e driver currently misbehaves if the device hits such an
> early PCI error: first, the struct i40e_pf might not be attached
> to pci_dev yet, leading to a NULL pointer dereference on access to
> pf->state.
> 
> Even checking whether the struct is NULL and avoiding the access in
> that case isn't enough, since the driver cannot recover from a PCI
> error that early; in our experiments we saw multiple failures in the
> kernel log, like:
> 
>   [549.664] i40e 0007:01:00.1: Initial pf_reset failed: -15
>   [549.664] i40e: probe of 0007:01:00.1 failed with error -15
>   [...]
>   [871.644] i40e 0007:01:00.1: The driver for the device stopped because the
>   device firmware failed to init. Try updating your NVM image.
>   [871.644] i40e: probe of 0007:01:00.1 failed with error -32
>   [...]
>   [872.516] i40e 0007:01:00.0: ARQ: Unknown event 0x0000 ignored
> 
> Between the first probe failure (error -15) and the second (error -32),
> another PCI error happened due to the first bad probe. Also, the driver
> started to flood the console with those ARQ event messages.
> 
> This patch prevents these issues by allowing the error recovery
> mechanism to remove the failed device from the system instead of
> trying to recover from early PCI errors during device probe.
> 
> CC: <stable@...r.kernel.org>
> Signed-off-by: Guilherme G Piccoli <gpiccoli@...ux.vnet.ibm.com>
> Acked-by: Jacob Keller <jacob.e.keller@...el.com>
> Tested-by: Andrew Bowers <andrewx.bowers@...el.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@...el.com>

Applied.
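
For readers of the archive, the approach described in the quoted commit
message (give up on recovery when the error arrives before the private
struct is attached to the device) can be sketched in user-space C as
below. This is not the patch itself: the mock_* types, the ERS_RESULT_*
values and the log text are hypothetical stand-ins that only mirror the
shape of the kernel's PCI error handler callback, its
PCI_ERS_RESULT_DISCONNECT / PCI_ERS_RESULT_NEED_RESET return values, and
the driver data pointer normally fetched with pci_get_drvdata().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's PCI error recovery results. */
enum ers_result {
	ERS_RESULT_NEED_RESET,	/* ask the core to reset the slot and retry */
	ERS_RESULT_DISCONNECT	/* ask the core to remove the device */
};

/* Hypothetical, heavily simplified driver-private struct. */
struct mock_pf {
	unsigned long state;
};

/* Hypothetical device; drvdata stays NULL until probe attaches the pf. */
struct mock_pci_dev {
	void *drvdata;
};

/*
 * Sketch of an error_detected-style callback. If the error arrives
 * before probe attached the private struct, touching pf->state would
 * dereference NULL, and recovery cannot work that early anyway, so the
 * handler requests removal of the device instead.
 */
enum ers_result mock_error_detected(struct mock_pci_dev *pdev)
{
	struct mock_pf *pf = pdev->drvdata;

	if (!pf) {
		fprintf(stderr, "early PCI error during probe, not recovering\n");
		return ERS_RESULT_DISCONNECT;
	}

	/* Normal path: mark the (hypothetical) state bit 0, request a reset. */
	pf->state |= 1UL;
	return ERS_RESULT_NEED_RESET;
}
```

Calling mock_error_detected() on a device whose drvdata is still NULL
returns ERS_RESULT_DISCONNECT without touching any driver state; once a
mock_pf has been attached, the same call sets the state bit and returns
ERS_RESULT_NEED_RESET.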
