lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <f350a2ee513f4e3fb2a2cfa633dc0806@EX13D11EUB003.ant.amazon.com>
Date:   Mon, 23 Mar 2020 09:05:34 +0000
From:   "Jubran, Samih" <sameehj@...zon.com>
To:     "Guilherme G. Piccoli" <gpiccoli@...onical.com>,
        "Belgazal, Netanel" <netanel@...zon.com>,
        "Kiyanovski, Arthur" <akiyano@...zon.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC:     "Tzalik, Guy" <gtzalik@...zon.com>,
        "Bshara, Saeed" <saeedb@...zon.com>,
        "Machulsky, Zorik" <zorik@...zon.com>,
        "kernel@...ccoli.net" <kernel@...ccoli.net>,
        "gshan@...hat.com" <gshan@...hat.com>,
        "gavin.guo@...onical.com" <gavin.guo@...onical.com>,
        "jay.vosburgh@...onical.com" <jay.vosburgh@...onical.com>,
        "pedro.principeza@...onical.com" <pedro.principeza@...onical.com>
Subject: RE: [PATCH] net: ena: Add PCI shutdown handler to allow safe kexec



> -----Original Message-----
> From: netdev-owner@...r.kernel.org <netdev-owner@...r.kernel.org>
> On Behalf Of Guilherme G. Piccoli <gpiccoli@...onical.com>
> Sent: Friday, March 20, 2020 2:56 PM
> To: netanel@...zon.com; akiyano@...zon.com; netdev@...r.kernel.org
> Cc: gtzalik@...zon.com; saeedb@...zon.com; zorik@...zon.com;
> gpiccoli@...onical.com; kernel@...ccoli.net; gshan@...hat.com;
> gavin.guo@...onical.com; jay.vosburgh@...onical.com;
> pedro.principeza@...onical.com
> Subject: [PATCH] net: ena: Add PCI shutdown handler to allow safe kexec
> 
> Currently ENA only provides the PCI remove() handler, used during rmmod
> for example. This is not called on shutdown/kexec path; we are potentially
> creating a failure scenario on kexec:
> 
> (a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
> instead pci_device_shutdown() clears the master bit of the PCI device,
> stopping all DMA transactions;
> 
> (b) Kexec reboot happens and the device gets enabled again, likely having its
> FW with that DMA transaction buffered; then it may trigger the (now
> invalid) memory operation in the new kernel, corrupting kernel memory
> area.
> 
> This patch aims to prevent this, by implementing a shutdown() handler quite
> similar to the remove() one - the difference being the handling of the
> netdev, which is unregistered on remove(), but following the convention
> observed in other drivers, it's only detached on shutdown().
> 
> This prevents an odd issue in AWS Nitro instances, in which after the 2nd
> kexec the next one will fail with an initrd corruption, caused by a wild DMA
> write to invalid kernel memory. The lspci output for the adapter present in
> my instance is:
> 
> 00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network Adapter
> (ENA) [1d0f:ec20]
> 
> Suggested-by: Gavin Shan <gshan@...hat.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@...onical.com>
> ---
> 
> 
> The idea for this patch came from an informal conversation with my friend
> Gavin Shan, based on his past experience with similar issues.
> I'd like to thank him for the great suggestion!
> 
> As a test metric, I've performed 1000 kexecs with this patch, whereas
> without this one, the 3rd kexec failed with initrd corruption. Also, one test
> that I've done before writing the patch was just to rmmod the driver before
> the kexecs, and it worked fine too.
> 
> I suggest we add this patch in stable releases as well.
> Thanks in advance for reviews,
> 
> Guilherme
> 
> 
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 51 ++++++++++++++++-
> ---
>  1 file changed, 41 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 0b2fd96b93d7..7a5c01ff2ee8 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -4325,13 +4325,15 @@ static int ena_probe(struct pci_dev *pdev, const
> struct pci_device_id *ent)
> 
> 
> /**********************************************************
> *******************/
> 
> -/* ena_remove - Device Removal Routine
> +/* __ena_shutoff - Helper used in both PCI remove/shutdown routines
>   * @pdev: PCI device information struct
> + * @shutdown: Is it a shutdown operation? If false, means it is a
> + removal
>   *
> - * ena_remove is called by the PCI subsystem to alert the driver
> - * that it should release a PCI device.
> + * __ena_shutoff is a helper routine that does the real work on
> + shutdown and
> + * removal paths; the difference between those paths is with regards to
> + whether
> + * dettach or unregister the netdevice.
>   */
> -static void ena_remove(struct pci_dev *pdev)
> +static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
>  {
>  	struct ena_adapter *adapter = pci_get_drvdata(pdev);
>  	struct ena_com_dev *ena_dev;
> @@ -4350,13 +4352,17 @@ static void ena_remove(struct pci_dev *pdev)
> 
>  	cancel_work_sync(&adapter->reset_task);
> 
> -	rtnl_lock();
> +	rtnl_lock(); /* lock released inside the below if-else block */
>  	ena_destroy_device(adapter, true);
> -	rtnl_unlock();
> -
> -	unregister_netdev(netdev);
> -
> -	free_netdev(netdev);
> +	if (shutdown) {
> +		netif_device_detach(netdev);
> +		dev_close(netdev);
> +		rtnl_unlock();
> +	} else {
> +		rtnl_unlock();
> +		unregister_netdev(netdev);
> +		free_netdev(netdev);
> +	}
> 
>  	ena_com_rss_destroy(ena_dev);
> 
> @@ -4371,6 +4377,30 @@ static void ena_remove(struct pci_dev *pdev)
>  	vfree(ena_dev);
>  }
> 
> +/* ena_remove - Device Removal Routine
> + * @pdev: PCI device information struct
> + *
> + * ena_remove is called by the PCI subsystem to alert the driver
> + * that it should release a PCI device.
> + */
> +
> +static void ena_remove(struct pci_dev *pdev) {
> +	__ena_shutoff(pdev, false);
> +}
> +
> +/* ena_shutdown - Device Shutdown Routine
> + * @pdev: PCI device information struct
> + *
> + * ena_shutdown is called by the PCI subsystem to alert the driver that
> + * a shutdown/reboot (or kexec) is happening and device must be disabled.
> + */
> +
> +static void ena_shutdown(struct pci_dev *pdev) {
> +	__ena_shutoff(pdev, true);
> +}
> +
>  #ifdef CONFIG_PM
>  /* ena_suspend - PM suspend callback
>   * @pdev: PCI device information struct @@ -4420,6 +4450,7 @@ static
> struct pci_driver ena_pci_driver = {
>  	.id_table	= ena_pci_tbl,
>  	.probe		= ena_probe,
>  	.remove		= ena_remove,
> +	.shutdown	= ena_shutdown,
>  #ifdef CONFIG_PM
>  	.suspend    = ena_suspend,
>  	.resume     = ena_resume,
> --
> 2.25.1
> 


Hi 
Guilherme,

Thank you for the patch, we are currently looking into your patch and testing it.

Thanks,
Sameeh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ