lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Wed, 25 Mar 2020 01:34:19 +0530 From: Bhupesh Sharma <bhsharma@...hat.com> To: "Guilherme G. Piccoli" <gpiccoli@...onical.com>, netanel@...zon.com, akiyano@...zon.com, netdev@...r.kernel.org Cc: gtzalik@...zon.com, saeedb@...zon.com, zorik@...zon.com, kernel@...ccoli.net, gshan@...hat.com, gavin.guo@...onical.com, jay.vosburgh@...onical.com, pedro.principeza@...onical.com, "kexec@...ts.infradead.org" <kexec@...ts.infradead.org>, bhsharma@...hat.com Subject: Re: [PATCH] net: ena: Add PCI shutdown handler to allow safe kexec Hi Guilherme, On 03/20/2020 06:25 PM, Guilherme G. Piccoli wrote: > Currently ENA only provides the PCI remove() handler, used during rmmod > for example. This is not called on shutdown/kexec path; we are potentially > creating a failure scenario on kexec: > > (a) Kexec is triggered, no shutdown() / remove() handler is called for ENA; > instead pci_device_shutdown() clears the master bit of the PCI device, > stopping all DMA transactions; > > (b) Kexec reboot happens and the device gets enabled again, likely having > its FW with that DMA transaction buffered; then it may trigger the (now > invalid) memory operation in the new kernel, corrupting kernel memory area. > > This patch aims to prevent this, by implementing a shutdown() handler > quite similar to the remove() one - the difference being the handling > of the netdev, which is unregistered on remove(), but following the > convention observed in other drivers, it's only detached on shutdown(). > > This prevents an odd issue in AWS Nitro instances, in which after the 2nd > kexec the next one will fail with an initrd corruption, caused by a wild > DMA write to invalid kernel memory. The lspci output for the adapter > present in my instance is: > > 00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network > Adapter (ENA) [1d0f:ec20] Thanks for the patch. > Suggested-by: Gavin Shan <gshan@...hat.com> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@...onical.com> > --- > > > The idea for this patch came from an informal conversation with my > friend Gavin Shan, based on his past experience with similar issues. > I'd like to thank him for the great suggestion! > > As a test metric, I've performed 1000 kexecs with this patch, whereas > without this one, the 3rd kexec failed with initrd corruption. Also, > one test that I've done before writing the patch was just to rmmod > the driver before the kexecs, and it worked fine too. > > I suggest we add this patch in stable releases as well. > Thanks in advance for reviews, This patch fixes the repetitive kexec reboot issues that I was facing for some time on the aws nitro (t3) machines. Normally the kexec reboots would not runs more than ~ 3 to 5 times on the machine. Now with this patch, I can runs hundreds of repetitive nested kexec reboots on the aws nitro machines without any failure. So, I think this is a really good patch and should be applied to stable trees as well. Please feel free to add: Tested-and-Reviewed-by: Bhupesh Sharma <bhsharma@...hat.com> Thanks, Bhupesh > Guilherme > > > drivers/net/ethernet/amazon/ena/ena_netdev.c | 51 ++++++++++++++++---- > 1 file changed, 41 insertions(+), 10 deletions(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index 0b2fd96b93d7..7a5c01ff2ee8 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -4325,13 +4325,15 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent) > > /*****************************************************************************/ > > -/* ena_remove - Device Removal Routine > +/* __ena_shutoff - Helper used in both PCI remove/shutdown routines > * @pdev: PCI device information struct > + * @shutdown: Is it a shutdown operation? If false, means it is a removal > * > - * ena_remove is called by the PCI subsystem to alert the driver > - * that it should release a PCI device. > + * __ena_shutoff is a helper routine that does the real work on shutdown and > + * removal paths; the difference between those paths is with regards to whether > + * dettach or unregister the netdevice. > */ > -static void ena_remove(struct pci_dev *pdev) > +static void __ena_shutoff(struct pci_dev *pdev, bool shutdown) > { > struct ena_adapter *adapter = pci_get_drvdata(pdev); > struct ena_com_dev *ena_dev; > @@ -4350,13 +4352,17 @@ static void ena_remove(struct pci_dev *pdev) > > cancel_work_sync(&adapter->reset_task); > > - rtnl_lock(); > + rtnl_lock(); /* lock released inside the below if-else block */ > ena_destroy_device(adapter, true); > - rtnl_unlock(); > - > - unregister_netdev(netdev); > - > - free_netdev(netdev); > + if (shutdown) { > + netif_device_detach(netdev); > + dev_close(netdev); > + rtnl_unlock(); > + } else { > + rtnl_unlock(); > + unregister_netdev(netdev); > + free_netdev(netdev); > + } > > ena_com_rss_destroy(ena_dev); > > @@ -4371,6 +4377,30 @@ static void ena_remove(struct pci_dev *pdev) > vfree(ena_dev); > } > > +/* ena_remove - Device Removal Routine > + * @pdev: PCI device information struct > + * > + * ena_remove is called by the PCI subsystem to alert the driver > + * that it should release a PCI device. > + */ > + > +static void ena_remove(struct pci_dev *pdev) > +{ > + __ena_shutoff(pdev, false); > +} > + > +/* ena_shutdown - Device Shutdown Routine > + * @pdev: PCI device information struct > + * > + * ena_shutdown is called by the PCI subsystem to alert the driver that > + * a shutdown/reboot (or kexec) is happening and device must be disabled. > + */ > + > +static void ena_shutdown(struct pci_dev *pdev) > +{ > + __ena_shutoff(pdev, true); > +} > + > #ifdef CONFIG_PM > /* ena_suspend - PM suspend callback > * @pdev: PCI device information struct > @@ -4420,6 +4450,7 @@ static struct pci_driver ena_pci_driver = { > .id_table = ena_pci_tbl, > .probe = ena_probe, > .remove = ena_remove, > + .shutdown = ena_shutdown, > #ifdef CONFIG_PM > .suspend = ena_suspend, > .resume = ena_resume, >
Powered by blists - more mailing lists