Message-ID: <13726ba604c5439cb2a88bad83d7dec6@EX13D11EUB003.ant.amazon.com>
Date: Tue, 24 Mar 2020 17:02:49 +0000
From: "Jubran, Samih" <sameehj@...zon.com>
To: David Miller <davem@...emloft.net>,
"gpiccoli@...onical.com" <gpiccoli@...onical.com>
CC: "Belgazal, Netanel" <netanel@...zon.com>,
"Kiyanovski, Arthur" <akiyano@...zon.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"Tzalik, Guy" <gtzalik@...zon.com>,
"Bshara, Saeed" <saeedb@...zon.com>,
"Machulsky, Zorik" <zorik@...zon.com>,
"kernel@...ccoli.net" <kernel@...ccoli.net>,
"gshan@...hat.com" <gshan@...hat.com>,
"gavin.guo@...onical.com" <gavin.guo@...onical.com>,
"jay.vosburgh@...onical.com" <jay.vosburgh@...onical.com>,
"pedro.principeza@...onical.com" <pedro.principeza@...onical.com>
Subject: RE: Re: [PATCH] net: ena: Add PCI shutdown handler to allow safe kexec
> -----Original Message-----
> From: netdev-owner@...r.kernel.org <netdev-owner@...r.kernel.org>
> On Behalf Of David Miller <davem@...emloft.net>
> Sent: Tuesday, March 24, 2020 6:05 AM
> To: gpiccoli@...onical.com
> Cc: netanel@...zon.com; akiyano@...zon.com; netdev@...r.kernel.org;
> gtzalik@...zon.com; saeedb@...zon.com; zorik@...zon.com;
> kernel@...ccoli.net; gshan@...hat.com; gavin.guo@...onical.com;
> jay.vosburgh@...onical.com; pedro.principeza@...onical.com
> Subject: Re: [PATCH] net: ena: Add PCI shutdown handler to allow safe kexec
>
> From: "Guilherme G. Piccoli" <gpiccoli@...onical.com>
> Date: Fri, 20 Mar 2020 09:55:34 -0300
>
> > Currently ENA only provides the PCI remove() handler, used during
> > rmmod for example. This is not called on shutdown/kexec path; we are
> > potentially creating a failure scenario on kexec:
> >
> > (a) Kexec is triggered, no shutdown() / remove() handler is called for
> > ENA; instead pci_device_shutdown() clears the master bit of the PCI
> > device, stopping all DMA transactions;
> >
> > (b) Kexec reboot happens and the device gets enabled again, likely
> > having its FW with that DMA transaction buffered; then it may trigger
> > the (now invalid) memory operation in the new kernel, corrupting
> > kernel memory area.
> >
> > This patch aims to prevent this, by implementing a shutdown() handler
> > quite similar to the remove() one - the difference being the handling
> > of the netdev, which is unregistered on remove(), but following the
> > convention observed in other drivers, it's only detached on shutdown().
> >
> > This prevents an odd issue in AWS Nitro instances, in which after the
> > 2nd kexec the next one will fail with an initrd corruption, caused by
> > a wild DMA write to invalid kernel memory. The lspci output for the
> > adapter present in my instance is:
> >
> > 00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network
> > Adapter (ENA) [1d0f:ec20]
> >
> > Suggested-by: Gavin Shan <gshan@...hat.com>
> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@...onical.com>
>
> Amazon folks, please review.
The patch is still under review. We will reply as soon as we have finished testing it.
Thanks