[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250423173954.GA441200@bhelgaas>
Date: Wed, 23 Apr 2025 12:39:54 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Heiner Kallweit <hkallweit1@...il.com>
Cc: Krishna Chaitanya Chundru <krishna.chundru@....qualcomm.com>,
Bjorn Helgaas <bhelgaas@...gle.com>, nic_swsd@...ltek.com,
Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
netdev@...r.kernel.org, Niklas Cassel <cassel@...nel.org>,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
Subject: Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
On Sun, Apr 20, 2025 at 08:25:19AM +0200, Heiner Kallweit wrote:
> On 20.04.2025 01:18, Krishna Chaitanya Chundru wrote:
> > On 4/19/2025 3:48 PM, Heiner Kallweit wrote:
> >> On 18.04.2025 23:52, Heiner Kallweit wrote:
> >>> On 18.04.2025 19:19, Manivannan Sadhasivam wrote:
> >>>> On Wed, Apr 16, 2025 at 06:03:36PM +0200, Niklas Cassel wrote:
> >>>>> Hello Krishna Chaitanya,
> >>>>>
> >>>>> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
> >>>>>> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
> >>>>>>>
> >>>>>>> So perhaps we should hold off with this patch.
> >>>>>>>
> >>>>>> I disagree on this, it might be causing issue with net
> >>>>>> driver, but we might face some other issues as explained
> >>>>>> above if we don't call pci_stop_root_bus().
> >>>>>
> >>>>> When I wrote hold off with this patch, I meant the patch in
> >>>>> $subject, not your patch.
> >>>>>
> >>>>>
> >>>>> When it comes to your patch, I think that the commit log needs
> >>>>> to explain why it is so special.
> >>>>>
> >>>>> Because AFAICT, all other PCIe controller drivers call
> >>>>> pci_stop_root_bus() in the .remove() callback, not in the
> >>>>> .shutdown() callback.
> >>>>
> >>>> And this driver is special because, it has no 'remove()'
> >>>> callback as it implements an irqchip controller. So we have to
> >>>> shutdown the devices cleanly in the 'shutdown' callback.
> >>>>
> >>> Doing proper cleanup in a first place is responsibility of the
> >>> respective bus devices (in their shutdown() callback).
> >>>
> >>> Calling pci_stop_root_bus() in the pci controllers shutdown()
> >>> causes the bus devices to be removed, hence their remove() is
> >>> called. Typically devices don't expect that remove() is called
> >>> after shutdown(). This may cause issues if e.g. shutdown() sets
> >>> a device to D3cold. remove() won't expect that device is
> >>> inaccessible.
> >
> > Lets say controller driver in the shut down callback keeps PCIe
> > device state in D3cold without removing PCIe devices. Then the
> > PCIe client drivers which are not registered with the shutdown
> > callback assumes PCIe link is still accessible and initiates some
> > transfers or there may be on ongoing transfers also which can
> > result in some system errors like soc error etc which can hang the
> > device.
> >
> I'd consider a bus device driver behaving this way as broken. IMO
> device drivers should ensure that device is quiesced on shutdown.
> As you highlight this case, do you have specific examples? Maybe we
> should focus on fixing such bus device drivers first.
>
> I'd be interested in the PCI maintainers point of view, that's why I
> added Bjorn to the discussion.
See https://lore.kernel.org/r/20250423171715.GA430351@bhelgaas
I think we should resolve the question of whether .shutdown() should
remove all downstream drivers and devices needs to be resolved, and
hopefully we don't have to discuss it in two separate places.
Bjorn
Powered by blists - more mailing lists