Message-ID: <CALeDE9P0bNWTDO+4kUt66QOQFbp548Jum_XkGKUQro7_G+YQdA@mail.gmail.com>
Date: Wed, 30 Jun 2021 21:46:01 +0100
From: Peter Robinson <pbrobinson@...il.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Javier Martinez Canillas <javierm@...hat.com>,
linux-kernel@...r.kernel.org, Shawn Lin <shawn.lin@...k-chips.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Heiko Stuebner <heiko@...ech.de>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Rob Herring <robh@...nel.org>,
linux-arm-kernel@...ts.infradead.org, linux-pci@...r.kernel.org,
linux-rockchip@...ts.infradead.org,
Michal Simek <michal.simek@...inx.com>,
Jingoo Han <jingoohan1@...il.com>,
Thierry Reding <thierry.reding@...il.com>,
Jonathan Hunter <jonathanh@...dia.com>,
linux-tegra@...r.kernel.org
Subject: Re: [PATCH v2] PCI: rockchip: Avoid accessing PCIe registers with
clocks gated
On Wed, Jun 30, 2021 at 9:30 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Wed, Jun 30, 2021 at 09:59:58PM +0200, Javier Martinez Canillas wrote:
> > On 6/30/21 8:59 PM, Bjorn Helgaas wrote:
> > > [+cc Michal, Jingoo, Thierry, Jonathan]
> >
> > [snip]
> >
> > >
> > > I think the above commit log is perfectly accurate, but all the
> > > details might suggest that this is something specific to rockchip or
> > > CONFIG_DEBUG_SHIRQ, which it isn't, and they might obscure the
> > > fundamental problem, which is actually very simple: we registered IRQ
> > > handlers before we were ready for them to be called.
> > >
> > > I propose the following commit log in the hope that it would help
> > > other driver authors to make similar fixes:
> > >
> > > PCI: rockchip: Register IRQ handlers after device and data are ready
> > >
> > > An IRQ handler may be called at any time after it is registered, so
> > > anything it relies on must be ready before registration.
> > >
> > > rockchip_pcie_subsys_irq_handler() and rockchip_pcie_client_irq_handler()
> > > read registers in the PCIe controller, but we registered them before
> > > turning on clocks to the controller. If either is called before the clocks
> > > are turned on, the register reads fail and the machine hangs.
> > >
> > > Similarly, rockchip_pcie_legacy_int_handler() uses rockchip->irq_domain,
> > > but we installed it before initializing irq_domain.
> > >
> > > Register IRQ handlers after their data structures are initialized and
> > > clocks are enabled.
> > >
> > > If this is inaccurate or omits something important, let me know. I
> > > can make any updates locally.
> > >
> >
> > I think your description is accurate and agree that the commit message may
> > be misleading. As you said, this is a general problem and the fact that an
> > IRQ is shared and CONFIG_DEBUG_SHIRQ fires a spurious interrupt just makes
> > the assumptions in the driver fall apart.
> >
> > But maybe you can also add a paragraph that mentions the CONFIG_DEBUG_SHIRQ
> > option and shared interrupts? That way, other driver authors could know that
> > by enabling this an underlying problem might be exposed for them to fix.
>
> Good idea, thanks! I added this; is it something like what you had in
> mind?
>
> Found by enabling CONFIG_DEBUG_SHIRQ, which calls the IRQ handler when it
> is being unregistered. An error during the probe path might cause this
> unregistration and IRQ handler execution before the device or data
> structure init has finished.
Would it make sense to enable CONFIG_DEBUG_SHIRQ in defconfig to
better pick up these problems?
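
For anyone skimming the thread, here is a rough userspace sketch of the
ordering problem described in the commit log above. It is not the rockchip
driver code: every toy_* name is made up for illustration, and
toy_request_irq() simply fires the handler once at registration as a crude
stand-in for "an IRQ handler may be called at any time after it is
registered". On real hardware the gated register read would hang the machine;
here it is only counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver state; not the real rockchip_pcie structure. */
struct toy_dev {
	bool clocks_on;    /* register access is only safe when true */
	int irq_calls;     /* how many times the handler ran */
	int bad_accesses;  /* handler runs that touched gated registers */
};

typedef void (*toy_irq_handler_t)(struct toy_dev *dev);

/* Handler that "reads a register"; a gated read is counted, not performed. */
static void toy_irq_handler(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->irq_calls++;
	if (!dev->clocks_on)
		dev->bad_accesses++;
}

/*
 * Toy registration (assumption): call the handler once right away to
 * model a spurious or shared interrupt arriving immediately.
 */
static void toy_request_irq(struct toy_dev *dev, toy_irq_handler_t handler)
{
	handler(dev);
}

static void toy_enable_clocks(struct toy_dev *dev)
{
	dev->clocks_on = true;
}

/* Buggy order: handler is registered before the clocks are enabled. */
int toy_probe_buggy(void)
{
	struct toy_dev dev = {0};

	toy_request_irq(&dev, toy_irq_handler);
	toy_enable_clocks(&dev);
	return dev.bad_accesses;
}

/* Fixed order: everything the handler relies on is ready first. */
int toy_probe_fixed(void)
{
	struct toy_dev dev = {0};

	toy_enable_clocks(&dev);
	toy_request_irq(&dev, toy_irq_handler);
	return dev.bad_accesses;
}
```

In this toy model toy_probe_buggy() returns 1 (one access with clocks gated)
and toy_probe_fixed() returns 0. Only the call order differs between the two.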
Peter