linux-kernel - Re: [PATCH] PCI: mediatek: Change MSI interrupt processing sequence

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220128131250.GA200007@bhelgaas>
Date:   Fri, 28 Jan 2022 07:12:50 -0600
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Marc Zyngier <maz@...nel.org>
Cc:     "qizhong.cheng" <qizhong.cheng@...iatek.com>,
        Ryder Lee <ryder.lee@...iatek.com>,
        Jianjun Wang <jianjun.wang@...iatek.com>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Krzysztof Wilczyński <kw@...ux.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
        linux-mediatek@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org, chuanjia.liu@...iatek.com,
        Srikanth Thokala <srikanth.thokala@...el.com>,
        Pratyush Anand <pratyush.anand@...il.com>,
        Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
        Pali Rohár <pali@...nel.org>
Subject: Re: [PATCH] PCI: mediatek: Change MSI interrupt processing sequence

On Fri, Jan 28, 2022 at 08:57:16AM +0000, Marc Zyngier wrote:
> On Thu, 27 Jan 2022 21:21:00 +0000,
> Bjorn Helgaas <helgaas@...nel.org> wrote:
> > On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote:
> > > On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> > > > On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > > > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > > > > As an edge-triggered interrupts, its interrupt status should
> > > > > > be cleared before dispatch to the handler of device.
> > > > > 
> > > > > I'm not an IRQ expert, but the reasoning that "we should clear
> > > > > the MSI interrupt status before dispatching the handler because
> > > > > MSI is an edge-triggered interrupt" doesn't seem completely
> > > > > convincing because your code will now look like this:
> > > > > 
> > > > >   /* Clear the INTx */
> > > > >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> > > > >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> > > > >   ...
> > > > > 
> > > > >   /* Clear MSI interrupt status */
> > > > >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > > >   generic_handle_domain_irq(port->inner_domain, bit);
> > > > > 
> > > > > You clear interrupt status before dispatching the handler for
> > > > > *both* level-triggered INTx interrupts and edge-triggered MSI
> > > > > interrupts.
> > > > > 
> > > > > So it doesn't seem that simply being edge-triggered is the
> > > > > critical factor here.
> > > > 
> > > > This is the usual problem with these half-baked implementations.
> > > > The signalling to the primary interrupt controller is level, as
> > > > they take a multitude of input and (crucially) latch the MSI
> > > > edges. Effectively, this is an edge-to-level converter, with all
> > > > the problems that this creates.
> > > > 
> > > > By clearing the status *after* the handling, you lose edges that
> > > > have been received and coalesced after the read of the status
> > > > register. By clearing it *before*, you are acknowledging the
> > > > interrupts early, and allowing them to be coalesced independently
> > > > of the ones that have been received earlier.
> > > > 
> > > > This is however mostly an educated guess. Someone with access to
> > > > the TRM should verify this.
> > > 
> > > Yes, as Maz said, we save the edge-interrupt status so that it
> > > becomes a level-interrupt. This is similar to an edge-to-level
> > > converter, so we need to clear it *before*. We found this problem
> > > through a lot of experiments and tested this patch.
> > 
> > I thought there might be other host controllers with similar design,
> > so I looked at all the other drivers and tried to figure out whether
> > any others had similar problems.
> > 
> > The ones below look suspicious to me because they all clear some sort
> > of status register *after* handling an MSI.  Can you guys take a look
> > and make sure they are working correctly?
> > 
> >   keembay_pcie_msi_irq_handler
> >     status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> >     if (status & MSI_CTRL_INT)
> >       dw_handle_msi_irq
> > 	generic_handle_domain_irq
> >       writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> > 
> >   spear13xx_pcie_irq_handler
> >     status = readl(&app_reg->int_sts)
> >     if (status & MSI_CTRL_INT)
> >       dw_handle_msi_irq
> > 	generic_handle_domain_irq
> >     writel(status, &app_reg->int_clr)
> 
> I think these two are fine.
> 
> The top level interrupt is only a level signal that the is something
> to process. The only thing that is unclear is what the effect of
> writing to that status register if MSIs are pending at that point. A
> sane implementation would just ignore the write.
> 
> The actual processing is done in dw_handle_msi_irq(), reading the
> PCIE_MSI_INTR0_STATUS register. This same register is then used to Ack
> the interrupt, one bit at a time, as interrupts are handled (see
> dw_pci_bottom_ack). Ack taking place before the handling, it makes it
> safe for edge delivery.
> 
> >   advk_pcie_handle_int
> >     isr0_status = advk_readl(pcie, PCIE_ISR0_REG)
> >     if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
> >       advk_pcie_handle_msi
> >         advk_readl(pcie, PCIE_MSI_STATUS_REG)
> > 	advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG)
> > 	generic_handle_irq
> > 	advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG)
> 
> Same thing, I guess. It is just that the Ack has been open-coded.
> 
> >   mtk_pcie_irq_handler
> >     status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG)
> >     for_each_set_bit_from(irq_bit, &status, ...)
> >       mtk_pcie_msi_handler
> >         generic_handle_domain_irq
> >       writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG)
> 
> Similar thing. The PCIE_MSI_SET_STATUS register is read first, and
> then written back in the ack callback.

Thanks a lot for taking a look at these, Marc!  Is there anything we
can do to make all these drivers/pci/controller/* drivers more
consistent and easier to review?  I found it very difficult to look
across all of them and find similar design patterns.

Bjorn