linux-kernel - Re: [PATCH v5 01/20] EDAC/synopsys: Fix ECC status data and IRQ disable race condition


Message-ID: <vugkhnu5c7so7dk3z2cuhlbu66gv6skvicuseblrmkzyttnnlr@lqzqvysk6wbl>
Date: Mon, 6 May 2024 14:27:50 +0300
From: Serge Semin <fancer.lancer@...il.com>
To: Borislav Petkov <bp@...en8.de>
Cc: Michal Simek <michal.simek@....com>, 
	Alexander Stein <alexander.stein@...tq-group.com>, Tony Luck <tony.luck@...el.com>, 
	James Morse <james.morse@....com>, Mauro Carvalho Chehab <mchehab@...nel.org>, 
	Robert Richter <rric@...nel.org>, Dinh Nguyen <dinguyen@...nel.org>, 
	Punnaiah Choudary Kalluri <punnaiah.choudary.kalluri@...inx.com>, Arnd Bergmann <arnd@...db.de>, 
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>, linux-arm-kernel@...ts.infradead.org, linux-edac@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Sherry Sun <sherry.sun@....com>, Borislav Petkov <bp@...e.de>
Subject: Re: [PATCH v5 01/20] EDAC/synopsys: Fix ECC status data and IRQ
 disable race condition

On Mon, May 06, 2024 at 12:20:29PM +0200, Borislav Petkov wrote:
> On Thu, Apr 25, 2024 at 03:52:38PM +0300, Serge Semin wrote:
> > Even if we get to add the spin-lock serializing the ECCCLR writes it
> > won't solve the problem since the IRQ-disabler critical section could
> > be executed a bit before the IRQ-handler critical section so the later
> > one will just re-enable the IRQs disabled by the former one.
> > 
> > Here is what is suggested in my patch to fix the problem:
> > 
> >      IRQ-handler                        |    IRQ-disabler
> >                                         |
> > zynqmp_get_error_info:                  |
> >                                         | lock_irqsave
> >                                         | ECCCLR = 0; // disable IRQs
> >                                         | unlock_irqrestore
> >  lock_irqsave;                          |
> >  tmp = ECCCLR | clear_sts_bits;         |
> >  ECCCLR = tmp;                          |
> >  unlock_irqrestore;                     |
>
> <--- I'm presuming here the IRQ-disabler will reenable interrupts at
> some point?
> 
> Otherwise we have the same problem as before when interrupts remain off
> after the IRQ handler has run.

In the sketch above, the IRQ-disabler is the method that disables the
IRQs and may run concurrently with the IRQ-handler. After my patch is
applied, the IRQ-handler will no longer modify the IRQ enable/disable
bits. It will preserve them as is:

-       clearval = ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT;
-       clearval |= ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT;
+       spin_lock_irqsave(&priv->reglock, flags);
+
+       clearval = readl(base + ECC_CLR_OFST) |
+                  ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT |
+                  ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT;
        writel(clearval, base + ECC_CLR_OFST);
-       writel(0x0, base + ECC_CLR_OFST);
+
+       spin_unlock_irqrestore(&priv->reglock, flags);

Thus there will be no need to re-enable the IRQs later in the handler:

@@ -576,8 +601,6 @@ static irqreturn_t intr_handler(int irq, void *dev_id)
        /* v3.0 of the controller does not have this register */
        if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR))
                writel(regval, priv->baseaddr + DDR_QOS_IRQ_STAT_OFST);
-       else
-               enable_intr(priv);

So the only IRQ-disabler left in the driver, disable_intr(), will be
called from the device/driver remove() function. Access to the ECCCLR
CSR will be guarded by the spin-lock in both the IRQ-disabler and the
IRQ-handler, so it will be safe to execute them concurrently.
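
To make the ordering from the sketch above concrete, here is a small
user-space model of it. This is only an illustration, not driver code:
the MODEL_* names, the bit positions, and the assumption that the clear
bits read back as zero are all made up for the example, and the places
where the driver would take priv->reglock are marked with comments only.
It runs the "disabler first, handler second" order against the old and
the new handler logic and reports whether the IRQs end up enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layout; NOT taken from the driver or the databook. */
#define MODEL_CLR_BITS 0x00Fu /* stand-in for the four ECC_CTRL_CLR_* bits */
#define MODEL_IRQ_EN   0x300u /* stand-in for the CE/UE IRQ enable bits */

/* Modelled ECCCLR register. */
static uint32_t model_ecclr;

/* Assumption: clear bits are self-clearing, only enable bits are latched. */
static void model_writel(uint32_t val)
{
	model_ecclr = val & MODEL_IRQ_EN;
}

static uint32_t model_readl(void)
{
	return model_ecclr;
}

/* IRQ-disabler: the driver would hold the spin-lock around this write. */
static void model_disable_intr(void)
{
	model_writel(0);
}

/* Old handler logic: overwrites the register, then re-enables the IRQs. */
static void model_handler_old(void)
{
	model_writel(MODEL_CLR_BITS); /* clear status, enable bits lost */
	model_writel(0);
	model_writel(MODEL_IRQ_EN);   /* models the removed enable_intr() call */
}

/* New handler logic: read-modify-write that keeps the enable bits as is. */
static void model_handler_new(void)
{
	/* the driver would take the spin-lock here */
	uint32_t clearval = model_readl() | MODEL_CLR_BITS;

	model_writel(clearval);
	/* ... and release it here */
}

/* Run "disabler, then handler" and return 1 if the IRQs are enabled after. */
static int model_irqs_enabled_after(void (*handler)(void))
{
	model_writel(MODEL_IRQ_EN);       /* start with the IRQs enabled */
	assert(model_readl() == MODEL_IRQ_EN);

	model_disable_intr();             /* IRQ-disabler critical section */
	handler();                        /* IRQ-handler critical section */

	return (model_readl() & MODEL_IRQ_EN) != 0;
}
```

In this model model_irqs_enabled_after(model_handler_old) returns 1,
i.e. the handler undoes the disabling, while
model_irqs_enabled_after(model_handler_new) returns 0, since the
read-modify-write leaves the enable bits as the disabler set them.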

> 
> Other than that, yes, I see it, we will need the locking.
> 
> Thanks for elaborating.

Always welcome. Glad we've settled this.)

-Serge(y)

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette