linux-kernel - Re: [PATCH v5 01/20] EDAC/synopsys: Fix ECC status data and IRQ disable race condition

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240415183616.GDZh1zoFsBzvAEduRo@fat_crate.local>
Date: Mon, 15 Apr 2024 20:36:16 +0200
From: Borislav Petkov <bp@...en8.de>
To: Serge Semin <fancer.lancer@...il.com>
Cc: Michal Simek <michal.simek@....com>,
	Alexander Stein <alexander.stein@...tq-group.com>,
	Tony Luck <tony.luck@...el.com>, James Morse <james.morse@....com>,
	Mauro Carvalho Chehab <mchehab@...nel.org>,
	Robert Richter <rric@...nel.org>, Dinh Nguyen <dinguyen@...nel.org>,
	Punnaiah Choudary Kalluri <punnaiah.choudary.kalluri@...inx.com>,
	Arnd Bergmann <arnd@...db.de>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	linux-arm-kernel@...ts.infradead.org, linux-edac@...r.kernel.org,
	linux-kernel@...r.kernel.org, Sherry Sun <sherry.sun@....com>,
	Borislav Petkov <bp@...e.de>
Subject: Re: [PATCH v5 01/20] EDAC/synopsys: Fix ECC status data and IRQ
 disable race condition

On Thu, Feb 22, 2024 at 09:12:46PM +0300, Serge Semin wrote:
> The race condition around the ECCCLR register access happens in the IRQ
> disable method called in the device remove() procedure and in the ECC IRQ
> handler:
> 1. Enable IRQ:
>    a. ECCCLR = EN_CE | EN_UE
> 2. Disable IRQ:
>    a. ECCCLR = 0
> 3. IRQ handler:
>    a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT
>    b. ECCCLR = 0
>    c. ECCCLR = EN_CE | EN_UE
> So if the IRQ disabling procedure is called concurrently with the IRQ
> handler method the IRQ might be actually left enabled due to the
> statement 3c.
> 
> The root cause of the problem is that ECCCLR register (which since v3.10a
> has been called as ECCCTL) has intermixed ECC status data clear flags and
> the IRQ enable/disable flags. Thus the IRQ disabling (clear EN flags) and
> handling (write 1 to clear ECC status data) procedures must be serialised
> around the ECCCTL register modification to prevent the race.
> 
> So fix the problem described above by adding the spin-lock around the
> ECCCLR modifications and preventing the IRQ-handler from modifying the
> IRQs enable flags (there is no point in disabling the IRQ and then
> re-enabling it again within a single IRQ handler call, see the statements
> 3a/3b and 3c above).

So I'm looking at the code and am looking at this and wondering how we
even ended up in this mess?!

An interrupt handler should not *enable* the interrupt again - that's
just crazy. And I should've seen that in

  4bcffe941758 ("EDAC/synopsys: Re-enable the error interrupts on v3 hw")

and stopped it right there. But well, it is what it is...

So I'd like to see the following flow:

* on init, the interrupt is enabled with enable_intr() *after*
registering the interrupt handler.

* on exit, the interrupt is disabled with disable_intr() and then no
interrupts are coming in anymore.

And then I don't think you'll need the spinlock and it'll be sane
design.

Right?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette