linux-kernel - Re: [RFC PATCH 1/9] irq_work: Fix racy check on work pending flag

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAFTL4hxzGV4QFOLnTHYMhvKVsF_CrG7pOyi9hdgPBYLKdMB5zA@mail.gmail.com>
Date:	Mon, 29 Oct 2012 20:18:22 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Paul Gortmaker <paul.gortmaker@...driver.com>
Subject: Re: [RFC PATCH 1/9] irq_work: Fix racy check on work pending flag

2012/10/29 Steven Rostedt <rostedt@...dmis.org>:
> I wonder if the bug is just a memory barrier missing here? But that also
> suggests that the other CPU used a memory barrier too (or cmpxchg()
> which implies one).
>
> But this change looks fine too.

I fear a memory barrier wouldn't be much helpful here.

Thing is I'm not sure exactly what strict semantics is expected there:
do we allow for concurrent claiming only locally or also across CPUs?
Given that we only disable preemption on work queueing, I assumed we
allow for concurrent claiming across CPUs. Also we have irq work users
that may claim on random CPUs already and thus this can result in
cross-CPUs concurrent claiming. Task wide (not per cpu) perf event are
such examples. May be drivers/acpi/apei/ghes.c and perhaps others.

Also if we allow for cross-CPUs concurrent claiming, we should set
IRQ_WORK_BUSY with cmpxchg() (or xchg() if it's atomic, because nobody
can modify it concurrently at this stage as it's claimed). Otherwise a
CPU can fail its claim whereas the other CPU that claimed it may
actually have executed the work already, or is in the middle of doing
so. The race window stops when we clear the IRQ_WORK_BUSY flag with
cmpxchg so it's very tight. But still it's there.

Also irq_work_sync() should poll with ACCESS_ONCE(). Same problem with
foggy semantics here: do we allow remote sync?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/