linux-kernel - Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAFTL4hz7Jy69rr4Ze1pOtYPo5CFSZuTGXdY+w3jR4ZboA_8EYg@mail.gmail.com>
Date:	Wed, 31 Oct 2012 01:36:37 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	anish kumar <anish198519851985@...il.com>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Paul Gortmaker <paul.gortmaker@...driver.com>
Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

2012/10/30 anish kumar <anish198519851985@...il.com>:
> As I understand without the memory barrier proposed by you the situation
> would be as below:
> CPU 0                                 CPU 1
>
> data = something                     flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg      execute_work (sees data from CPU 0)
>            on flags in claim)
> _success_ in claiming and goes
> ahead and execute the work(wrong?)
>                                      cmpxchg cause flag to IRQ_WORK_BUSY
>
> Now knows the flag==IRQ_WORK_BUSY
>
> Am I right?

(Adding Paul in Cc because I'm again confused with memory barriers)

Actually what I had in mind is rather that CPU 0 fails its claim
because it's not seeing the IRQ_WORK_BUSY flag as it should:


CPU 0                                 CPU 1

data = something                  flags = IRQ_WORK_BUSY
cmpxchg() for claim               execute_work (sees data from CPU 0)

CPU 0 should see IRQ_WORK_BUSY but it may not because CPU 1 sets this
value in a non-atomic way.

Also, while browsing Paul's perfbook, I realize my changelog is buggy.
It seems we can't reliably use memory barriers here because we would
be in the following case:

CPU 0                                  CPU 1

store(work data)                    store(flags)
smp_mb()                            smp_mb()
load(flags)                            load(work data)

On top of this barrier pairing, we can't make the assumption that, for
example, if CPU 1 sees the work data stored in CPU 0 then CPU 0 sees
the flags stored in CPU 1.

So now I wonder if cmpxchg() can give us more confidence:


CPU 0                                                CPU 1

store(work data)                                  xchg(flags, IRQ_WORK_BUSY)
cmpxchg(flags, IRQ_WORK_FLAGS)    load(work data)

Can I make this assumption?

- If CPU 0 fails the cmpxchg() (which means CPU 1 has not yet xchg())
then CPU 1 will execute the work and see our data.

At least cmpxchg / xchg pair orders correctly to ensure somebody will
execute our work. Now probably some locking is needed from the work
function itself if it's not per cpu.

>
> Probably a stupid question.Why do we return the bool from irq_work_queue
> when no one bothers to check the return value?Wouldn't it be better if
> this function is void as used by the users of this function or am I
> looking at the wrong code.

No idea. Probably Peter had plans there.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/