Message-ID: <CA+55aFxq3-2qb76-xaR-sid9nxLYK7rcnm08w2NQzUcgPFKz2A@mail.gmail.com>
Date:	Mon, 17 Nov 2014 15:59:03 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Jens Axboe <axboe@...nel.dk>, Ingo Molnar <mingo@...nel.org>,
	Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	"the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: frequent lockups in 3.18rc4

On Mon, Nov 17, 2014 at 2:43 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
>>
>> No, that won't work for synchronous calls:

Right you are.

> So a combo of both (Jens and yours) might do the trick. Patch below.

Yeah, I guess that would work. The important part is that *if*
somebody really reuses the csd, we'd better have a release barrier
(which csd_unlock() does, although badly - but this probably isn't
that performance-critical) *before* we call the function, because
otherwise there's no real serialization for the reuse.
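The release-on-unlock requirement described above can be sketched as a user-space C11 analog; note that my_csd and my_csd_unlock are illustrative names, not the kernel's actual structures or helpers, and stdatomic stands in for the kernel's barrier primitives:

```c
/* Hedged user-space analog (C11 stdatomic) of the release-before-reuse
 * requirement; my_csd/my_csd_unlock are made-up names, not the kernel's
 * call_single_data API. */
#include <stdatomic.h>

struct my_csd {
	void (*func)(void *);
	void *info;
	atomic_uint flags;		/* bit 0 plays the role of CSD_FLAG_LOCK */
};

#define MY_CSD_FLAG_LOCK 0x01u

/* Clearing the lock bit with release semantics guarantees that every
 * access the current owner made to the csd happens-before any reuse by
 * a waiter that (acquire-)observes the bit cleared. */
static void my_csd_unlock(struct my_csd *csd)
{
	atomic_fetch_and_explicit(&csd->flags, ~MY_CSD_FLAG_LOCK,
				  memory_order_release);
}
```

The release edge here is the "serialization for the reuse": without it, a waiter could grab the csd while the previous owner's accesses are still in flight.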

Of course, most of these things are presumably always per-cpu data
structures, so the whole worry about "csd" being accessed from
different CPU's probably doesn't even exist, and this all works fine
as-is anyway, even in the presence of odd memory ordering issues.

Judging from Jens' later email, it looks like we simply don't need
this code at all any more, though, and we could just revert the
commit.

NOTE! I don't think this actually has anything to do with the actual
problem that Dave saw. I just reacted to that WARN_ON() when I was
looking at the code, and it made me go "that looks extremely
suspicious".

Particularly on x86, with strong memory ordering, I don't think that
any random accesses to 'csd' after the call to 'csd->func()' could
actually matter. I just felt very nervous about the claim that
somebody can reuse the csd immediately, that smelled bad to me from a
*conceptual* standpoint, even if I suspect it works perfectly fine in
practice.

Anyway, I've found *another* race condition, which (again) doesn't
actually seem to be an issue on x86.

In particular, "csd_lock()" does things pretty well, in that it does a
smp_mb() after setting the lock bit, so certainly nothing afterwards
will leak out of that locked region.

But look at csd_lock_wait(). It just does

        while (csd->flags & CSD_FLAG_LOCK)
                cpu_relax();

and basically there's no memory barriers there. Now, on x86, this is a
non-issue, since all reads act as an acquire, but at least in *theory*
we have this completely unordered read going on. So any subsequent
memory operations (i.e., after the return from generic_exec_single())
could in theory see data from *before* the read.

So that whole kernel/smp.c locking looks rather dubious. The smp_mb()
in csd_lock() is overkill (a "smp_store_release()" should be
sufficient), and I think that the read of csd->flags in
csd_lock_wait() should be a smp_load_acquire().

Again, none of this has anything to do with Dave's problem. The memory
ordering issues really cannot be an issue on x86, I'm just saying that
there's code there that makes me a bit uncomfortable.

                     Linus
