lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 2 Dec 2013 08:00:38 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Simon Kirby <sim@...tway.ca>
Cc:	Ian Applegate <ia@...udflare.com>,
	Al Viro <viro@...iv.linux.org.uk>,
	Christoph Lameter <cl@...two.org>,
	Pekka Enberg <penberg@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Chris Mason <chris.mason@...ionio.com>
Subject: Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds())

On Sat, Nov 30, 2013 at 1:08 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> I still don't see what could be wrong with the pipe_inode_info thing,
> but the fact that it's been so consistent in your traces does make me
> suspect it really is *that* particular slab.

I think I finally found it.

I've spent waaayy too much time looking at and thinking about that
code without seeing anything wrong, but this morning I woke up and
thought to myself "What if.."

And looking at the code again, I went "BINGO".

All our reference counting etc seems right, but we have one very
subtle bug: on the freeing path, we have a pattern like this:

        spin_lock(&inode->i_lock);
        if (!--pipe->files) {
                inode->i_pipe = NULL;
                kill = 1;
        }
        spin_unlock(&inode->i_lock);
        __pipe_unlock(pipe);
        if (kill)
                free_pipe_info(pipe);

which on the face of it is trying to be very careful in not accessing
the pipe-info after it is released by having that "kill" flag, and
doing the release last.

And it's complete garbage.

Why?

Because the thread that decrements "pipe->files" *without* releasing
it, will very much access it after it has been dropped: that
"__pipe_unlock(pipe)" happens *after* we've decremented the pipe
reference count and dropped the inode lock. So another CPU can come in
and free the structure concurrently with that __pipe_unlock(pipe).

This happens in two places, and we don't actually need or want the
pipe lock for the pipe->files accesses (since pipe->files is protected
by inode->i_lock, not the pipe lock), so the solution is to just do
the __pipe_unlock() before the whole dance about the pipe->files
reference count.

Patch appended. And no wonder nobody has ever seen it, because the
race is unlikely as hell to ever happen. Simon, I assume it will be
another few months before we can say "yeah, that fixed it", but I
really think this is it. It explains all the symptoms, including
"DEBUG_PAGEALLOC didn't catch it" (because the access happens just as
it is released, and DEBUG_PAGEALLOC takes too long to actually free
unmap the page etc).

                     Linus

View attachment "patch.diff" of type "text/plain" (859 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ