lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+55aFyKhgwYUMGs8MBLY5WDXhHwXP3dLWs40ayCYfgiwwncsQ@mail.gmail.com>
Date:	Fri, 29 Mar 2013 13:41:53 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Dave Jones <davej@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...riel.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Davidlohr Bueso <davidlohr.bueso@...com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	hhuang@...hat.com, "Low, Jason" <jason.low2@...com>,
	Michel Lespinasse <walken@...gle.com>,
	Larry Woodman <lwoodman@...hat.com>,
	"Vinod, Chegu" <chegu_vinod@...com>,
	Peter Hurley <peter@...leysoftware.com>
Subject: Re: ipc,sem: sysv semaphore scalability

On Fri, Mar 29, 2013 at 9:17 AM, Dave Jones <davej@...hat.com> wrote:
>
> Now that I have that reverted, I'm not seeing msgrcv traces any more, but
> I've started seeing this..
>
> general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> RIP: 0010:[<ffffffff812c20fb>]  [<ffffffff812c20fb>] free_msg+0x2b/0x40
> Call Trace:
>  [<ffffffff812c289f>] freeque+0xcf/0x140
>  [<ffffffff812c2a93>] msgctl_down.constprop.9+0x183/0x200
>  [<ffffffff812c2da9>] sys_msgctl+0x139/0x400
>  [<ffffffff816cd942>] system_call_fastpath+0x16/0x1b
>
> Looks like seg was already kfree'd.

Hmm.

I have a suspicion.

The use of ipc_rcu_getref/ipc_rcu_putref() seems a bit iffy.

In particular, the refcount is not an atomic variable, so we
absolutely *depend* on the spinlock for it.

However, looking at "freeque()", that's not actually the case. It
releases the message queue spinlock *before* it does the
ipc_rcu_putref(), and it does that because the thing has become
unreachable (it did a msg_rmid(), which will set ->deleted, which in
turn will mean that nobody should successfully look it up any more).

HOWEVER.

While the "deleted" flag is serialized, the actual refcount is not. So
in *parallel* with the freeque() call, we may have some other user
that does something like

                ...
                ipc_rcu_getref(msq);
                msg_unlock(msq);
                schedule();

                ipc_lock_by_ptr(&msq->q_perm);
                ipc_rcu_putref(msq);
                if (msq->q_perm.deleted) {
                        err = -EIDRM;
                        goto out_unlock_free;
                }
                ...


which got the lock for the "deleted" test, so that part is all fine,
but notice the "ipc_rcu_putref()". It can happen at the same time that
freeque() also does its own ipc_rcu_putref().

So now refcount may get buggered, resulting in random early reuse,
double free's or leaking of the msq.

There may be some reason I don't see why this cannot happen, but it
does look suspicious. I wonder if the refcount should be an atomic
value.

The alternative would be to make sure the thing is always locked (and
in a rcu-read-safe region) before putref/getref. The only place (apart
from the initial allocation, which doesn't matter, because nothing can
see if itf that path fails) seems to be that freeque(), but I didn't
check everything.

Moving the

    msg_unlock(msq);

to the end of freeque() might be the way to go. It's going to cause us
to hold the lock for longer, but I'm not sure we care for that path.

Guys, am I missing something? This kind of refcounting problem might
well explain the rcu-freeing-time bugs reported with the scalability
patches: long-time race that just got *much* easier to expose with the
higher level of parallelism?

               Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ