Message-ID: <20140109194350.GA22436@redhat.com>
Date: Thu, 9 Jan 2014 20:43:50 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Andrea Arcangeli <aarcange@...hat.com>
Cc: Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Jones <davej@...hat.com>,
Darren Hart <dvhart@...ux.intel.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Heiko Carstens <heiko.carstens@...ibm.com>
Subject: Re: [PATCH v2 1/1] mm: fix the theoretical compound_lock() vs
prep_new_page() race
On 01/09, Andrea Arcangeli wrote:
>
> On Thu, Jan 09, 2014 at 03:04:47PM +0100, Oleg Nesterov wrote:
> > OK. Even if I am right, we can probably make another fix.
>
> I think the confusion here was thinking this was related to the futex
> code; it isn't. This was just a generic theoretical problem found while
> doing the futex cleanups, not anything futex-specific.
Yes, yes, sure. I mentioned get_futex_key() just as an example.
> > put_compound_page() and __get_page_tail() can do yet another PageTail()
> > check _before_ compound_lock().
>
> The above alternate fix looks good to me too.
>
> The only thing to sort out: in the common code (not just x86) we may
> need an smp_mb() between the PageTail check and the bit_spin_lock... We
> just can't risk writing the bit_spin_lock before reading PageTail.
I do not think we need mb() in between... the other callers of
compound_lock() look fine, get/put(page_tail) can't see a false positive
after a successful get_page_unless_zero(), and it was recently documented
that the kernel can rely on the control dependency to serialize a
LOAD + STORE.
But we probably do need barrier() in between; we can't use ACCESS_ONCE().
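
IOW, something like this (a sketch only, untested; the function name and
the elided refcounting are simplified, the real mm/swap.c code is more
involved):

#include <linux/mm.h>

static void put_compound_page_sketch(struct page *page)
{
	struct page *page_head = compound_trans_head(page);

	if (likely(page != page_head && get_page_unless_zero(page_head))) {
		/*
		 * The extra PageTail() check _before_ compound_lock().
		 * The control dependency orders this LOAD before the
		 * bit_spin_lock() STORE on every architecture, so no
		 * smp_mb() is needed; barrier() only stops the compiler,
		 * since we can't use ACCESS_ONCE() on page->flags here.
		 */
		if (likely(PageTail(page))) {
			barrier();
			compound_lock(page_head);
			if (likely(PageTail(page))) {
				/* ... the usual tail-page put ... */
			}
			compound_unlock(page_head);
		}
		put_page(page_head); /* all other refcounting elided */
	}
}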
> And regardless of gup_fast, like Linus said, for increased NUMA
> fairness we could move the compound lock from page->flags to a hashed
> array of proper spinlocks, sized as a function of RAM. The contention
> on these locks is so low that I doubt we could run into lock
> starvation, but precisely because the contention is so low, the array
> would work just as well, and it would be more theoretically correct
> for NUMA usage than the bit spinlock. So this problem also goes away
> if we convert the bit_spin_lock to a hashed array of spin_lock.
Yes. But in this case I really think we should clean up get/put first
and add the helper, like the patch I mentioned does.
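
For the record, the hashed-lock idea could look something like this
(a sketch only, untested; every name below is invented, and a real
implementation would size the table at boot as a function of RAM, e.g.
with alloc_large_system_hash()):

#include <linux/spinlock.h>
#include <linux/hash.h>
#include <linux/mm_types.h>
#include <linux/init.h>
#include <linux/kernel.h>

#define COMPOUND_LOCK_HASH_BITS	8	/* fixed here; boot-time sized in reality */

static spinlock_t compound_lock_hash[1 << COMPOUND_LOCK_HASH_BITS];

/* hash the page pointer to pick a lock, like the futex hash does */
static spinlock_t *compound_lock_ptr(struct page *page)
{
	return &compound_lock_hash[hash_ptr(page, COMPOUND_LOCK_HASH_BITS)];
}

static void compound_lock_page(struct page *page)
{
	spin_lock(compound_lock_ptr(page));
}

static void compound_unlock_page(struct page *page)
{
	spin_unlock(compound_lock_ptr(page));
}

static int __init compound_lock_hash_init(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(compound_lock_hash); i++)
		spin_lock_init(&compound_lock_hash[i]);
	return 0;
}
core_initcall(compound_lock_hash_init);

Since the lock would no longer live in page->flags, the compound_lock()
vs prep_new_page() ordering problem disappears, and a proper spinlock
gets the ticket fairness that bit_spin_lock can't provide.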
> I personally prefer to keep the complexity in one place so adding to
> get/put_page
OK. I'll send v3.
> > Although personally I'd prefer this patch. And if we change get/put
> > I think it would be better to do this on top of
> >
> > "[PATCH -mm 6/7] mm: thp: introduce get_lock_thp_head()"
> > http://marc.info/?l=linux-kernel&m=138739438800899
>
> Not against the cleanups of course, but regarding the order: it gets
> harder to backport the fix for distros if it is applied after the
> cleanups.
Oh, I don't think this highly theoretical fix needs to be backported,
but I agree, let's fix the bug first.
Oleg.