Message-ID: <20140109194350.GA22436@redhat.com>
Date: Thu, 9 Jan 2014 20:43:50 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Andrea Arcangeli <aarcange@...hat.com>
Cc: Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Jones <davej@...hat.com>,
Darren Hart <dvhart@...ux.intel.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Heiko Carstens <heiko.carstens@...ibm.com>
Subject: Re: [PATCH v2 1/1] mm: fix the theoretical compound_lock() vs
prep_new_page() race
On 01/09, Andrea Arcangeli wrote:
>
> On Thu, Jan 09, 2014 at 03:04:47PM +0100, Oleg Nesterov wrote:
> > OK. Even if I am right, we can probably make another fix.
>
> I think the confusion here was thinking this was related to the futex
> code; it isn't. This was just a generic theoretical problem found while
> doing the futex cleanups, not anything futex-specific.
Yes, yes, sure. I mentioned get_futex_key() just as an example.
> > put_compound_page() and __get_page_tail() can do yet another PageTail()
> > check _before_ compound_lock().
>
> The above alternate fix looks good to me too.
>
> The only thing to sort out: in the common code (not just x86) we may
> need an smp_mb() between the PageTail check and the bit_spin_lock... We
> just can't risk writing the bit_spin_lock before reading PageTail.
I do not think we need mb() in between... the other callers of
compound_lock() look fine, get/put(page_tail) can't see a false positive
after a successful get_page_unless_zero(), and it was recently documented
that the kernel can rely on the control dependency to serialize a
LOAD + STORE.
But we probably do need barrier() in between; we can't use ACCESS_ONCE().
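
IOW, something like this (a sketch only, untested; the function name and
the elided refcounting are simplified, the real mm/swap.c code is more
involved):

#include <linux/mm.h>

static void put_compound_page_sketch(struct page *page)
{
	struct page *page_head = compound_trans_head(page);

	if (likely(page != page_head && get_page_unless_zero(page_head))) {
		/*
		 * The extra PageTail() check _before_ compound_lock().
		 * The control dependency orders this LOAD before the
		 * bit_spin_lock() STORE on every architecture, so no
		 * smp_mb() is needed; barrier() only stops the compiler,
		 * since we can't use ACCESS_ONCE() on page->flags here.
		 */
		if (likely(PageTail(page))) {
			barrier();
			compound_lock(page_head);
			if (likely(PageTail(page))) {
				/* ... the usual tail-page put ... */
			}
			compound_unlock(page_head);
		}
		put_page(page_head); /* all other refcounting elided */
	}
}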
> And regardless of gup_fast, like Linus said, for increased NUMA
> fairness we could move the compound lock from page->flags to a hashed
> array of proper spinlocks, sized as a function of RAM. The contention
> on these locks is so low that I doubt we could run into lock
> starvation, but precisely because the contention is so low, the array
> would work just as well, and it would be more theoretically correct
> for NUMA usage than the bit spinlock. So this problem also goes away
> if we convert the bit_spin_lock to a hashed array of spin_lock.
Yes. But in this case I really think we should clean up get/put first
and add the helper, like the patch I mentioned does.
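
For the record, the hashed-lock idea could look something like this
(a sketch only, untested; every name below is invented, and a real
implementation would size the table at boot as a function of RAM, e.g.
with alloc_large_system_hash()):

#include <linux/spinlock.h>
#include <linux/hash.h>
#include <linux/mm_types.h>
#include <linux/init.h>
#include <linux/kernel.h>

#define COMPOUND_LOCK_HASH_BITS	8	/* fixed here; boot-time sized in reality */

static spinlock_t compound_lock_hash[1 << COMPOUND_LOCK_HASH_BITS];

/* hash the page pointer to pick a lock, like the futex hash does */
static spinlock_t *compound_lock_ptr(struct page *page)
{
	return &compound_lock_hash[hash_ptr(page, COMPOUND_LOCK_HASH_BITS)];
}

static void compound_lock_page(struct page *page)
{
	spin_lock(compound_lock_ptr(page));
}

static void compound_unlock_page(struct page *page)
{
	spin_unlock(compound_lock_ptr(page));
}

static int __init compound_lock_hash_init(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(compound_lock_hash); i++)
		spin_lock_init(&compound_lock_hash[i]);
	return 0;
}
core_initcall(compound_lock_hash_init);

Since the lock would no longer live in page->flags, the compound_lock()
vs prep_new_page() ordering problem disappears, and a proper spinlock
gets the ticket fairness that bit_spin_lock can't provide.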
> I personally prefer to keep the complexity in one place so adding to
> get/put_page
OK. I'll send v3.
> > Although personally I'd prefer this patch. And if we change get/put
> > I think it would be better to do this on top of
> >
> > "[PATCH -mm 6/7] mm: thp: introduce get_lock_thp_head()"
> > http://marc.info/?l=linux-kernel&m=138739438800899
>
> Not against the cleanups of course, but regarding the order: it gets
> harder to backport the fix for distros if it is applied after the
> cleanups.
Oh, I don't think this highly theoretical fix needs to be backported,
but I agree, let's fix the bug first.
Oleg.