Message-ID: <20140108161338.GA10434@redhat.com>
Date: Wed, 8 Jan 2014 17:13:38 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Mel Gorman <mgorman@...e.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Jones <davej@...hat.com>,
Darren Hart <dvhart@...ux.intel.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Heiko Carstens <heiko.carstens@...ibm.com>
Subject: Re: [PATCH v2 1/1] mm: fix the theoretical compound_lock() vs
prep_new_page() race
On 01/08, Mel Gorman wrote:
>
> On Sat, Jan 04, 2014 at 05:43:47PM +0100, Oleg Nesterov wrote:
> >
> > get/put_page(thp_tail) paths do get_page_unless_zero(page_head) +
> > compound_lock(). In theory this page_head can be already freed and
> > reallocated as alloc_pages(__GFP_COMP, smaller_order). In this case
> > get_page_unless_zero() can succeed right after set_page_refcounted(),
> > and compound_lock() can race with the non-atomic __SetPageHead() in
> > prep_compound_page().
> >
> This patch is putting a write barrier in the page allocator fast path and
> that is going to be a leading cause of Sad Face. We already have seen
> large regressions before when write barriers were introduced to the page
> allocator paths for cpusets. Sticking it under CONFIG_TRANSPARENT_HUGEPAGE
> does not really address the issue.
As you already mentioned in another email, smp_wmb() is mostly a nop, on
x86_64 at least. Although perhaps it would be nice to have
	static inline void atomic_store_release(atomic_t *v, int i)
	{
		smp_store_release(&v->counter, i);
	}
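
As a rough userspace illustration of what such a helper would buy (this is
not kernel code), the sketch below models it with C11 atomics. The names
model_page, model_prep_page() and model_get_unless_zero() are hypothetical,
and the sketch assumes the intent is that every plain write to the page is
finished before the refcount becomes visible as nonzero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the two fields we need. */
struct model_page {
	unsigned long flags;	/* plain field, written before publication */
	atomic_int refcount;	/* 0 means "free / not yet published" */
};

/* Userspace analogue of the proposed atomic_store_release(). */
static inline void model_atomic_store_release(atomic_int *v, int i)
{
	atomic_store_explicit(v, i, memory_order_release);
}

/* Assumption: all non-atomic initialization precedes the refcount store. */
void model_prep_page(struct model_page *p, unsigned long flags)
{
	p->flags = flags;				/* plain store */
	model_atomic_store_release(&p->refcount, 1);	/* publish */
}

/* Take a reference only if the count is nonzero. The acquire on success
 * pairs with the release above, so the caller also sees p->flags. */
bool model_get_unless_zero(struct model_page *p)
{
	int old = atomic_load_explicit(&p->refcount, memory_order_relaxed);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(&p->refcount,
				&old, old + 1,
				memory_order_acquire, memory_order_relaxed))
			return true;
	}
	return false;
}
```

With this ordering, a reader that wins model_get_unless_zero() cannot
observe the page before its flags were written. Whether the release store
is cheaper than smp_wmb() plus a plain store depends on the architecture.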
> > Yes, but thp can access this page_head via stale pointer, tail->first_page,
> > if it races with split_huge_page_refcount().
>
> To justify the introduction of a performance regression we need to be 100%
> sure this race actually exists
See below. But let me remind you that I never looked at this code before,
so I can easily be wrong.
> and not just theoretical.
It is theoretical anyway, I guess.
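
For reference, the interleaving in question (compound_lock() vs the
non-atomic __SetPageHead()) can be replayed deterministically in
userspace. This is only a single-threaded model of one possible ordering;
the bit positions and the function name are made up for the illustration
and do not match the real page flags layout.

```c
#include <assert.h>

/* Hypothetical bit positions, chosen only for this model. */
#define MODEL_PG_HEAD		(1UL << 0)
#define MODEL_PG_COMPOUND_LOCK	(1UL << 1)

/*
 * Replays one bad ordering step by step and returns the final flags word.
 * "CPU A" performs a non-atomic read-modify-write (like __SetPageHead()),
 * "CPU B" sets the lock bit in between (like compound_lock()).
 */
unsigned long model_replay_lost_lock_bit(unsigned long flags)
{
	unsigned long a_tmp;

	a_tmp = flags;			/* A: load the flags word */
	flags |= MODEL_PG_COMPOUND_LOCK;	/* B: takes the bit lock */
	a_tmp |= MODEL_PG_HEAD;		/* A: modify its stale copy */
	flags = a_tmp;			/* A: store, overwriting B's bit */

	return flags;
}
```

After this ordering the head bit is set but the lock bit is gone, so B
believes it holds a lock that the flags word no longer records.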
> For futex, the THP page (and the tail) must have been discovered via
> the page tables in which case the page tables are temporarily preventing
> the page being freed to the allocator.
Yes. But, for example, get_futex_key() does
	if (unlikely(PageTail(page))) {
		put_page(page);
Why can't this put_page() race with _split? If nothing else, another thread
can unmap part of this vma.
> > For example, __get_page_tail() roughly does:
> >
> > // PageTail(page) was already checked
> >
> > page_head = page->first_page;
> >
> > /* WINDOW */
> >
> > get_page_unless_zero(page_head);
> >
> > compound_lock(page_head);
> >
> > recheck PageTail(page) to ensure page_head is still valid
> >
> > However, in the WINDOW above, split_huge_page() can split this huge page.
> > After that its head can be freed and reallocated. Of course, I don't think
> > it is possible to hit this race in practice, but still this looks wrong.
> >
>
> I can't think of a reason why we would actually hit that race in practice
Agreed, the window is tiny, it is unlikely that this is possible in practice.
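
The __get_page_tail() outline quoted above can be modeled in userspace
roughly as follows. This is a simplified sketch, not the kernel function:
struct model_page and all model_*() names are hypothetical, the tail
state is a plain boolean, and the real code also adjusts counters on the
tail page, which is left out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	atomic_int refcount;
	atomic_bool tail;		/* models PageTail() */
	atomic_flag lock;		/* models the compound bit lock */
	struct model_page *first_page;	/* may be stale after a split */
};

void model_page_init(struct model_page *p, int refcount,
		     struct model_page *head)
{
	atomic_init(&p->refcount, refcount);
	atomic_init(&p->tail, head != NULL);
	atomic_flag_clear(&p->lock);
	p->first_page = head;
}

static bool model_get_unless_zero(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

bool model_get_page_tail(struct model_page *page)
{
	struct model_page *head = page->first_page;
	bool got;

	/* WINDOW: a split (and free + realloc of head) can happen here. */

	if (head == NULL || !model_get_unless_zero(head))
		return false;

	while (atomic_flag_test_and_set_explicit(&head->lock,
						 memory_order_acquire))
		;	/* spin, like compound_lock() */

	/* Recheck under the lock: is head still this page's head? */
	got = atomic_load(&page->tail);

	atomic_flag_clear_explicit(&head->lock, memory_order_release);

	if (!got)
		atomic_fetch_sub(&head->refcount, 1);	/* drop the ref */
	return got;
}
```

Note that the recheck only catches the split. In this model, as in the
outline, nothing stops the lock from being taken on a head that was freed
and reallocated inside the window.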
> I do not think we
> should stick a write barrier into the page allocator fast path.
OK, I won't argue, I leave this to you and Andrea.
But I still think this code needs other cleanups/simplifications. In
particular get_futex_key()->__get_user_pages_fast() should die imho.
Oleg.