Message-ID: <alpine.LSU.2.11.1508122038380.4539@eggly.anvils>
Date: Wed, 12 Aug 2015 21:12:07 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: "Kirill A. Shutemov" <kirill@...temov.name>
cc: Andrew Morton <akpm@...ux-foundation.org>,
Greg Thelen <gthelen@...gle.com>,
Hugh Dickins <hughd@...gle.com>,
David Rientjes <rientjes@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Dave Hansen <dave.hansen@...el.com>,
Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
Christoph Lameter <cl@...two.org>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Steve Capper <steve.capper@...aro.org>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...e.cz>,
Jerome Marchand <jmarchan@...hat.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: page-flags behavior on compound pages: a worry
On Thu, 13 Aug 2015, Kirill A. Shutemov wrote:
>
> All this situation is ugly. I'm thinking on more general solution for
> PageTail() vs. ->first_page race.
>
> We would be able to avoid the race in first place if we encode PageTail()
> and position of head page within the same word in struct page. This way we
> update both thing in one shot without possibility of race.
>
> Details get tricky.
>
> I'm going to try tomorrow something like this: encode the position of head
> as offset from the tail page and store it as negative number in the union
> with ->mapping and ->s_mem. PageTail() can be implemented as check value
> of the field to be in range -1..-MAX_ORDER_NR_PAGES.
>
> I'm not sure at all if it's going to work, especially looking on
> ridiculously high CONFIG_FORCE_MAX_ZONEORDER some architectures allow.
>
> We could also try to encode page order instead (again as negative number)
> and calculate head page position based on alignment...
>
> Any other ideas are welcome.
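
A minimal userspace sketch of the encoding proposed above, in which a single word both marks a page as a tail and locates its head. Everything here is a hypothetical stand-in: struct model_page, MODEL_MAX_ORDER_NR_PAGES (with an arbitrary value) and the model_* helpers are not the kernel's struct page or its API, and the sketch ignores what else shares that word in the real union.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; the real limit depends on the architecture. */
#define MODEL_MAX_ORDER_NR_PAGES 1024L

struct model_page {
	long word;	/* models the word shared with ->mapping and ->s_mem */
};

/* A tail page stores minus its distance (in pages) from the head page. */
static void model_set_tail(struct model_page *tail, struct model_page *head)
{
	ptrdiff_t dist = tail - head;

	assert(dist >= 1 && dist <= MODEL_MAX_ORDER_NR_PAGES);
	tail->word = -(long)dist;
}

/* PageTail() becomes a range check on that one word. */
static int model_page_tail(const struct model_page *page)
{
	long v = page->word;

	return v <= -1 && v >= -MODEL_MAX_ORDER_NR_PAGES;
}

/*
 * One load answers both questions, so a reader cannot see the tail
 * marker from one moment and the head position from another.
 */
static struct model_page *model_compound_head(struct model_page *page)
{
	long v = page->word;	/* the kernel would want READ_ONCE() here */

	if (v <= -1 && v >= -MODEL_MAX_ORDER_NR_PAGES)
		return page + v;	/* v is negative: step back to the head */
	return page;
}
```

The range check only stays unambiguous while no other value stored in that word can fall between -1 and -MODEL_MAX_ORDER_NR_PAGES, which is where a very large CONFIG_FORCE_MAX_ZONEORDER becomes a worry.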

Good luck, I've not given it any thought, but hope it works out:
my reasoning was the same when I put the PageAnon bit into
page->mapping instead of page->flags.

Something to beware of though: although exceedingly unlikely to be a
problem, page->mapping always contained a pointer to or into a relevant
structure, or else something that could not possibly be a kernel pointer,
when I was working on KSM swapping: see comment above get_ksm_page() in
mm/ksm.c. It is best to keep page->mapping for pointers if possible
(and probably avoid having the PageAnon bit set unless really Anon).
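
For readers unfamiliar with the PageAnon bit living in page->mapping, here is a rough userspace sketch of the idea: the low bit of an aligned pointer is free, so the flag and the pointer can be written and read as one word. The names model_tagged_page, model_anon_vma, MODEL_MAPPING_ANON and the model_* helpers are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the low-bit "this mapping is anon" flag. */
#define MODEL_MAPPING_ANON ((uintptr_t)1)

struct model_anon_vma {
	int dummy;
};

struct model_tagged_page {
	uintptr_t mapping;	/* a pointer, with the flag kept in bit 0 */
};

/* Store the pointer and the flag together, in a single write. */
static void model_set_anon(struct model_tagged_page *page,
			   struct model_anon_vma *av)
{
	uintptr_t bits = (uintptr_t)av;

	/* Alignment of the pointed-to structure leaves bit 0 clear. */
	assert((bits & MODEL_MAPPING_ANON) == 0);
	page->mapping = bits | MODEL_MAPPING_ANON;
}

static int model_page_anon(const struct model_tagged_page *page)
{
	return (page->mapping & MODEL_MAPPING_ANON) != 0;
}

/* Recover the pointer by masking the flag off; NULL if the flag is clear. */
static struct model_anon_vma *
model_page_anon_vma(const struct model_tagged_page *page)
{
	uintptr_t v = page->mapping;	/* one load covers flag and pointer */

	if (!(v & MODEL_MAPPING_ANON))
		return NULL;
	return (struct model_anon_vma *)(v & ~MODEL_MAPPING_ANON);
}
```

Because the word is otherwise always a pointer, a racy reader that masks off the flag still ends up with an address inside a relevant structure, which is the property the paragraph above suggests preserving.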

I've only just read your mail, and I'm too slow a thinker to have
worked through your isolate_migratepages_block() race yet. But, given
the timing, cannot resist sending you a code fragment I wrote earlier
today for our v3.11-based kernel: which still has compound_trans_order(),
which we had been using in a similar racy physical scan.

I'm not for a moment suggesting that this fragment is relevant to your
race; but it is something amusing to consider when you're thinking of
such races. Credit to Greg Thelen for thinking of the prep_compound_page()
end of it, when I'd been focussed on the __split_huge_page_refcount() end.

	/*
	 * It is not safe to use compound_lock (inside compound_trans_order)
	 * until we have a reference on the page (okay, done above) and have
	 * then seen PageLRU on it (just below): because mm/huge_memory.c uses
	 * the non-atomic __SetPageUptodate on a freshly allocated THPage in
	 * several places, believing it to be invisible to the outside world,
	 * but liable to race and leave PG_compound_lock set when cleared here.
	 */
	nr_pages = 1;
	if (PageHead(page)) {
		/*
		 * smp_rmb() against the smp_wmb() in the first iteration of
		 * prep_compound_page(), so that the PageTail test ensures
		 * that compound_order(page) is now correctly readable.
		 */
		smp_rmb();
		if (PageTail(page + 1)) {
			nr_pages = 1 << compound_order(page);
			/*
			 * Then smp_rmb() against smp_wmb() in last iteration of
			 * __split_huge_page_refcount(), to ensure that has not
			 * yet written something else into page[1].lru.prev.
			 */
			smp_rmb();
			if (!PageTail(page + 1))
				nr_pages = 1;
		}
	}

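
The check, read, re-check shape of that fragment can be modelled in userspace with C11 atomics. This is only a sketch under stated assumptions: struct model_compound and the model_* functions are hypothetical, atomic_thread_fence() stands in for smp_wmb()/smp_rmb(), and the writer orderings in model_prep() and model_split() are assumed for illustration rather than copied from any kernel version. The first reader fence sits between the tail-flag load and the order load because that is what the C11 model needs; it is not a claim about where the kernel barriers belong.

```c
#include <assert.h>
#include <stdatomic.h>

struct model_compound {
	atomic_int head;	/* models PageHead(page) */
	atomic_int tail1;	/* models PageTail(page + 1) */
	atomic_ulong slot;	/* models page[1].lru.prev holding the order */
};

static void model_init(struct model_compound *c)
{
	atomic_init(&c->head, 0);
	atomic_init(&c->tail1, 0);
	atomic_init(&c->slot, 0);
}

/* Assumed writer ordering: publish the order, then set the tail flag. */
static void model_prep(struct model_compound *c, unsigned long order)
{
	assert(order < 8 * sizeof(unsigned long));
	atomic_store_explicit(&c->slot, order, memory_order_relaxed);
	atomic_store_explicit(&c->head, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->tail1, 1, memory_order_relaxed);
}

/* Assumed writer ordering: clear the tail flag, then reuse the slot. */
static void model_split(struct model_compound *c)
{
	atomic_store_explicit(&c->tail1, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->slot, ~0UL, memory_order_relaxed);
	atomic_store_explicit(&c->head, 0, memory_order_relaxed);
}

static unsigned long model_nr_pages(struct model_compound *c)
{
	unsigned long nr_pages = 1;
	unsigned long order;

	if (!atomic_load_explicit(&c->head, memory_order_relaxed))
		return nr_pages;
	if (!atomic_load_explicit(&c->tail1, memory_order_relaxed))
		return nr_pages;
	/* Tail flag seen: the order stored before it is now readable. */
	atomic_thread_fence(memory_order_acquire);
	order = atomic_load_explicit(&c->slot, memory_order_relaxed);
	/* Re-check: if the flag is still set, the slot was not yet reused. */
	atomic_thread_fence(memory_order_acquire);
	if (atomic_load_explicit(&c->tail1, memory_order_relaxed))
		nr_pages = 1UL << order;	/* shift only a validated order */
	return nr_pages;
}
```

Unlike the fragment, the sketch shifts only after the re-check, since shifting by a garbage order would be undefined behaviour in plain C.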
Hugh