[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190926102036.od2wamdx2s7uznvq@box>
Date: Thu, 26 Sep 2019 13:20:36 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Yu Zhao <yuzhao@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>,
Hugh Dickins <hughd@...gle.com>,
Jérôme Glisse <jglisse@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
"Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
David Rientjes <rientjes@...gle.com>,
Matthew Wilcox <willy@...radead.org>,
Lance Roy <ldr709@...il.com>,
Ralph Campbell <rcampbell@...dia.com>,
Jason Gunthorpe <jgg@...pe.ca>,
Dave Airlie <airlied@...hat.com>,
Thomas Hellstrom <thellstrom@...are.com>,
Souptick Joarder <jrdr.linux@...il.com>,
Mel Gorman <mgorman@...e.de>, Jan Kara <jack@...e.cz>,
Mike Kravetz <mike.kravetz@...cle.com>,
Huang Ying <ying.huang@...el.com>,
Aaron Lu <ziqian.lzq@...fin.com>,
Omar Sandoval <osandov@...com>,
Thomas Gleixner <tglx@...utronix.de>,
Vineeth Remanan Pillai <vpillai@...italocean.com>,
Daniel Jordan <daniel.m.jordan@...cle.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
Joel Fernandes <joel@...lfernandes.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
Pavel Tatashin <pavel.tatashin@...rosoft.com>,
David Hildenbrand <david@...hat.com>,
Juergen Gross <jgross@...e.com>,
Anthony Yznaga <anthony.yznaga@...cle.com>,
Johannes Weiner <hannes@...xchg.org>,
"Darrick J . Wong" <darrick.wong@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup
prematurely
On Wed, Sep 25, 2019 at 04:26:54PM -0600, Yu Zhao wrote:
> On Wed, Sep 25, 2019 at 10:25:30AM +0200, Peter Zijlstra wrote:
> > On Tue, Sep 24, 2019 at 05:24:58PM -0600, Yu Zhao wrote:
> > > We don't want to expose a non-hugetlb page to the fast gup running
> > > on a remote CPU before all local non-atomic ops on the page flags
> > > are visible first.
> > >
> > > For an anon page that isn't in swap cache, we need to make sure all
> > > prior non-atomic ops, especially __SetPageSwapBacked() in
> > > page_add_new_anon_rmap(), are ordered before set_pte_at() to prevent
> > > the following race:
> > >
> > > CPU 1 CPU1
> > > set_pte_at() get_user_pages_fast()
> > > page_add_new_anon_rmap() gup_pte_range()
> > > __SetPageSwapBacked() SetPageReferenced()
> > >
> > > This demonstrates a non-fatal scenario. Though haven't been directly
> > > observed, the fatal ones can exist, e.g., PG_lock set by fast gup
> > > caller and then overwritten by __SetPageSwapBacked().
> > >
> > > For an anon page that is already in swap cache or a file page, we
> > > don't need smp_wmb() before set_pte_at() because adding to swap or
> > > file cach serves as a valid write barrier. Using non-atomic ops
> > > thereafter is a bug, obviously.
> > >
> > > smp_wmb() is added following 11 of total 12 page_add_new_anon_rmap()
> > > call sites, with the only exception being
> > > do_huge_pmd_wp_page_fallback() because of an existing smp_wmb().
> > >
> >
> > I'm thinking this patch make stuff rather fragile.. Should we instead
> > stick the barrier in set_p*d_at() instead? Or rather, make that store a
> > store-release?
>
> I prefer it this way too, but I suspected the majority would be
> concerned with the performance implications, especially those
> looping set_pte_at()s in mm/huge_memory.c.
We can rename current set_pte_at() to __set_pte_at() or something and
leave it in places where barrier is not needed. The new set_pte_at()( will
be used in the rest of the places with the barrier inside.
BTW, have you looked at other levels of page table hierarchy. Do we have
the same issue for PMD/PUD/... pages?
--
Kirill A. Shutemov
Powered by blists - more mailing lists