lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 27 Dec 2022 10:40:22 +0800
From:   Chih-En Lin <shiyn.lin@...il.com>
To:     Barry Song <baohua@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Qi Zheng <zhengqi.arch@...edance.com>,
        David Hildenbrand <david@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        John Hubbard <jhubbard@...dia.com>,
        Nadav Amit <namit@...are.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Yang Shi <shy828301@...il.com>, Peter Xu <peterx@...hat.com>,
        Zach O'Keefe <zokeefe@...gle.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Alex Sierra <alex.sierra@....com>,
        Xianting Tian <xianting.tian@...ux.alibaba.com>,
        Colin Cross <ccross@...gle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Suleiman Souhlal <suleiman@...gle.com>,
        Brian Geffon <bgeffon@...gle.com>, Yu Zhao <yuzhao@...gle.com>,
        Tong Tiangen <tongtiangen@...wei.com>,
        Liu Shixin <liushixin2@...wei.com>,
        Li kunyu <kunyu@...china.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Hugh Dickins <hughd@...gle.com>,
        Minchan Kim <minchan@...nel.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Gautam Menghani <gautammenghani201@...il.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Mark Brown <broonie@...nel.org>, Will Deacon <will@...nel.org>,
        "Eric W . Biederman" <ebiederm@...ssion.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Andy Lutomirski <luto@...nel.org>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Barret Rhoden <brho@...gle.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        "Jason A . Donenfeld" <Jason@...c4.com>,
        Dinglan Peng <peng301@...due.edu>,
        Pedro Fonseca <pfonseca@...due.edu>,
        Jim Huang <jserv@...s.ncku.edu.tw>,
        Huichun Feng <foxhoundsk.tw@...il.com>
Subject: Re: [PATCH v3 04/14] mm/rmap: Break COW PTE in rmap walking

On Tue, Dec 27, 2022 at 02:15:00PM +1300, Barry Song wrote:
> On Mon, Dec 26, 2022 at 11:56 PM Chih-En Lin <shiyn.lin@...il.com> wrote:
> >
> > On Mon, Dec 26, 2022 at 10:40:49PM +1300, Barry Song wrote:
> > > On Tue, Dec 20, 2022 at 8:25 PM Chih-En Lin <shiyn.lin@...il.com> wrote:
> > > >
> > > > Some of the features (unmap, migrate, device exclusive, mkclean, etc)
> > > > might modify the pte entry via rmap. Add a new page vma mapped walk
> > > > flag, PVMW_BREAK_COW_PTE, to indicate the rmap walking to break COW PTE.
> > > >
> > > > Signed-off-by: Chih-En Lin <shiyn.lin@...il.com>
> > > > ---
> > > >  include/linux/rmap.h |  2 ++
> > > >  mm/migrate.c         |  3 ++-
> > > >  mm/page_vma_mapped.c |  2 ++
> > > >  mm/rmap.c            | 12 +++++++-----
> > > >  mm/vmscan.c          |  7 ++++++-
> > > >  5 files changed, 19 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > > > index bd3504d11b155..d0f07e5519736 100644
> > > > --- a/include/linux/rmap.h
> > > > +++ b/include/linux/rmap.h
> > > > @@ -368,6 +368,8 @@ int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
> > > >  #define PVMW_SYNC              (1 << 0)
> > > >  /* Look for migration entries rather than present PTEs */
> > > >  #define PVMW_MIGRATION         (1 << 1)
> > > > +/* Break COW-ed PTE during walking */
> > > > +#define PVMW_BREAK_COW_PTE     (1 << 2)
> > > >
> > > >  struct page_vma_mapped_walk {
> > > >         unsigned long pfn;
> > > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > > index dff333593a8ae..a4be7e04c9b09 100644
> > > > --- a/mm/migrate.c
> > > > +++ b/mm/migrate.c
> > > > @@ -174,7 +174,8 @@ void putback_movable_pages(struct list_head *l)
> > > >  static bool remove_migration_pte(struct folio *folio,
> > > >                 struct vm_area_struct *vma, unsigned long addr, void *old)
> > > >  {
> > > > -       DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr, PVMW_SYNC | PVMW_MIGRATION);
> > > > +       DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr,
> > > > +                             PVMW_SYNC | PVMW_MIGRATION | PVMW_BREAK_COW_PTE);
> > > >
> > > >         while (page_vma_mapped_walk(&pvmw)) {
> > > >                 rmap_t rmap_flags = RMAP_NONE;
> > > > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > > > index 93e13fc17d3cb..5dfc9236dc505 100644
> > > > --- a/mm/page_vma_mapped.c
> > > > +++ b/mm/page_vma_mapped.c
> > > > @@ -251,6 +251,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> > > >                         step_forward(pvmw, PMD_SIZE);
> > > >                         continue;
> > > >                 }
> > > > +               if (pvmw->flags & PVMW_BREAK_COW_PTE)
> > > > +                       break_cow_pte(vma, pvmw->pmd, pvmw->address);
> > > >                 if (!map_pte(pvmw))
> > > >                         goto next_pte;
> > > >  this_pte:
> > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > index 2ec925e5fa6a9..b1b7dcbd498be 100644
> > > > --- a/mm/rmap.c
> > > > +++ b/mm/rmap.c
> > > > @@ -807,7 +807,8 @@ static bool folio_referenced_one(struct folio *folio,
> > > >                 struct vm_area_struct *vma, unsigned long address, void *arg)
> > > >  {
> > > >         struct folio_referenced_arg *pra = arg;
> > > > -       DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
> > > > +       /* it will clear the entry, so we should break COW PTE. */
> > > > +       DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_BREAK_COW_PTE);
> > >
> > > what do you mean by breaking cow pte? in memory reclamation case, we are only
> > > checking and clearing page referenced bit in pte, do we really need to
> > > break cow?
> >
> > Since we might clear page referenced bit, it will modify the write
> > protection shared page table (COW-ed PTE). We should duplicate it.
> >
> > Actually, I didn’t break COW at first because it will conditionally
> > modify the table and only clear the referenced bit.
> > So, if clearing page referenced bit is fine to the COW-ed PTE table
> > and the break COW PTE is unnecessary here, we can remove it.
> 
> if a page is mapped by 100 processes and anyone of these 100 processes
> access this page, we will get a reference bit in the PTE. Otherwise, we will
> have to scan 100 PTEs to figure out if a page is accessed and should be
> kept in LRU.
> i don't see the fundamental necessity to duplicate PTE only because of clearing
> the reference bit. as keeping the pte shared will help save a lot of cost for
> memory reclamation for those CPUs which have hardware reference bits
> in PTE.
> 

As I knew, if the page reference bit is unset and the accessed bit of
the pte entry is set, the reclamation will clear the accessed bit and
set the reference bit of the page.

So, for the such case with COW PTE, the logic is same as the normal PTE
one. It makes sense. Thanks for helping me to clear up the point.

Thanks,
Chih-En Lin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ