linux-kernel - Re: [PATCH RFC 5/6] mm: rmap: weaken the WARN_ON in __folio_add_anon

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGsJ_4wanQr8mniGnJrzDHKGzWt8J2vT_Mx65+46=qk-R0iM3w@mail.gmail.com>
Date: Mon, 26 Feb 2024 18:05:43 +1300
From: Barry Song <21cnbao@...il.com>
To: Chris Li <chrisl@...nel.org>
Cc: David Hildenbrand <david@...hat.com>, ryan.roberts@....com, akpm@...ux-foundation.org, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, mhocko@...e.com, 
	shy828301@...il.com, wangkefeng.wang@...wei.com, willy@...radead.org, 
	xiang@...nel.org, ying.huang@...el.com, yuzhao@...gle.com, surenb@...gle.com, 
	steven.price@....com, Barry Song <v-songbaohua@...o.com>, 
	Chuanhua Han <hanchuanhua@...o.com>
Subject: Re: [PATCH RFC 5/6] mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()

On Tue, Jan 30, 2024 at 5:32 AM Chris Li <chrisl@...nel.org> wrote:
>
> On Mon, Jan 29, 2024 at 2:07 AM David Hildenbrand <david@...hat.com> wrote:
> >
> > On 29.01.24 04:25, Chris Li wrote:
> > > Hi David and Barry,
> > >
> > > On Mon, Jan 22, 2024 at 10:49 PM Barry Song <21cnbao@...il.com> wrote:
> > >>
> > >>>
> > >>>
> > >>> I have on my todo list to move all that !anon handling out of
> > >>> folio_add_anon_rmap_ptes(), and instead make swapin code call add
> > >>> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
> > >>> then (-> whole new folio exclusive).
> > >>>
> > >>> That's the cleaner approach.
> > >>>
> > >>
> > >> one tricky thing is that sometimes it is hard to know who is the first
> > >> one to add rmap and thus should
> > >> call folio_add_new_anon_rmap.
> > >> especially when we want to support swapin_readahead(), the one who
> > >> allocated large filio might not
> > >> be that one who firstly does rmap.
> > >
> > > I think Barry has a point. Two tasks might race to swap in the folio
> > > then race to perform the rmap.
> > > folio_add_new_anon_rmap() should only call a folio that is absolutely
> > > "new", not shared. The sharing in swap cache disqualifies that
> > > condition.
> >
> > We have to hold the folio lock. So only one task at a time might do the
> > folio_add_anon_rmap_ptes() right now, and the
> > folio_add_new_shared_anon_rmap() in the future [below].
> >
>
> Ah, I see. The folio_lock() is the answer I am looking for.
>
> > Also observe how folio_add_anon_rmap_ptes() states that one must hold
> > the page lock, because otherwise this would all be completely racy.
> >
> >  From the pte swp exclusive flags, we know for sure whether we are
> > dealing with exclusive vs. shared. I think patch #6 does not properly
> > check that all entries are actually the same in that regard (all
> > exclusive vs all shared). That likely needs fixing.
> >
> > [I have converting per-page PageAnonExclusive flags to a single
> > per-folio flag on my todo list. I suspect that we'll keep the
> > per-swp-pte exlusive bits, but the question is rather what we can
> > actually make work, because swap and migration just make it much more
> > complicated. Anyhow, future work]
> >
> > >
> > >> is it an acceptable way to do the below in do_swap_page?
> > >> if (!folio_test_anon(folio))
> > >>        folio_add_new_anon_rmap()
> > >> else
> > >>        folio_add_anon_rmap_ptes()
> > >
> > > I am curious to know the answer as well.
> >
> >
> > Yes, the end code should likely be something like:
> >
> > /* ksm created a completely new copy */
> > if (unlikely(folio != swapcache && swapcache)) {
> >         folio_add_new_anon_rmap(folio, vma, vmf->address);
> >         folio_add_lru_vma(folio, vma);
> > } else if (folio_test_anon(folio)) {
> >         folio_add_anon_rmap_ptes(rmap_flags)
> > } else {
> >         folio_add_new_anon_rmap(rmap_flags)
> > }
> >
> > Maybe we want to avoid teaching all existing folio_add_new_anon_rmap()
> > callers about a new flag, and just have a new
> > folio_add_new_shared_anon_rmap() instead. TBD.

if we have to add a wrapper like folio_add_new_shared_anon_rmap()
to avoid "if (folio_test_anon(folio))" and "else" everywhere, why not
we just do it in folio_add_anon_rmap_ptes() ?

folio_add_anon_rmap_ptes()
{
      if (!folio_test_anon(folio))
               return folio_add_new_anon_rmap();
}

Anyway, I am going to change the patch 4/6 to if(folio_test_anon)/else first
and drop this 5/6.
we may figure out if we need a wrapper later.

>
> There is more than one caller needed to perform that dance around
> folio_test_anon() then decide which function to call. It would be nice
> to have a wrapper function folio_add_new_shared_anon_rmap() to
> abstract this behavior.
>
>
> >
> > >
> > > BTW, that test might have a race as well. By the time the task got
> > > !anon result, this result might get changed by another task. We need
> > > to make sure in the caller context this race can't happen. Otherwise
> > > we can't do the above safely.
> > Again, folio lock. Observe the folio_lock_or_retry() call that covers
> > our existing folio_add_new_anon_rmap/folio_add_anon_rmap_pte calls.
>
> Ack. Thanks for the explanation.
>
> Chris

Thanks
Barry