linux-kernel - Re: [PATCH RFC 5/6] mm: rmap: weaken the WARN_ON in __folio_add_anon

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAGsJ_4xKTj1PwmJAAZAzAvEN53kze5wSPHb01pVg9LBy80axGA@mail.gmail.com>
Date: Sun, 7 Apr 2024 11:27:26 +1200
From: Barry Song <21cnbao@...il.com>
To: David Hildenbrand <david@...hat.com>
Cc: Chris Li <chrisl@...nel.org>, ryan.roberts@....com, akpm@...ux-foundation.org, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, mhocko@...e.com, 
	shy828301@...il.com, wangkefeng.wang@...wei.com, willy@...radead.org, 
	xiang@...nel.org, ying.huang@...el.com, yuzhao@...gle.com, surenb@...gle.com, 
	steven.price@....com, Barry Song <v-songbaohua@...o.com>, 
	Chuanhua Han <hanchuanhua@...o.com>
Subject: Re: [PATCH RFC 5/6] mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()

On Mon, Jan 29, 2024 at 11:07 PM David Hildenbrand <david@...hat.com> wrote:
>
> On 29.01.24 04:25, Chris Li wrote:
> > Hi David and Barry,
> >
> > On Mon, Jan 22, 2024 at 10:49 PM Barry Song <21cnbao@...il.com> wrote:
> >>
> >>>
> >>>
> >>> I have on my todo list to move all that !anon handling out of
> >>> folio_add_anon_rmap_ptes(), and instead make swapin code call add
> >>> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
> >>> then (-> whole new folio exclusive).
> >>>
> >>> That's the cleaner approach.
> >>>
> >>
> >> one tricky thing is that sometimes it is hard to know who is the first
> >> one to add rmap and thus should
> >> call folio_add_new_anon_rmap.
> >> especially when we want to support swapin_readahead(), the one who
> >> allocated large filio might not
> >> be that one who firstly does rmap.
> >
> > I think Barry has a point. Two tasks might race to swap in the folio
> > then race to perform the rmap.
> > folio_add_new_anon_rmap() should only call a folio that is absolutely
> > "new", not shared. The sharing in swap cache disqualifies that
> > condition.
>
> We have to hold the folio lock. So only one task at a time might do the
> folio_add_anon_rmap_ptes() right now, and the
> folio_add_new_shared_anon_rmap() in the future [below].
>
> Also observe how folio_add_anon_rmap_ptes() states that one must hold
> the page lock, because otherwise this would all be completely racy.
>
>  From the pte swp exclusive flags, we know for sure whether we are
> dealing with exclusive vs. shared. I think patch #6 does not properly
> check that all entries are actually the same in that regard (all
> exclusive vs all shared). That likely needs fixing.
>
> [I have converting per-page PageAnonExclusive flags to a single
> per-folio flag on my todo list. I suspect that we'll keep the
> per-swp-pte exlusive bits, but the question is rather what we can
> actually make work, because swap and migration just make it much more
> complicated. Anyhow, future work]
>
> >
> >> is it an acceptable way to do the below in do_swap_page?
> >> if (!folio_test_anon(folio))
> >>        folio_add_new_anon_rmap()
> >> else
> >>        folio_add_anon_rmap_ptes()
> >
> > I am curious to know the answer as well.
>
>
> Yes, the end code should likely be something like:
>
> /* ksm created a completely new copy */
> if (unlikely(folio != swapcache && swapcache)) {
>         folio_add_new_anon_rmap(folio, vma, vmf->address);
>         folio_add_lru_vma(folio, vma);
> } else if (folio_test_anon(folio)) {
>         folio_add_anon_rmap_ptes(rmap_flags)
> } else {
>         folio_add_new_anon_rmap(rmap_flags)
> }
>
> Maybe we want to avoid teaching all existing folio_add_new_anon_rmap()
> callers about a new flag, and just have a new
> folio_add_new_shared_anon_rmap() instead. TBD.

right.

We need to clarify that the new anon_folio might not necessarily be exclusive.
Unlike folio_add_new_anon_rmap, which assumes the new folio is exclusive,
folio_add_anon_rmap_ptes is capable of handling both exclusive and
non-exclusive new anon folios.

The code would be like:

 if (unlikely(folio != swapcache && swapcache)) {
         folio_add_new_anon_rmap(folio, vma, vmf->address);
         folio_add_lru_vma(folio, vma);
 } else if (!folio_test_anon(folio)) {
         folio_add_anon_rmap_ptes(rmap_flags);
 } else {
         if (exclusive)
                 folio_add_new_anon_rmap();
         else
                 folio_add_new_shared_anon_rmap();
 }

It appears a bit lengthy?

>
> >
> > BTW, that test might have a race as well. By the time the task got
> > !anon result, this result might get changed by another task. We need
> > to make sure in the caller context this race can't happen. Otherwise
> > we can't do the above safely.
> Again, folio lock. Observe the folio_lock_or_retry() call that covers
> our existing folio_add_new_anon_rmap/folio_add_anon_rmap_pte calls.
>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry