[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAGsJ_4xKTj1PwmJAAZAzAvEN53kze5wSPHb01pVg9LBy80axGA@mail.gmail.com>
Date: Sun, 7 Apr 2024 11:27:26 +1200
From: Barry Song <21cnbao@...il.com>
To: David Hildenbrand <david@...hat.com>
Cc: Chris Li <chrisl@...nel.org>, ryan.roberts@....com, akpm@...ux-foundation.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, mhocko@...e.com,
shy828301@...il.com, wangkefeng.wang@...wei.com, willy@...radead.org,
xiang@...nel.org, ying.huang@...el.com, yuzhao@...gle.com, surenb@...gle.com,
steven.price@....com, Barry Song <v-songbaohua@...o.com>,
Chuanhua Han <hanchuanhua@...o.com>
Subject: Re: [PATCH RFC 5/6] mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()
On Mon, Jan 29, 2024 at 11:07 PM David Hildenbrand <david@...hat.com> wrote:
>
> On 29.01.24 04:25, Chris Li wrote:
> > Hi David and Barry,
> >
> > On Mon, Jan 22, 2024 at 10:49 PM Barry Song <21cnbao@...il.com> wrote:
> >>
> >>>
> >>>
> >>> I have on my todo list to move all that !anon handling out of
> >>> folio_add_anon_rmap_ptes(), and instead make swapin code call add
> >>> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
> >>> then (-> whole new folio exclusive).
> >>>
> >>> That's the cleaner approach.
> >>>
> >>
> >> one tricky thing is that sometimes it is hard to know who is the first
> >> one to add rmap and thus should
> >> call folio_add_new_anon_rmap.
> >> especially when we want to support swapin_readahead(), the one who
> >> allocated large filio might not
> >> be that one who firstly does rmap.
> >
> > I think Barry has a point. Two tasks might race to swap in the folio
> > then race to perform the rmap.
> > folio_add_new_anon_rmap() should only call a folio that is absolutely
> > "new", not shared. The sharing in swap cache disqualifies that
> > condition.
>
> We have to hold the folio lock. So only one task at a time might do the
> folio_add_anon_rmap_ptes() right now, and the
> folio_add_new_shared_anon_rmap() in the future [below].
>
> Also observe how folio_add_anon_rmap_ptes() states that one must hold
> the page lock, because otherwise this would all be completely racy.
>
> From the pte swp exclusive flags, we know for sure whether we are
> dealing with exclusive vs. shared. I think patch #6 does not properly
> check that all entries are actually the same in that regard (all
> exclusive vs all shared). That likely needs fixing.
>
> [I have converting per-page PageAnonExclusive flags to a single
> per-folio flag on my todo list. I suspect that we'll keep the
> per-swp-pte exlusive bits, but the question is rather what we can
> actually make work, because swap and migration just make it much more
> complicated. Anyhow, future work]
>
> >
> >> is it an acceptable way to do the below in do_swap_page?
> >> if (!folio_test_anon(folio))
> >> folio_add_new_anon_rmap()
> >> else
> >> folio_add_anon_rmap_ptes()
> >
> > I am curious to know the answer as well.
>
>
> Yes, the end code should likely be something like:
>
> /* ksm created a completely new copy */
> if (unlikely(folio != swapcache && swapcache)) {
> folio_add_new_anon_rmap(folio, vma, vmf->address);
> folio_add_lru_vma(folio, vma);
> } else if (folio_test_anon(folio)) {
> folio_add_anon_rmap_ptes(rmap_flags)
> } else {
> folio_add_new_anon_rmap(rmap_flags)
> }
>
> Maybe we want to avoid teaching all existing folio_add_new_anon_rmap()
> callers about a new flag, and just have a new
> folio_add_new_shared_anon_rmap() instead. TBD.
right.
We need to clarify that the new anon_folio might not necessarily be exclusive.
Unlike folio_add_new_anon_rmap, which assumes the new folio is exclusive,
folio_add_anon_rmap_ptes is capable of handling both exclusive and
non-exclusive new anon folios.
The code would be like:
if (unlikely(folio != swapcache && swapcache)) {
folio_add_new_anon_rmap(folio, vma, vmf->address);
folio_add_lru_vma(folio, vma);
} else if (!folio_test_anon(folio)) {
folio_add_anon_rmap_ptes(rmap_flags);
} else {
if (exclusive)
folio_add_new_anon_rmap();
else
folio_add_new_shared_anon_rmap();
}
It appears a bit lengthy?
>
> >
> > BTW, that test might have a race as well. By the time the task got
> > !anon result, this result might get changed by another task. We need
> > to make sure in the caller context this race can't happen. Otherwise
> > we can't do the above safely.
> Again, folio lock. Observe the folio_lock_or_retry() call that covers
> our existing folio_add_new_anon_rmap/folio_add_anon_rmap_pte calls.
>
> --
> Cheers,
>
> David / dhildenb
Thanks
Barry
Powered by blists - more mailing lists