linux-kernel - Re: [PATCH RFC 2/3] mm: do_swap_page: use folio_add_new_anon_rmap() if folio_test

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGsJ_4wRk2BMpHTERgzcgBr6qODdVPdbqn8hm0YxAWNHxkp48g@mail.gmail.com>
Date: Thu, 13 Jun 2024 22:55:44 +1200
From: Barry Song <21cnbao@...il.com>
To: David Hildenbrand <david@...hat.com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org, chrisl@...nel.org, 
	linux-kernel@...r.kernel.org, mhocko@...e.com, ryan.roberts@....com, 
	baolin.wang@...ux.alibaba.com, yosryahmed@...gle.com, shy828301@...il.com, 
	surenb@...gle.com, v-songbaohua@...o.com, willy@...radead.org, 
	ying.huang@...el.com, yuzhao@...gle.com
Subject: Re: [PATCH RFC 2/3] mm: do_swap_page: use folio_add_new_anon_rmap()
 if folio_test_anon(folio)==false

On Thu, Jun 13, 2024 at 10:37 PM David Hildenbrand <david@...hat.com> wrote:
>
> On 13.06.24 11:58, Barry Song wrote:
> > On Thu, Jun 13, 2024 at 9:23 PM David Hildenbrand <david@...hat.com> wrote:
> >>
> >> On 13.06.24 02:07, Barry Song wrote:
> >>> From: Barry Song <v-songbaohua@...o.com>
> >>>
> >>> For the !folio_test_anon(folio) case, we can now invoke folio_add_new_anon_rmap()
> >>> with the rmap flags set to either EXCLUSIVE or non-EXCLUSIVE. This action will
> >>> suppress the VM_WARN_ON_FOLIO check within __folio_add_anon_rmap() while initiating
> >>> the process of bringing up mTHP swapin.
> >>>
> >>>    static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> >>>                    struct page *page, int nr_pages, struct vm_area_struct *vma,
> >>>                    unsigned long address, rmap_t flags, enum rmap_level level)
> >>>    {
> >>>            ...
> >>>            if (unlikely(!folio_test_anon(folio))) {
> >>>                    VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> >>>                                     level != RMAP_LEVEL_PMD, folio);
> >>>            }
> >>>            ...
> >>>    }
> >>>
> >>> It also enhances the code’s readability.
> >>>
> >>> Suggested-by: David Hildenbrand <david@...hat.com>
> >>> Signed-off-by: Barry Song <v-songbaohua@...o.com>
> >>> ---
> >>>    mm/memory.c | 2 ++
> >>>    1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/mm/memory.c b/mm/memory.c
> >>> index 2f94921091fb..9c962f62f928 100644
> >>> --- a/mm/memory.c
> >>> +++ b/mm/memory.c
> >>> @@ -4339,6 +4339,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >>>        if (unlikely(folio != swapcache && swapcache)) {
> >>>                folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
> >>>                folio_add_lru_vma(folio, vma);
> >>> +     } else if (!folio_test_anon(folio)) {
> >>> +             folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
> >>
> >> So, with large folio swapin, we would never end up here if any swp PTE
> >> is !exclusive, because we would make sure to allocate a large folio only
> >> for suitable regions, correct?
> >>
> >> It can certainly happen during swapout + refault with large folios. But
> >> there, we will always have folio_test_anon() still set and wouldn't run
> >> into this code path.
> >>
> >> (it will all be better with per-folio PAE bit, but that will take some
> >> more time until I actually get to implement it properly, handling all
> >> the nasty corner cases)
> >>
> >> Better add a comment regarding why we are sure that the complete thing
> >> is exclusive (e.g., currently only small folios).
> >
> > No, patch 1/3 enables both cases to call folio_add_new_anon_rmap(). Currently,
> > small folios could be non-exclusive. I suppose your question is
> > whether we should
> > ensure that all pages within a mTHP have the same exclusivity status,
> > rather than
> > always being exclusive?
>
> We must never end up calling folio_add_new_anon_rmap(folio, vma,
> address, RMAP_EXCLUSIVE) if part of the folio is non-exclusive.

Agreed.

>
> I think we *may* call folio_add_new_anon_rmap(folio, vma, address, 0) if
> part of the folio is exclusive [it go swapped out, so there cannot be
> GUP references]. But, to be future proof (single PAE bit per folio), we
> better avoid that on swapin if possible.
>
> For small folios, it's clear that it cannot be partially exclusive
> (single page).

Yes, for small folios, it is much simpler; they are either entirely exclusive or
entirely non-exclusive.

For the initial swapin patch[1], which only supports SYNC-IO mTHP swapin,
mTHP is always entirely exclusive. I am also considering non-SYNCHRONOUS_IO
swapcache-based mTHP swapin but intend to start with the entirely exclusive
cases.

[1] https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@gmail.com/

>
> We better comment why we obey to the above. Like
>
> "We currently ensure that new folios cannot be partially exclusive: they
> are either fully exclusive or fully shared."
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry