Message-ID: <CAGsJ_4w0f+1-8o+5-Vaj2xfO1Q5tm=AJQVrsST50nWihf02ynQ@mail.gmail.com>
Date: Sat, 18 Nov 2023 04:36:42 +0800
From: Barry Song <21cnbao@...il.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: David Hildenbrand <david@...hat.com>, steven.price@....com,
akpm@...ux-foundation.org, ryan.roberts@....com,
catalin.marinas@....com, will@...nel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhocko@...e.com,
shy828301@...il.com, v-songbaohua@...o.com,
wangkefeng.wang@...wei.com, xiang@...nel.org, ying.huang@...el.com,
yuzhao@...gle.com
Subject: Re: [RFC V3 PATCH] arm64: mm: swap: save and restore mte tags for
large folios
On Sat, Nov 18, 2023 at 3:37 AM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Fri, Nov 17, 2023 at 07:47:00AM +0800, Barry Song wrote:
> > This has been discussed. Steven, Ryan and I all don't think this is a good
> > option. in case we have a large folio with 16 basepages, as do_swap_page
> > can only map one base page for each page fault, that means we have
> > to restore 16(tags we restore in each page fault) * 16(the times of page faults)
> > for this large folio.
>
> That doesn't seem all that hard to fix? Call set_ptes() instead of
> set_pte_at(). The biggest thing, I guess, is making sure that all
> the PTEs you're going to set up are still pte_none().
I guess you mean that all of the PTEs are still swap entries. Some risks I can see:
1. The VMA might be split after the folio is added to the swap cache, for
example if userspace unmaps or mprotects part of a large folio.
2. The VMA is not split, but some base pages within the folio have been
zapped with MADV_DONTNEED.
3. Base pages in the large folio might end up with different R/W/X
permissions.
For example, if a large folio has 16 base pages, userspace still works at
4KB granularity, so it can mprotect part of them read-only. In this case
all 16 PTEs will still be swap entries, but the reuse on write fault
can't work at folio granularity.
I need to consider all of the above DoubleMap/split risks rather than simply
checking the PTEs, as userspace still works at 4KB granularity.
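
To make the three risks above concrete, here is a minimal userspace sketch
of what a combined check might look like. It is only an illustration under
stated assumptions: `struct slot`, its fields, and `can_map_batch()` are
hypothetical names, not kernel types or APIs, and tracking permissions per
slot is a simplification of how the kernel derives them from the VMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one PTE slot; not a kernel type. */
enum slot_kind { SLOT_NONE, SLOT_SWAP, SLOT_PRESENT };

struct slot {
    enum slot_kind kind;
    unsigned long swp_offset; /* meaningful only when kind == SLOT_SWAP */
    unsigned int prot;        /* R/W/X bits that apply to this slot */
    int vma_id;               /* which (hypothetical) VMA covers this slot */
};

/*
 * Return true only if all nr slots could be mapped as one batch:
 *  - every slot still holds a swap entry (risk 2: nothing was zapped),
 *  - the swap offsets are consecutive (they back the same large folio),
 *  - every slot is covered by the same VMA (risk 1: no split),
 *  - every slot has the same permissions (risk 3: no partial mprotect).
 * If any check fails, the caller would fall back to per-page handling.
 */
static bool can_map_batch(const struct slot *s, size_t nr)
{
    assert(nr > 0);

    for (size_t i = 0; i < nr; i++) {
        if (s[i].kind != SLOT_SWAP)
            return false;
        if (s[i].swp_offset != s[0].swp_offset + i)
            return false;
        if (s[i].vma_id != s[0].vma_id)
            return false;
        if (s[i].prot != s[0].prot)
            return false;
    }
    return true;
}
```

With 16 slots that are all swap entries, a partial mprotect changes `prot`
on some of them and the sketch rejects the batch, even though a check that
only looks at whether each PTE is a swap entry would pass.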
>
Thanks
Barry