[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YnCzQJk8Mu1848tG@kernel.org>
Date: Mon, 2 May 2022 21:44:48 -0700
From: Mike Rapoport <rppt@...nel.org>
To: Hyeonggon Yoo <42.hyeyoo@...il.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Ira Weiny <ira.weiny@...el.com>,
Kees Cook <keescook@...omium.org>,
Mike Rapoport <rppt@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
x86@...nel.org
Subject: Re: [RFC PATCH 0/3] Prototype for direct map awareness in page
allocator
On Sat, Apr 30, 2022 at 01:44:16PM +0000, Hyeonggon Yoo wrote:
> On Tue, Apr 26, 2022 at 06:21:57PM +0300, Mike Rapoport wrote:
> > Hello Hyeonggon,
> >
> > On Tue, Apr 26, 2022 at 05:54:49PM +0900, Hyeonggon Yoo wrote:
> > > On Thu, Jan 27, 2022 at 10:56:05AM +0200, Mike Rapoport wrote:
> > > > From: Mike Rapoport <rppt@...ux.ibm.com>
> > > >
> > > > Hi,
> > > >
> > > > This is a second attempt to make page allocator aware of the direct map
> > > > layout and allow grouping of the pages that must be mapped at PTE level in
> > > > the direct map.
> > > >
> > >
> > > Hello mike, It may be a silly question...
> > >
> > > Looking at implementation of set_memory*(), they only split
> > > PMD/PUD-sized entries. But why not _merge_ them when all entries
> > > have same permissions after changing permission of an entry?
> > >
> > > I think grouping __GFP_UNMAPPED allocations would help reducing
> > > direct map fragmentation, but IMHO merging split entries seems better
> > > to be done in those helpers than in page allocator.
> >
> > Maybe, I didn't got as far as to try merging split entries in the direct
> > map. IIRC, Kirill sent a patch for collapsing huge pages in the direct map
> > some time ago, but there still was something that had to initiate the
> > collapse.
>
> But in this case buddy allocator's view of direct map is quite limited.
> It cannot merge 2M entries to 1G entry as it does not support
> big allocations. Also it cannot merge entries of pages freed in boot process
> as they weren't allocated from page allocator.
>
> And it will become harder when pages in MIGRATE_UNMAPPED is borrowed
> from another migrate type....
>
> So it would be nice if we can efficiently merge mappings in
> change_page_attr_set(). this approach can handle cases above.
>
> I think in this case grouping allocations and merging mappings
> should be done separately.
I've added the provision to merge the mappings in __free_one_page() because
at that spot we know for sure we can replace multiple PTEs with a single
PMD.
I'm not saying there should be no additional mechanism for collapsing
direct map pages, but I don't know when and how it should be invoked.
> > > For example:
> > > 1) set_memory_ro() splits 1 RW PMD entry into 511 RW PTE
> > > entries and 1 RO PTE entry.
> > >
> > > 2) before freeing the pages, we call set_memory_rw() and we have
> > > 512 RW PTE entries. Then we can merge it to 1 RW PMD entry.
> >
> > For this we need to check permissions of all 512 pages to make sure we can
> > use a PMD entry to map them.
>
> Of course that may be slow. Maybe one way to optimize this is using some bits
> in struct page, something like: each bit of page->direct_map_split (unsigned long)
> is set when at least one entry in (PTRS_PER_PTE = 512)/(BITS_PER_LONG = 64) = 8 entries
> has special permissions.
>
> Then we just need to set the corresponding bit when splitting mappings and
> iterate 8 entries when changing permission back again. (and then unset the bit when 8 entries has
> usual permissions). we can decide to merge by checking if page->direct_map_split is zero.
>
> When scanning, 8 entries would fit into one cacheline.
>
> Any other ideas?
>
> > Not sure that doing the scan in each set_memory call won't cause an overall
> > slowdown.
>
> I think we can evaluate it by measuring boot time and bpf/module
> load/unload time.
>
> Is there any other workload that is directly affected
> by performance of set_memory*()?
>
> > > 3) after 2) we can do same thing about PMD-sized entries
> > > and merge them into 1 PUD entry if 512 PMD entries have
> > > same permissions.
> > > [...]
> > > > Mike Rapoport (3):
> > > > mm/page_alloc: introduce __GFP_UNMAPPED and MIGRATE_UNMAPPED
> > > > mm/secretmem: use __GFP_UNMAPPED to allocate pages
> > > > EXPERIMENTAL: x86/module: use __GFP_UNMAPPED in module_alloc
> > > --
> > > Thanks,
> > > Hyeonggon
> >
> > --
> > Sincerely yours,
> > Mike.
--
Sincerely yours,
Mike.
Powered by blists - more mailing lists