Message-ID: <20220430134415.GA25819@ip-172-31-27-201.ap-northeast-1.compute.internal>
Date: Sat, 30 Apr 2022 13:44:16 +0000
From: Hyeonggon Yoo <42.hyeyoo@...il.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Ira Weiny <ira.weiny@...el.com>,
Kees Cook <keescook@...omium.org>,
Mike Rapoport <rppt@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
x86@...nel.org
Subject: Re: [RFC PATCH 0/3] Prototype for direct map awareness in page
allocator
On Tue, Apr 26, 2022 at 06:21:57PM +0300, Mike Rapoport wrote:
> Hello Hyeonggon,
>
> On Tue, Apr 26, 2022 at 05:54:49PM +0900, Hyeonggon Yoo wrote:
> > On Thu, Jan 27, 2022 at 10:56:05AM +0200, Mike Rapoport wrote:
> > > From: Mike Rapoport <rppt@...ux.ibm.com>
> > >
> > > Hi,
> > >
> > > This is a second attempt to make page allocator aware of the direct map
> > > layout and allow grouping of the pages that must be mapped at PTE level in
> > > the direct map.
> > >
> >
> > Hello Mike, it may be a silly question...
> >
> > Looking at implementation of set_memory*(), they only split
> > PMD/PUD-sized entries. But why not _merge_ them when all entries
> > have same permissions after changing permission of an entry?
> >
> > I think grouping __GFP_UNMAPPED allocations would help reduce
> > direct map fragmentation, but IMHO merging split entries seems better
> > done in those helpers than in the page allocator.
>
> Maybe, I didn't get as far as trying to merge split entries in the direct
> map. IIRC, Kirill sent a patch for collapsing huge pages in the direct map
> some time ago, but there still was something that had to initiate the
> collapse.
But in this case the buddy allocator's view of the direct map is quite
limited. It cannot merge 2M entries into a 1G entry, as it does not support
allocations that large. It also cannot merge entries for pages freed during
boot, because those were never allocated from the page allocator.
And it will only get harder when pages in MIGRATE_UNMAPPED are borrowed
from another migrate type...
So it would be nice if we could merge mappings efficiently in
change_page_attr_set(); that approach can handle all of the cases above.
I think grouping allocations and merging mappings should then be
done as separate steps.
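
Just to illustrate where I think the merge could live (a rough sketch
with a made-up lookup helper, not the real set_memory.c internals):

/*
 * Called after permissions of a range were restored to the defaults;
 * tries to collapse the PMD covering 'addr' back into a 2M mapping.
 */
static void try_merge_pmd(unsigned long addr)
{
	unsigned long vaddr = addr & PMD_MASK;
	pmd_t *pmd = lookup_split_pmd(vaddr);	/* made-up helper */
	pte_t *pte = pte_offset_kernel(pmd, vaddr);
	int i;

	/* give up unless every PTE has the default kernel protections */
	for (i = 0; i < PTRS_PER_PTE; i++) {
		if (pte_none(pte[i]) ||
		    pgprot_val(pte_pgprot(pte[i])) != pgprot_val(PAGE_KERNEL))
			return;
	}

	/*
	 * In the direct map the 512 PTEs of a split PMD map physically
	 * contiguous memory, so a single large entry can cover them all.
	 */
	set_pmd(pmd, pfn_pmd(pte_pfn(pte[0]), PAGE_KERNEL_LARGE));
	flush_tlb_kernel_range(vaddr, vaddr + PMD_SIZE);
	/* ... free the now unused PTE page ... */
}

Doing it there would also naturally cover entries that were split
before the page allocator was up.
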
> > For example:
> > 1) set_memory_ro() splits 1 RW PMD entry into 511 RW PTE
> > entries and 1 RO PTE entry.
> >
> > 2) before freeing the pages, we call set_memory_rw() and we have
> > 512 RW PTE entries. Then we can merge it to 1 RW PMD entry.
>
> For this we need to check permissions of all 512 pages to make sure we can
> use a PMD entry to map them.
Of course that may be slow. Maybe one way to optimize it is to use some bits
in struct page, something like: each bit of page->direct_map_split (unsigned
long) is set when at least one of the (PTRS_PER_PTE = 512) / (BITS_PER_LONG
= 64) = 8 PTE entries it covers has special permissions.

Then we only need to set the corresponding bit when splitting a mapping, and
iterate over those 8 entries when changing permissions back again (clearing
the bit once all 8 entries have the usual permissions). We can decide to
merge by checking whether page->direct_map_split is zero.
When scanning, the 8 entries fit into one cacheline.
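
Roughly something like this (again only a sketch; direct_map_split
would be a new field in the struct page of the 2M region's head page,
and the function names are made up):

/*
 * Bit i of page->direct_map_split covers PTEs
 * [i * PTES_PER_BIT, (i + 1) * PTES_PER_BIT) of the split PMD.
 */
#define PTES_PER_BIT	(PTRS_PER_PTE / BITS_PER_LONG)	/* 512 / 64 = 8 */

static bool group_has_default_prot(pte_t *ptes, unsigned int group)
{
	unsigned int i, start = group * PTES_PER_BIT;

	/* only 8 entries to scan, and they share a cacheline */
	for (i = start; i < start + PTES_PER_BIT; i++)
		if (pgprot_val(pte_pgprot(ptes[i])) != pgprot_val(PAGE_KERNEL))
			return false;
	return true;
}

/* called whenever PTE 'idx' of a split PMD changes permissions */
static void track_split_pmd(struct page *head, pte_t *ptes,
			    unsigned int idx, bool special)
{
	unsigned int group = idx / PTES_PER_BIT;

	if (special)
		set_bit(group, &head->direct_map_split);
	else if (group_has_default_prot(ptes, group))
		clear_bit(group, &head->direct_map_split);

	/*
	 * Once no bit is set, all 512 entries have the default
	 * permissions again and the PMD can be merged (e.g. with
	 * try_merge_pmd() from the sketch above).
	 */
}
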
Any other ideas?
> Not sure that doing the scan in each set_memory call won't cause an overall
> slowdown.
I think we can evaluate it by measuring boot time and bpf/module
load/unload time.
Is there any other workload that is directly affected
by the performance of set_memory*()?
> > 3) after 2) we can do same thing about PMD-sized entries
> > and merge them into 1 PUD entry if 512 PMD entries have
> > same permissions.
> > [...]
> > > Mike Rapoport (3):
> > > mm/page_alloc: introduce __GFP_UNMAPPED and MIGRATE_UNMAPPED
> > > mm/secretmem: use __GFP_UNMAPPED to allocate pages
> > > EXPERIMENTAL: x86/module: use __GFP_UNMAPPED in module_alloc
> > --
> > Thanks,
> > Hyeonggon
>
> --
> Sincerely yours,
> Mike.