Message-ID: <ZCPzI3ns2PusACi6@dhcp22.suse.cz>
Date: Wed, 29 Mar 2023 10:13:23 +0200
From: Michal Hocko <mhocko@...e.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
Song Liu <song@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
x86@...nel.org
Subject: Re: [RFC PATCH 1/5] mm: introduce __GFP_UNMAPPED and unmapped_alloc()
On Wed 29-03-23 10:28:02, Mike Rapoport wrote:
> On Tue, Mar 28, 2023 at 05:24:49PM +0200, Michal Hocko wrote:
> > On Tue 28-03-23 18:11:20, Mike Rapoport wrote:
> > > On Tue, Mar 28, 2023 at 09:39:37AM +0200, Michal Hocko wrote:
> > [...]
> > > > OK, so you want to reduce that direct map fragmentation?
> > >
> > > Yes.
> > >
> > > > Is that a real problem?
> > >
> > > A while ago Intel folks published report [1] that showed better performance
> > > with large pages in the direct map for majority of benchmarks.
> > >
> > > > My impression is that modules are mostly static thing. BPF
> > > > might be a different thing though. I have a recollection that BPF guys
> > > > were dealing with direct map fragmention as well.
> > >
> > > Modules are indeed static, but module_alloc() is used by anything that
> > > allocates code pages, e.g. kprobes, ftrace and BPF. Besides, Thomas
> > > mentioned that having code in 2M pages reduces iTLB pressure [2], but
> > > that's not only about avoiding the splits in the direct map but also
> > > about using large mappings in the modules address space.
> > >
> > > The BPF guys suggested an allocator for executable memory [3] mainly
> > > because they've seen performance improvements of 0.6% - 0.9% in their
> > > setups [4].
> >
> > These are fair arguments and it would have been better to have them in
> > the RFC. Also, it is not really clear to me what the actual benefit of
> > the unmapping is for those usecases. I do get that they want to benefit
> > from caching with the same permission setup, but do they need unmapping
> > as well?
>
> The pages allocated with module_alloc() get different permissions depending
> on whether they belong to text, rodata, data etc. The permissions are
> updated in both vmalloc address space and in the direct map. The updates to
> the direct map cause splits of the large pages.
That much is clear (wouldn't hurt to mention that in the changelog
though).
> If we cache large pages as
> unmapped we take out the entire 2M page from the direct map and then
> if/when it becomes free it can be returned to the direct map as a 2M page.
>
> Generally, the unmapped allocations are intended for use-cases that anyway
> map the memory elsewhere than direct map and need to modify direct mappings
> of the memory, be it modules_alloc(), secretmem, PKS page tables or maybe
> even some of the encrypted VM memory.
I believe we are still not on the same page here. I do understand that
you want to re-use the caching capability of the unmapped_pages_alloc
for modules allocations as well. My question is whether module_alloc
really benefits from the fact that the memory is unmapped.
I guess you want to say that it does because it wouldn't have to change
the permission for the direct map, but I do not see that anywhere in the
patch... Also consider the slow path, where module_alloc needs to
allocate a fresh (huge)page and unmap it. Does the overhead of the
unmapping suit the needs of all module_alloc users? The module loader is
likely not interesting as it tends to be rather static, but what about
BPF programs?
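For illustration, the caching scheme described above could be sketched as
a toy userspace model. All names here are hypothetical and nothing below
is kernel code; the real implementation would operate on actual page
tables (e.g. via the set_direct_map_*() helpers) rather than booleans:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the proposed unmapped-page cache: a 2M "huge page" is
 * taken out of the (simulated) direct map once, 4K chunks are handed
 * out from it, and the huge page is only returned to the direct map
 * when every chunk is free again -- so it can go back as a single 2M
 * mapping instead of being split. */

#define CHUNKS_PER_HUGE 512          /* 2M / 4K */

struct huge_block {
	bool in_direct_map;          /* still mapped 2M in the direct map? */
	bool used[CHUNKS_PER_HUGE];
	size_t nr_used;
};

/* Hand out one 4K chunk; unmapping the whole 2M page happens at most
 * once per block (the slow path discussed above). */
static int toy_unmapped_alloc(struct huge_block *b)
{
	size_t i;

	if (b->in_direct_map)
		b->in_direct_map = false;    /* one split-free 2M unmap */

	for (i = 0; i < CHUNKS_PER_HUGE; i++) {
		if (!b->used[i]) {
			b->used[i] = true;
			b->nr_used++;
			return (int)i;
		}
	}
	return -1;                           /* block exhausted */
}

/* Return a chunk; when the block becomes fully free it is restored to
 * the direct map as one 2M mapping, never as split 4K pages. */
static void toy_unmapped_free(struct huge_block *b, int chunk)
{
	b->used[chunk] = false;
	b->nr_used--;
	if (b->nr_used == 0)
		b->in_direct_map = true;
}
```

The point of the sketch is where the cost sits: the unmap is paid once per
2M block on the slow path, while every fast-path allocation only flips a
bit, which is what a high-churn user like BPF would actually exercise.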
[...]
> > > I also think vmalloc with unmapped pages can provide backing pages for
> > > execmem_alloc() Song proposed.
> >
> > Why would you need to have execmem_alloc have its memory virtually
> > mapped into vmalloc space?
>
> Currently all the memory allocated for code is managed in a subset of
> vmalloc() space. The intention of execmem_alloc() was to replace
> module_alloc() for the code pages, so it's natural that it will use the
> same virtual ranges.
Ohh, sorry, my bad. I have only now realized that I misread and thought
about secretmem_alloc...
--
Michal Hocko
SUSE Labs