Message-ID: <aLdLDEW2d3hK4gUV@casper.infradead.org>
Date: Tue, 2 Sep 2025 20:52:44 +0100
From: Matthew Wilcox <willy@...radead.org>
To: David Hildenbrand <david@...hat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@...il.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] mm: tag kernel stack pages

On Thu, Aug 21, 2025 at 02:44:31PM +0200, David Hildenbrand wrote:
> On 20.08.25 22:20, Vishal Moola (Oracle) wrote:
> > Currently, we have no way to distinguish a kernel stack page from an
> > unidentified page. Being able to track this information can be
> > beneficial for optimizing kernel memory usage (i.e. analyzing
> > fragmentation, location etc.). Knowing a page is being used for a kernel
> > stack gives us more insight about pages that are certainly immovable and
> > important to kernel functionality.
> 
> It's a very niche use case. Anything that's not clearly a folio or a special
> movable_ops page is certainly immovable. So we can identify pretty reliably
> what's movable and what's not.
> 
> Happy to learn how you would want to use that knowledge to reduce
> fragmentation. :)
> 
> So this reads a bit hand-wavy.

I have a theory that we should always be attempting to do aligned
allocations if we can, falling back to individual allocations if
we can't.  This is an attempt to gather some data to inform us whether
that theory is true, and to help us measure whether any effort we
take to improve that situation is effective.

Eyeballing the output of tools/mm/page-types certainly lends
some credence to this.  On x86-64 with its 16KiB stacks and 4KiB
page size, we often see four consecutive pages allocated as type
KernelStack, and as you'd expect only about 25% of the time are they
aligned to a 16KiB boundary.  That is, at least 75% of the time they
prevent _two_ order-2 pages from being available.
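
A minimal sketch of that arithmetic, assuming the starting page of a
four-page stack is equally likely to land at any offset within an order-2
block (an assumption for illustration, not a measurement).  The names
blocks_touched and aligned_fraction are hypothetical helpers, not kernel
or page-types interfaces:

```python
PAGES_PER_STACK = 4  # 16KiB stack / 4KiB pages, i.e. one order-2 block


def blocks_touched(start_pfn, nr_pages=PAGES_PER_STACK):
    """Count the naturally aligned nr_pages-sized blocks overlapped by
    the run of pages [start_pfn, start_pfn + nr_pages)."""
    first_block = start_pfn // nr_pages
    last_block = (start_pfn + nr_pages - 1) // nr_pages
    return last_block - first_block + 1


def aligned_fraction(nr_pages=PAGES_PER_STACK):
    """Fraction of starting offsets for which the run sits inside a
    single aligned block, assuming every offset is equally likely."""
    aligned = sum(1 for off in range(nr_pages)
                  if blocks_touched(off, nr_pages) == 1)
    return aligned / nr_pages


if __name__ == "__main__":
    for off in range(PAGES_PER_STACK):
        print(f"offset {off}: touches {blocks_touched(off)} order-2 block(s)")
    print(f"aligned fraction: {aligned_fraction()}")
```

Only offset 0 keeps the stack inside one order-2 block; offsets 1, 2 and 3
each straddle two of them, which is where the 25% / 75% split comes from
under the uniform-offset assumption.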

As you say, they're not movable.  I'm not sure if it makes sense to
go to the effort of making them movable; it'd require interacting
with the scheduler (to prevent the task we're relocating from
being scheduled), and I don't think the realtime people would be
terribly keen on that idea.  So that isn't one of the ideas we
have on the table for improving matters.

Ideas we have been batting around:

 - Have kernel stacks try to do an order-N allocation and vmap()
   the result, fall back to current implementation
 - Have vmalloc try to do an order-N allocation, fall back down the
   orders on failure to allocate
 - Change the alloc_bulk implementation to do the order-N allocation
   and fall back

I'm sure other possibilities also exist.
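
The "try order-N, fall back down the orders" pattern shared by those ideas
can be sketched as a toy model.  This is only an illustration of the
control flow, not kernel code: alloc_with_fallback and the try_alloc
callback are hypothetical names, and try_alloc(order) simply reports
whether a block of 2^order pages could be obtained:

```python
def alloc_with_fallback(nr_pages, max_order, try_alloc):
    """Cover nr_pages using the largest blocks try_alloc will grant.

    try_alloc(order) is a hypothetical callback that returns True when a
    block of 2**order pages was obtained.  Returns the list of orders
    that were allocated, largest first.
    """
    allocated = []
    remaining = nr_pages
    order = max_order
    while remaining > 0:
        # Never ask for a block larger than what is still needed.
        while (1 << order) > remaining:
            order -= 1
        if try_alloc(order):
            allocated.append(order)
            remaining -= 1 << order
        elif order > 0:
            # This order is unavailable; fall back to the next one down.
            order -= 1
        else:
            raise MemoryError("order-0 allocation failed")
    return allocated


if __name__ == "__main__":
    # Order-2 succeeds: one aligned block covers the whole stack.
    print(alloc_with_fallback(4, 2, lambda order: True))
    # Order-2 fails: fall back to two order-1 blocks.
    print(alloc_with_fallback(4, 2, lambda order: order < 2))
```

Once the sketch drops to a lower order it stays there for the rest of the
request; a real implementation might reasonably retry the higher order
instead.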

> staring at [1], we allocate from vmalloc, so I would assume that these will
> be vmalloc-typed pages in the future and we cannot change the type later.
> 
> [1] https://kernelnewbies.org/MatthewWilcox/Memdescs

I see the vmalloc subtype as being a "we don't know any better" type.
We could allocate another subtype of type 0 to mean "kernel stacks"
and have it be implicit that kernel stacks are allocated from vmalloc.
This would probably require that we have a vmalloc interface that lets us
specify a subtype, which I think is probably something we'd want anyway.

I think it's fine to say "This doesn't add enough value to merge it
upstream".  I will note one minor advantage which is that typing these
pages as PGTY_kstack today prevents them from being inadvertently mapped
to userspace (whether by malicious code or innocent bug).
