Message-ID: <Z-2pSF7Zu0CrLBy_@dread.disaster.area>
Date: Thu, 3 Apr 2025 08:16:56 +1100
From: Dave Chinner <david@...morbit.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Yafang Shao <laoar.shao@...il.com>, Harry Yoo <harry.yoo@...cle.com>,
	Kees Cook <kees@...nel.org>, joel.granados@...nel.org,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	Josef Bacik <josef@...icpanda.com>, linux-mm@...ck.org,
	Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH] proc: Avoid costly high-order page allocations when
 reading proc files

On Wed, Apr 02, 2025 at 02:24:45PM +0200, Michal Hocko wrote:
> On Wed 02-04-25 22:32:14, Dave Chinner wrote:
> > Have a look at xlog_kvmalloc() in XFS. It implements a basic
> > fast-fail, no retry high order kmalloc before it falls back to
> > vmalloc by turning off direct reclaim for the kmalloc() call.
> > Hence if there isn't a high-order page on the free lists ready
> > to allocate, it falls back to vmalloc() immediately.
> > 
> > For XFS, using xlog_kvmalloc() reduced the high-order per-allocation
> > overhead by around 80% when compared to a standard kvmalloc()
> > call. Numbers and profiles were documented in the commit message
> > (reproduced in whole below)...
> 
> Btw. it would be really great to have such concerns to be posted to the
> linux-mm ML so that we are aware of that.

I have brought it up in the past, along with all the other kvmalloc
API problems that are mentioned in that commit message.
Unfortunately, the discussion always ended up focused on calling
context and API flags (e.g. whether stuff like GFP_NOFS should be
supported or not), not the fast-fail-then-no-fail behaviour we need.

Yes, these discussions have resulted in API changes that support
some new subset of gfp flags, but the performance issues have never
been addressed...

> kvmalloc currently doesn't support GFP_NOWAIT semantic but it does allow
> to express - I prefer SLAB allocator over vmalloc.

The conditional use of __GFP_NORETRY for the kmalloc call is broken
if we try to use __GFP_NOFAIL with kvmalloc() - this causes the gfp
mask to hold __GFP_NOFAIL | __GFP_NORETRY....
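
To make the flag conflict concrete, here is a minimal userspace sketch.
It is not the kernel's code: the X_* flag values, X_PAGE_SIZE and all
three function names are hypothetical stand-ins, and the adjustment
logic only assumes the behaviour described above (NORETRY is added for
the slab attempt on large requests without looking at NOFAIL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for illustration only; not the kernel's values. */
#define X_NORETRY       (1u << 0)   /* give up after one light attempt */
#define X_RETRY_MAYFAIL (1u << 1)   /* try hard, but failure is allowed */
#define X_NOFAIL        (1u << 2)   /* caller cannot handle failure */
#define X_PAGE_SIZE     4096u

/*
 * Assumed mask adjustment for the slab attempt: large requests get
 * NORETRY unless the caller asked for RETRY_MAYFAIL. NOFAIL is never
 * inspected, which is the problem being illustrated.
 */
static unsigned int adjust_for_slab_attempt(unsigned int flags, size_t size)
{
    if (size > X_PAGE_SIZE && !(flags & X_RETRY_MAYFAIL))
        flags |= X_NORETRY;
    return flags;
}

/* "Never fail" and "do not retry" in the same mask contradict each other. */
static bool mask_is_contradictory(unsigned int flags)
{
    return (flags & X_NOFAIL) && (flags & X_NORETRY);
}

/* Usage: a no-fail caller asking for a large buffer ends up with both bits. */
static void demo_conflict(void)
{
    unsigned int mask = adjust_for_slab_attempt(X_NOFAIL, 8 * X_PAGE_SIZE);

    assert(mask_is_contradictory(mask));
}
```

Small requests, and callers that do not pass the no-fail bit, never hit
the contradictory combination in this sketch.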

We have a hard requirement for xlog_kvmalloc() to provide
__GFP_NOFAIL semantics.

IOWs, we need kvmalloc() to support kmalloc(GFP_NOWAIT) for
performance with fallback to vmalloc(__GFP_NOFAIL) for
correctness...
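
The shape of that request can be sketched in userspace C. This is only
an illustration under stated assumptions, not xlog_kvmalloc() or any
kernel API: malloc() stands in for both allocators, the
contig_available switch simulates whether a high-order page is sitting
on the free lists, and every name below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch: is a contiguous high-order page ready right now? */
static bool contig_available = true;

/* Counters so the two paths can be observed. */
static unsigned long fast_hits;
static unsigned long slow_fallbacks;

/* Stand-in for a kmalloc(GFP_NOWAIT)-style call: one attempt, no reclaim. */
static void *try_contig_nowait(size_t size)
{
    if (!contig_available)
        return NULL;
    return malloc(size);
}

/* Stand-in for a vmalloc(__GFP_NOFAIL)-style call: never returns NULL. */
static void *alloc_virt_nofail(size_t size)
{
    void *p;

    if (size == 0)
        size = 1;   /* malloc(0) may legitimately return NULL */
    while ((p = malloc(size)) == NULL)
        ;           /* a real implementation would wait, not spin */
    return p;
}

/* Fast-fail first for performance, then no-fail fallback for correctness. */
static void *sketch_kvmalloc(size_t size)
{
    void *p = try_contig_nowait(size);

    if (p) {
        fast_hits++;
        return p;
    }
    slow_fallbacks++;
    p = alloc_virt_nofail(size);
    assert(p != NULL);
    return p;
}
```

The point of the split is that the two calls carry different policies:
the first never waits, the second never fails, and the caller only sees
the no-fail guarantee.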

> I think we could make
> the default kvmalloc slab path weaker by default as those who really
> want slab already have means to achieve that. There is a risk of long
> term fragmentation but I think this is worth trying

We've been doing this for a few years now in XFS in a hot path that
can make in the order of a million xlog_kvmalloc() calls a second.
We've not seen any evidence that this causes or exacerbates memory
fragmentation....

-Dave.
-- 
Dave Chinner
david@...morbit.com
