Message-ID: <6667b799702e1815bd4e4f7744eddbc0bd042bb7.camel@kernel.org>
Date: Wed, 17 Jan 2024 14:00:55 -0500
From: Jeff Layton <jlayton@...nel.org>
To: Josh Poimboeuf <jpoimboe@...nel.org>, Linus Torvalds
 <torvalds@...ux-foundation.org>, Chuck Lever <chuck.lever@...cle.com>, 
 Shakeel Butt <shakeelb@...gle.com>, Roman Gushchin
 <roman.gushchin@...ux.dev>, Johannes Weiner <hannes@...xchg.org>, Michal
 Hocko <mhocko@...nel.org>
Cc: linux-kernel@...r.kernel.org, Jens Axboe <axboe@...nel.dk>, Tejun Heo
	 <tj@...nel.org>, Vasily Averin <vasily.averin@...ux.dev>, Michal Koutny
	 <mkoutny@...e.com>, Waiman Long <longman@...hat.com>, Muchun Song
	 <muchun.song@...ux.dev>, Jiri Kosina <jikos@...nel.org>, 
	cgroups@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again

On Wed, 2024-01-17 at 08:14 -0800, Josh Poimboeuf wrote:
> A container can exceed its memcg limits by allocating a bunch of file
> locks.
> 
> This bug was originally fixed by commit 0f12156dff28 ("memcg: enable
> accounting for file lock caches"), but was later reverted by commit
> 3754707bcc3e ("Revert "memcg: enable accounting for file lock caches"")
> due to performance issues.
> 
> Unfortunately those performance issues were never addressed and the bug
> has remained unfixed for over two years.
> 
> Fix it by default but allow users to disable it with a cmdline option
> (flock_accounting=off).
> 
> Signed-off-by: Josh Poimboeuf <jpoimboe@...nel.org>
> ---
>  .../admin-guide/kernel-parameters.txt         | 17 +++++++++++
>  fs/locks.c                                    | 30 +++++++++++++++++--
>  2 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 6ee0f9a5da70..91987b06bc52 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1527,6 +1527,23 @@
>  			See Documentation/admin-guide/sysctl/net.rst for
>  			fb_tunnels_only_for_init_ns
>  
> +	flock_accounting=
> +			[KNL] Enable/disable accounting for kernel
> +			memory allocations related to file locks.
> +			Format: { on | off }
> +			Default: on
> +			on:	Enable kernel memory accounting for file
> +				locks.  This prevents task groups from
> +				exceeding their memcg allocation limits.
> +				However, it may cause slowdowns in the
> +				flock() system call.
> +			off:	Disable kernel memory accounting for
> +				file locks.  This may allow a rogue task
> +				to DoS the system by forcing the kernel
> +				to allocate memory beyond the task
> +				group's memcg limits.  Not recommended
> +				unless you have trusted user space.
> +
>  	floppy=		[HW]
>  			See Documentation/admin-guide/blockdev/floppy.rst.
>  
> diff --git a/fs/locks.c b/fs/locks.c
> index cc7c117ee192..235ac56c557d 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2905,15 +2905,41 @@ static int __init proc_locks_init(void)
>  fs_initcall(proc_locks_init);
>  #endif
>  
> +static bool flock_accounting __ro_after_init = true;
> +
> +static int __init flock_accounting_cmdline(char *str)
> +{
> +	if (!str)
> +		return -EINVAL;
> +
> +	if (!strcmp(str, "off"))
> +		flock_accounting = false;
> +	else if (!strcmp(str, "on"))
> +		flock_accounting = true;
> +	else
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +early_param("flock_accounting", flock_accounting_cmdline);
> +
> +#define FLOCK_ACCOUNTING_MSG "WARNING: File lock accounting is disabled, container-triggered host memory exhaustion possible!\n"
> +
>  static int __init filelock_init(void)
>  {
>  	int i;
> +	slab_flags_t flags = SLAB_PANIC;
> +
> +	if (!flock_accounting)
> +		pr_err(FLOCK_ACCOUNTING_MSG);
> +	else
> +		flags |= SLAB_ACCOUNT;
>  
>  	flctx_cache = kmem_cache_create("file_lock_ctx",
> -			sizeof(struct file_lock_context), 0, SLAB_PANIC, NULL);
> +			sizeof(struct file_lock_context), 0, flags, NULL);
>  
>  	filelock_cache = kmem_cache_create("file_lock_cache",
> -			sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
> +			sizeof(struct file_lock), 0, flags, NULL);
>  
>  	for_each_possible_cpu(i) {
>  		struct file_lock_list_struct *fll = per_cpu_ptr(&file_lock_list, i);

I'm really not a fan of tunables or different kconfig options,
especially for something niche like this.

I also question whether the cost of this accounting will show up under
any real-world workloads, and whether it was just wrong to revert those
patches back in 2021.

File locking is an activity where we inherently expect to block. Ideally
we don't if the lock is uncontended of course, but it's always a
possibility.

The benchmark that prompted the revert just tries to create and release
a bunch of file locks as quickly as possible. Legitimate applications
that do a lot of very rapid locking like this benchmark are basically
non-existent. Usually the pattern is:

    acquire lock
    do some (relatively slow) I/O
    release lock

In that sort of scenario, is this memcg accounting more than just line
noise? I wonder whether we should just bite the bullet and see whether
there are any real workloads that suffer due to SLAB_ACCOUNT being
enabled on these caches?
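
For illustration, here is a minimal userspace sketch of the
acquire/I/O/release pattern described above. The helper name
locked_append() and the idea of appending to a log file are assumptions
made for the example and are not taken from any real application; the
point is only that the write() and fsync() between the two flock()
calls would normally dwarf the cost of allocating the lock.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: append one record to 'path' while holding an
 * exclusive flock() on the file. Returns 0 on success, -1 on failure.
 */
static int locked_append(const char *path, const char *record)
{
	size_t len = strlen(record);
	int ret = -1;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
	if (fd < 0)
		return -1;

	/* acquire lock: blocks if another open file description holds it */
	if (flock(fd, LOCK_EX) == 0) {
		/* do some (relatively slow) I/O */
		if (write(fd, record, len) == (ssize_t)len && fsync(fd) == 0)
			ret = 0;

		/* release lock */
		flock(fd, LOCK_UN);
	}

	close(fd);
	return ret;
}
```

A caller would use it as locked_append("/tmp/example.log", "entry\n").
The benchmark case corresponds to calling flock(LOCK_EX) and
flock(LOCK_UN) back to back in a tight loop with no I/O in between.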
-- 
Jeff Layton <jlayton@...nel.org>
