linux-kernel - Re: [PATCH 8/8] shmem,percpu_counter: add _limited

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <jh3yqdz43c24ur7w2jjutyvwodsdccefo6ycmtmjyvh25hojn4@aysycyla6pom>
Date: Sat, 25 May 2024 08:00:15 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, 
	Tim Chen <tim.c.chen@...el.com>, Dave Chinner <dchinner@...hat.com>, 
	"Darrick J. Wong" <djwong@...nel.org>, Christian Brauner <brauner@...nel.org>, 
	Carlos Maiolino <cem@...nel.org>, Chuck Lever <chuck.lever@...cle.com>, Jan Kara <jack@...e.cz>, 
	Matthew Wilcox <willy@...radead.org>, Johannes Weiner <hannes@...xchg.org>, 
	Axel Rasmussen <axelrasmussen@...gle.com>, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org
Subject: Re: [PATCH 8/8] shmem,percpu_counter: add _limited_add(fbc, limit,
 amount)

On Fri, Sep 29, 2023 at 08:42:45PM -0700, Hugh Dickins wrote:
> Percpu counter's compare and add are separate functions: without locking
> around them (which would defeat their purpose), it has been possible to
> overflow the intended limit.  Imagine all the other CPUs fallocating
> tmpfs huge pages to the limit, in between this CPU's compare and its add.
> 
> I have not seen reports of that happening; but tmpfs's recent addition
> of dquot_alloc_block_nodirty() in between the compare and the add makes
> it even more likely, and I'd be uncomfortable to leave it unfixed.
> 
> Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it.
> 

I think the posted (and by now landed) code is racy.

I had seen there was a follow up patch which further augmented the
routine, but it did not alter the issue below so I'm responding to this
thread.

> +/*
> + * Compare counter, and add amount if the total is within limit.
> + * Return true if amount was added, false if it would exceed limit.
> + */
> +bool __percpu_counter_limited_add(struct percpu_counter *fbc,
> +				  s64 limit, s64 amount, s32 batch)
> +{
> +	s64 count;
> +	s64 unknown;
> +	unsigned long flags;
> +	bool good;
> +
> +	if (amount > limit)
> +		return false;
> +
> +	local_irq_save(flags);
> +	unknown = batch * num_online_cpus();
> +	count = __this_cpu_read(*fbc->counters);
> +
> +	/* Skip taking the lock when safe */
> +	if (abs(count + amount) <= batch &&
> +	    fbc->count + unknown <= limit) {
> +		this_cpu_add(*fbc->counters, amount);
> +		local_irq_restore(flags);
> +		return true;
> +	}
> +

Note the value of fbc->count is *not* stabilized.

> +	raw_spin_lock(&fbc->lock);
> +	count = fbc->count + amount;
> +
> +	/* Skip percpu_counter_sum() when safe */
> +	if (count + unknown > limit) {
> +		s32 *pcount;
> +		int cpu;
> +
> +		for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
> +			pcount = per_cpu_ptr(fbc->counters, cpu);
> +			count += *pcount;
> +		}
> +	}
> +
> +	good = count <= limit;
> +	if (good) {
> +		count = __this_cpu_read(*fbc->counters);
> +		fbc->count += count + amount;
> +		__this_cpu_sub(*fbc->counters, count);
> +	}
> +
> +	raw_spin_unlock(&fbc->lock);
> +	local_irq_restore(flags);
> +	return good;
> +}
> +

Consider 2 cpus executing the func at the same time, where one is
updating fbc->counter in the slow path while the other is testing it in
the fast path.

cpu0						cpu1
						if (abs(count + amount) <= batch &&				
						    fbc->count + unknown <= limit)
/* gets past the per-cpu traversal */
/* at this point cpu0 decided to bump fbc->count, while cpu1 decided to
 * bump the local counter */
							this_cpu_add(*fbc->counters, amount);
fbc->count += count + amount;

Suppose fbc->count update puts it close enough to the limit that an
addition from cpu1 would put the entire thing over said limit.

Since the 2 operations are not synchronized cpu1 can observe fbc->count
prior to the bump and update it's local counter, resulting in
aforementioned overflow.

Am I misreading something here? Is this prevented?

To my grep the only user is quotas in shmem and I wonder if that use is
even justified. I am aware of performance realities of atomics. However,
it very well may be that whatever codepaths which exercise the counter
are suffering multicore issues elsewhere, making an atomic (in a
dedicated cacheline) a perfectly sane choice for the foreseeable future.
Should this be true there would be 0 rush working out a fixed variant of
the routine.