[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <jh3yqdz43c24ur7w2jjutyvwodsdccefo6ycmtmjyvh25hojn4@aysycyla6pom>
Date: Sat, 25 May 2024 08:00:15 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Tim Chen <tim.c.chen@...el.com>, Dave Chinner <dchinner@...hat.com>,
"Darrick J. Wong" <djwong@...nel.org>, Christian Brauner <brauner@...nel.org>,
Carlos Maiolino <cem@...nel.org>, Chuck Lever <chuck.lever@...cle.com>, Jan Kara <jack@...e.cz>,
Matthew Wilcox <willy@...radead.org>, Johannes Weiner <hannes@...xchg.org>,
Axel Rasmussen <axelrasmussen@...gle.com>, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH 8/8] shmem,percpu_counter: add _limited_add(fbc, limit,
amount)
On Fri, Sep 29, 2023 at 08:42:45PM -0700, Hugh Dickins wrote:
> Percpu counter's compare and add are separate functions: without locking
> around them (which would defeat their purpose), it has been possible to
> overflow the intended limit. Imagine all the other CPUs fallocating
> tmpfs huge pages to the limit, in between this CPU's compare and its add.
>
> I have not seen reports of that happening; but tmpfs's recent addition
> of dquot_alloc_block_nodirty() in between the compare and the add makes
> it even more likely, and I'd be uncomfortable to leave it unfixed.
>
> Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it.
>
I think the posted (and by now landed) code is racy.
I had seen there was a follow up patch which further augmented the
routine, but it did not alter the issue below so I'm responding to this
thread.
> +/*
> + * Compare counter, and add amount if the total is within limit.
> + * Return true if amount was added, false if it would exceed limit.
> + */
> +bool __percpu_counter_limited_add(struct percpu_counter *fbc,
> + s64 limit, s64 amount, s32 batch)
> +{
> + s64 count;
> + s64 unknown;
> + unsigned long flags;
> + bool good;
> +
> + if (amount > limit)
> + return false;
> +
> + local_irq_save(flags);
> + unknown = batch * num_online_cpus();
> + count = __this_cpu_read(*fbc->counters);
> +
> + /* Skip taking the lock when safe */
> + if (abs(count + amount) <= batch &&
> + fbc->count + unknown <= limit) {
> + this_cpu_add(*fbc->counters, amount);
> + local_irq_restore(flags);
> + return true;
> + }
> +
Note the value of fbc->count is *not* stabilized.
> + raw_spin_lock(&fbc->lock);
> + count = fbc->count + amount;
> +
> + /* Skip percpu_counter_sum() when safe */
> + if (count + unknown > limit) {
> + s32 *pcount;
> + int cpu;
> +
> + for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
> + pcount = per_cpu_ptr(fbc->counters, cpu);
> + count += *pcount;
> + }
> + }
> +
> + good = count <= limit;
> + if (good) {
> + count = __this_cpu_read(*fbc->counters);
> + fbc->count += count + amount;
> + __this_cpu_sub(*fbc->counters, count);
> + }
> +
> + raw_spin_unlock(&fbc->lock);
> + local_irq_restore(flags);
> + return good;
> +}
> +
Consider 2 cpus executing the func at the same time, where one is
updating fbc->counter in the slow path while the other is testing it in
the fast path.
cpu0 cpu1
if (abs(count + amount) <= batch &&
fbc->count + unknown <= limit)
/* gets past the per-cpu traversal */
/* at this point cpu0 decided to bump fbc->count, while cpu1 decided to
* bump the local counter */
this_cpu_add(*fbc->counters, amount);
fbc->count += count + amount;
Suppose fbc->count update puts it close enough to the limit that an
addition from cpu1 would put the entire thing over said limit.
Since the 2 operations are not synchronized cpu1 can observe fbc->count
prior to the bump and update it's local counter, resulting in
aforementioned overflow.
Am I misreading something here? Is this prevented?
To my grep the only user is quotas in shmem and I wonder if that use is
even justified. I am aware of performance realities of atomics. However,
it very well may be that whatever codepaths which exercise the counter
are suffering multicore issues elsewhere, making an atomic (in a
dedicated cacheline) a perfectly sane choice for the foreseeable future.
Should this be true there would be 0 rush working out a fixed variant of
the routine.
Powered by blists - more mailing lists