Message-ID: <20250226102703.3F7Aa2oK@linutronix.de>
Date: Wed, 26 Feb 2025 11:27:03 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Joe Damato <jdamato@...tly.com>, linux-rdma@...r.kernel.org,
netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Andrew Lunn <andrew+netdev@...n.ch>,
Eric Dumazet <edumazet@...gle.com>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Jakub Kicinski <kuba@...nel.org>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Leon Romanovsky <leon@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Saeed Mahameed <saeedm@...dia.com>, Simon Horman <horms@...nel.org>,
Tariq Toukan <tariqt@...dia.com>,
Thomas Gleixner <tglx@...utronix.de>,
Yunsheng Lin <linyunsheng@...wei.com>
Subject: Re: [PATCH net-next 0/2] page_pool: Convert stats to u64_stats_t.
On 2025-02-21 12:10:51 [-0500], Joe Damato wrote:
> On Fri, Feb 21, 2025 at 12:52:19PM +0100, Sebastian Andrzej Siewior wrote:
> > This is a follow-up on
> > https://lore.kernel.org/all/20250213093925.x_ggH1aj@linutronix.de/
> >
> > to convert the page_pool statistics to u64_stats_t to avoid u64 related
> > problems on 32bit architectures.
> > While looking over it, the comment for recycle_stat_inc() says that it
> > is safe to use in preemptible context.
>
> I wrote that comment because it's an increment of a per-cpu counter.
>
> The documentation in Documentation/core-api/this_cpu_ops.rst
> explains in more depth, but this_cpu_inc is safe to use without
> worrying about pre-emption and interrupts.
I don't dispute that it is currently safe to use in preemptible context.
I am saying that it is no longer safe after the rework: if it really is
used from preemptible context, the rework breaks that assumption.
> > The 32bit update is split into two 32bit writes and if we get
> > preempted in the middle and another one makes an update then the
> > value gets inconsistent and the previous update can overwrite the
> > following. (Rare but still).
>
> Have you seen this? Can you show the generated assembly which
> suggests that this occurs? It would be helpful if you could show the
> before and after 32-bit assembly code.
…
So there are two things:
First we have alloc_stat_inc(), which does
#define alloc_stat_inc(pool, __stat) (pool->alloc_stats.__stat++)
so on x86 32bit this turns into
| addl $1, 120(%ebx) #, pool_15(D)->alloc_stats.fast
| adcl $0, 124(%ebx) #, pool_15(D)->alloc_stats.fast
So the lower 4 bytes are incremented before the upper 4 bytes.
recycle_stat_inc() is using this_cpu_inc() and performs a similar
update. On x86-32 it turns into
| movl 836(%ebx), %eax # pool_15(D)->recycle_stats, s
| pushf ; pop %edx # flags
| cli
| movl %fs:this_cpu_off, %ecx # *_20,
| addl %ecx, %eax #, _42
| addl $1, (%eax) #, *_42
| adcl $0, 4(%eax) #, *_42
| testb $2, %dh #, flags
| je .L508 #,
| sti
|.L508:
so the update can be performed safely in preemptible context: the CPU is
determined within the IRQ-off section, and the increment itself happens
there as well. It always updates the value belonging to the local CPU.
Reading the values locally (on the CPU that is doing the update) is
okay, but reading the value from a remote CPU while an update is in
progress can result in reading the lower 4 bytes before the upper 4
bytes are visible.
This can lead to an inconsistent value on 32bit, which is likely
"corrected" on the next read.
Thus my initial question: do we care? If so, I suggest using u64_stats
like most of the networking stack does. However, if we do, the update
must be performed with BH disabled, since I assume the updates happen in
softirq context; one update must not be allowed to preempt another.
Sebastian