Message-ID: <20251113103512.18e7bb03@gandalf.local.home>
Date: Thu, 13 Nov 2025 10:35:12 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Yongliang Gao <leonylgao@...il.com>, mhiramat@...nel.org,
mathieu.desnoyers@...icios.com, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, frankjpliu@...cent.com, Yongliang Gao
<leonylgao@...cent.com>, Huang Cun <cunhuang@...cent.com>
Subject: Re: [PATCH v3] trace/pid_list: optimize pid_list->lock contention
On Thu, 13 Nov 2025 08:34:20 +0100
Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:
> > +	do {
> > +		seq = read_seqcount_begin(&pid_list->seqcount);
> > +		ret = false;
> > +		upper_chunk = pid_list->upper[upper1];
> > +		if (upper_chunk) {
> > +			lower_chunk = upper_chunk->data[upper2];
> > +			if (lower_chunk)
> > +				ret = test_bit(lower, lower_chunk->data);
> > +		}
> > +	} while (read_seqcount_retry(&pid_list->seqcount, seq));
>
> How is this better? Any numbers?
> If the write side is busy and the lock is handed over from one CPU to
> another then it is possible that the reader spins here and does several
> loops, right?
I think the chances of that are very slim. The writes happen at fork and
exit, and when manually writing to one of the set_*_pid files.
The readers are at every sched_switch. Currently we just use
raw_spin_locks, but that forces a serialization of every sched_switch,
which on big machines could cause a huge latency.
This approach allows multiple sched_switches to happen at the same time.
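To illustrate why the readers no longer serialize, here is a minimal
userspace sketch of the same read/retry pattern using C11 atomics. It is not
the kernel code: the toy_* names and the single 64-bit bitmap are hypothetical
stand-ins for pid_list and its chunks, and it assumes writers are serialized
by the caller (the patch keeps a lock on the write side for that).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_MAX_PID 64	/* hypothetical limit: one bit per pid in a 64-bit word */

struct toy_pid_list {
	atomic_uint seq;			/* even: stable, odd: writer active */
	_Atomic unsigned long long bits;	/* toy bitmap for pids 0..63 */
};

static void toy_init(struct toy_pid_list *l)
{
	atomic_init(&l->seq, 0);
	atomic_init(&l->bits, 0);
}

/* Writer side. Assumption: the caller already serializes writers. */
static void toy_set(struct toy_pid_list *l, unsigned int pid, bool on)
{
	assert(pid < TOY_MAX_PID);
	atomic_fetch_add(&l->seq, 1);		/* seq becomes odd */
	if (on)
		atomic_fetch_or(&l->bits, 1ULL << pid);
	else
		atomic_fetch_and(&l->bits, ~(1ULL << pid));
	atomic_fetch_add(&l->seq, 1);		/* seq becomes even again */
}

/* Reader side: takes no lock, so any number of readers can run in parallel. */
static bool toy_is_set(struct toy_pid_list *l, unsigned int pid)
{
	unsigned int seq;
	bool ret;

	assert(pid < TOY_MAX_PID);
	do {
		/* Wait until no writer is in progress (even sequence value). */
		while ((seq = atomic_load(&l->seq)) & 1)
			;
		ret = (atomic_load(&l->bits) >> pid) & 1;
		/* Retry only if a writer ran while we were reading. */
	} while (atomic_load(&l->seq) != seq);

	return ret;
}
```

A reader only loops again if a write overlapped its read. With writes limited
to fork, exit and the set_*_pid files, the common case is a single pass with
no shared lock being bounced between CPUs.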
> And in this case, how accurate would it be? I mean the result could
> change right after the sequence here is completed because the write side
> got active again. How bad would it be if there would be no locking and
> RCU ensures that the chunks (and data) don't disappear while looking at
> it?
For the use case I mentioned, it is very accurate. That's because
the writers are updating the pid bits for themselves. If you are checking
for pid 123, that means task 123 is about to run. If bit 123 is being added
or removed, it would only be done by task 123 or its parent.
The exception to this rule is if a user manually adds or removes a pid from
the set_*_pid file. But that has other races that we don't really care
about. It's known that an update made there may take some milliseconds to
take effect.
-- Steve