Message-ID: <ZUL1qEOV8QBHVFHP@boqun-archlinux>
Date: Wed, 1 Nov 2023 18:04:40 -0700
From: Boqun Feng <boqun.feng@...il.com>
To: Uladzislau Rezki <urezki@...il.com>
Cc: "Paul E . McKenney" <paulmck@...nel.org>,
RCU <rcu@...r.kernel.org>,
Neeraj Upadhyay <Neeraj.Upadhyay@....com>,
Hillf Danton <hdanton@...a.com>,
Joel Fernandes <joel@...lfernandes.org>,
LKML <linux-kernel@...r.kernel.org>,
Oleksiy Avramchenko <oleksiy.avramchenko@...y.com>,
Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [PATCH 1/3] rcu: Reduce synchronize_rcu() waiting time
On Wed, Nov 01, 2023 at 04:35:05PM +0100, Uladzislau Rezki wrote:
[...]
> > Basically it does not work, because you do not fix the mixing "issue".
> > I have been working on it, and we agreed to separate it, because it
> > just makes sense. The reason and the problem I see are described in
> > the commit message of v2.
As I understand it, your point is "if we want synchronize_rcu() to be
faster in all cases, then a separate list makes a lot of sense since it
won't be affected by different configs, etc.", right? If so, then
understood.
I don't dispute that your patch does a good job of making
synchronize_rcu() faster, and actually I don't think my proposal
necessarily blocks your patch. However, I wonder: why is
synchronize_rcu() more special than the other call_rcu_hurry() cases?
Sure, synchronize_rcu() means blocking, and unblocking earlier is
always desirable, but a call_rcu_hurry() callback could also be the
thing that wakes up another thread, right? By making synchronize_rcu()
faster, do we end up making the other call_rcu_hurry() users slower?
So in my proposal, as you can see, I tried to be fair among all
call_rcu_hurry() users, and I hope that's a better solution from the
whole-kernel viewpoint.
Another fear I have is the following story:
(Let's say your improvement gets merged. And 6 months later...)
Someone shows up and says "hi, the synchronize_rcu() latency reduction
work is great, but we have 1024 CPUs, so the single list in
sr_normal_state becomes a bottleneck: synchronize_rcu() on some CPUs
gets delayed by other CPUs' synchronize_rcu(), can we make the list
percpu?"
(And 6 months later...)
Someone shows up and says "hi, the percpu list for low-latency
synchronize_rcu() is great, however, we want to save some CPU power by
putting CPUs into groups and naming one leader per group to handle
synchronize_rcu() wakeups for the whole group, let's add a kconfig
option CONFIG_RCU_NOSR_CPU to control that feature".
Well, it's a story, so it may not happen, but I don't think the
possibility of totally re-inventing RCU callback lists and NOCB is 0 ;-)
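(And just to make the shape of that first hypothetical request
concrete, "make the list percpu" would amount to something like the
following; purely illustrative, nothing like this exists today:)

#include <linux/llist.h>
#include <linux/percpu-defs.h>

/*
 * Hypothetical: one waiter list per CPU instead of the single list in
 * sr_normal_state, plus some policy for which CPU or kthread handles
 * the wakeups.  All names here are made up.
 */
static DEFINE_PER_CPU(struct llist_head, sr_waiters_pcpu);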
Anyway, I should stop being annoying here; I will use your test steps
to check my idea and will let you know!
> >
> > >
> > > Do you have a benchmark I can try out to see if my diff can achieve the
> > > similar result? Thanks!
> > >
> > There is no good benchmark, but you can write one for sure. I tested
> > three scenarios:
> >
> > - Running a camera app on our Android devices, measuring app launch
> >   time in milliseconds;
> > - Doing synchronize_rcu() and kfree(ptr) simultaneously from 10K or
> >   so workers. It is an important test case because we have a
> >   fallback to this scenario for our kvfree_rcu_mightsleep() API;
> > - Looking at the time delta of loading 100 kernel modules.
> >
> > Those were my main test cases.
> >
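For the second scenario, I assume the shape of the test is roughly a
pile of kthreads all doing synchronize_rcu() plus kfree() in a loop,
something like the sketch below (a made-up test module, not your actual
test code):

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/module.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/slab.h>

#define NR_WORKERS 64	/* scale towards 10K on a big enough machine */

static struct task_struct *workers[NR_WORKERS];

static int sr_stress_fn(void *arg)
{
	while (!kthread_should_stop()) {
		void *p = kmalloc(64, GFP_KERNEL);

		synchronize_rcu();	/* the latency being measured */
		kfree(p);
		cond_resched();
	}
	return 0;
}

static int __init sr_stress_init(void)
{
	int i;

	for (i = 0; i < NR_WORKERS; i++)
		workers[i] = kthread_run(sr_stress_fn, NULL,
					 "sr_stress/%d", i);
	return 0;
}

static void __exit sr_stress_exit(void)
{
	int i;

	for (i = 0; i < NR_WORKERS; i++)
		if (!IS_ERR_OR_NULL(workers[i]))
			kthread_stop(workers[i]);
}

module_init(sr_stress_init);
module_exit(sr_stress_exit);
MODULE_LICENSE("GPL");

Measuring either iterations per second or the per-call
synchronize_rcu() latency on top of that should roughly reproduce your
second case, if I read it correctly.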
> I will provide the patches and test steps, so you can try them out.
> Tomorrow I will send them!
>
Thanks!
Regards,
Boqun
> --
> Uladzislau Rezki