[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAC_TJvfe52GRrjHMfvAorC+PtTBRK0hxL9OPoi0SMqDV3XkFZw@mail.gmail.com>
Date: Thu, 11 Jan 2024 08:37:22 -0800
From: Kalesh Singh <kaleshsingh@...gle.com>
To: Uladzislau Rezki <urezki@...il.com>
Cc: "Paul E . McKenney" <paulmck@...nel.org>, RCU <rcu@...r.kernel.org>,
Neeraj upadhyay <Neeraj.Upadhyay@....com>, Boqun Feng <boqun.feng@...il.com>,
Hillf Danton <hdanton@...a.com>, Joel Fernandes <joel@...lfernandes.org>,
LKML <linux-kernel@...r.kernel.org>,
Oleksiy Avramchenko <oleksiy.avramchenko@...y.com>, Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [PATCH v4 1/4] rcu: Reduce synchronize_rcu() latency
On Wed, Jan 10, 2024 at 1:22 AM Uladzislau Rezki <urezki@...il.com> wrote:
>
> Hello, Kalesh!
>
> >
> > Hi Uladzislau,
> >
> > I've tried your patches (v3) on Android with 6.1.43 kernel.
> >
> > The test cycles 10 apps (including camera) sequentially for 100
> > iterations.
> >
> > I've set rcu_normal to override the rcu_expedited in the boot
> > parameters:
> >
> > adb shell cat /proc/cmdline | tr ' ' '\n' | grep rcu
> >
> > rcupdate.rcu_normal=1
> > rcupdate.rcu_expedited=1
> > rcu_nocbs=0-7
> >
> >
> > The configurations are:
> >
> > A - echo 0 >/sys/module/rcutree/parameters/rcu_normal_wake_from_gp
> > B - echo 1 >/sys/module/rcutree/parameters/rcu_normal_wake_from_gp
> >
> > Results:
> >
> > = APP LAUNCH TIME =
> > delta (B-A) ratio(%)
> > overall_app_launch_time(ms) -11399.00 -6.65
> >
> >
> > == camera_launch_time
> > type delta(B-A %) A_count B_count
> > HOT -7.05 99 99
> > COLD -6.33 1 1
> >
> >
Hi Uladzislau,
> If i interpret it correctly you also see that this series reduces
> a launch time by 6/7% on your app set. Is that correct?
Yes your understanding is correct.
>
> > === Function Latencies ===
> >
> > Tracing synchronize_rcu_expedited. Hit Ctrl-C to exit Tracing synchronize_rcu_expedited. Hit Ctrl-C to exit
> >
> > nsec : count distribution nsec : count distribution
> > 0 -> 1 : 0 | | 0 -> 1 : 0 | |
> > 2 -> 3 : 0 | | 2 -> 3 : 0 | |
> > 4 -> 7 : 0 | | 4 -> 7 : 0 | |
> > 8 -> 15 : 0 | | 8 -> 15 : 0 | |
> > 16 -> 31 : 0 | | 16 -> 31 : 0 | |
> > 32 -> 63 : 0 | | 32 -> 63 : 0 | |
> > 64 -> 127 : 0 | | 64 -> 127 : 0 | |
> > 128 -> 255 : 0 | | 128 -> 255 : 0 | |
> > 256 -> 511 : 0 | | 256 -> 511 : 0 | |
> > 512 -> 1023 : 0 | | 512 -> 1023 : 0 | |
> > 1024 -> 2047 : 0 | | 1024 -> 2047 : 0 | |
> > 2048 -> 4095 : 0 | | 2048 -> 4095 : 0 | |
> > 4096 -> 8191 : 0 | | 4096 -> 8191 : 0 | |
> > 8192 -> 16383 : 0 | | 8192 -> 16383 : 0 | |
> > 16384 -> 32767 : 0 | | 16384 -> 32767 : 0 | |
> > 32768 -> 65535 : 0 | | 32768 -> 65535 : 0 | |
> > 65536 -> 131071 : 0 | | 65536 -> 131071 : 0 | |
> > 131072 -> 262143 : 0 | | 131072 -> 262143 : 0 | |
> > 262144 -> 524287 : 0 | | 262144 -> 524287 : 0 | |
> > 524288 -> 1048575 : 0 | | 524288 -> 1048575 : 0 | |
> > 1048576 -> 2097151 : 0 | | 1048576 -> 2097151 : 0 | |
> > 2097152 -> 4194303 : 0 | | 2097152 -> 4194303 : 0 | |
> > 4194304 -> 8388607 : 871 |** | 4194304 -> 8388607 : 1180 |**** |
> > 8388608 -> 16777215 : 3204 |******** | 8388608 -> 16777215 : 7020 |************************* |
> > 16777216 -> 33554431 : 15013 |****************************************| 16777216 -> 33554431 : 10952 |****************************************|
> > Exiting trace of synchronize_rcu_expedited Exiting trace of synchronize_rcu_expedited
> >
> >
> > Tracing synchronize_rcu. Hit Ctrl-C to exit Tracing synchronize_rcu. Hit Ctrl-C to exit
> >
> > nsec : count distribution nsec : count distribution
> > 0 -> 1 : 0 | | 0 -> 1 : 0 | |
> > 2 -> 3 : 0 | | 2 -> 3 : 0 | |
> > 4 -> 7 : 0 | | 4 -> 7 : 0 | |
> > 8 -> 15 : 0 | | 8 -> 15 : 0 | |
> > 16 -> 31 : 0 | | 16 -> 31 : 0 | |
> > 32 -> 63 : 0 | | 32 -> 63 : 0 | |
> > 64 -> 127 : 0 | | 64 -> 127 : 0 | |
> > 128 -> 255 : 0 | | 128 -> 255 : 0 | |
> > 256 -> 511 : 0 | | 256 -> 511 : 0 | |
> > 512 -> 1023 : 0 | | 512 -> 1023 : 0 | |
> > 1024 -> 2047 : 0 | | 1024 -> 2047 : 0 | |
> > 2048 -> 4095 : 0 | | 2048 -> 4095 : 0 | |
> > 4096 -> 8191 : 0 | | 4096 -> 8191 : 0 | |
> > 8192 -> 16383 : 0 | | 8192 -> 16383 : 0 | |
> > 16384 -> 32767 : 0 | | 16384 -> 32767 : 0 | |
> > 32768 -> 65535 : 0 | | 32768 -> 65535 : 0 | |
> > 65536 -> 131071 : 0 | | 65536 -> 131071 : 0 | |
> > 131072 -> 262143 : 0 | | 131072 -> 262143 : 0 | |
> > 262144 -> 524287 : 0 | | 262144 -> 524287 : 0 | |
> > 524288 -> 1048575 : 0 | | 524288 -> 1048575 : 0 | |
> > 1048576 -> 2097151 : 0 | | 1048576 -> 2097151 : 0 | |
> > 2097152 -> 4194303 : 0 | | 2097152 -> 4194303 : 0 | |
> > 4194304 -> 8388607 : 861 |** | 4194304 -> 8388607 : 1136 |**** |
> > 8388608 -> 16777215 : 3111 |******** | 8388608 -> 16777215 : 6320 |************************ |
> > 16777216 -> 33554431 : 13901 |****************************************| 16777216 -> 33554431 : 10484 |****************************************|
> > Exiting trace of synchronize_rcu Exiting trace of synchronize_rcu
> >
> Who is B and who is A?
Left is A (rcu_normal_wake_from_gp=0) and right is B
(rcu_normal_wake_from_gp=1)
>
> >
> > Interestingly I tried the same experiment without rcu_normal=1 (leaving rcu_expedited=1):
> >
> > adb shell cat /proc/cmdline | tr ' ' '\n' | grep rcu
> > rcupdate.rcu_expedited=1
> > rcu_nocbs=0-7
> >
> > In this case I also saw the -6 to -7% decrease in the app launch times
> > but I don't have a good explanation why that would be? (The fucntion
> > latency histograms in this case didn't show any significant difference).
> > Do you have any insight why this may happen?
> >
> When rcu_expedited=1 is set and rcu_normal=0 is disabled. The
> synchronize_rcu() call is converted into synchronize_rcu_expidited():
>
> <snip>
> void synchronize_rcu(void)
> {
> unsigned long flags;
> struct rcu_node *rnp;
>
> RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
> lock_is_held(&rcu_lock_map) ||
> lock_is_held(&rcu_sched_lock_map),
> "Illegal synchronize_rcu() in RCU read-side critical section");
> if (!rcu_blocking_is_gp()) {
> if (rcu_gp_is_expedited())
> synchronize_rcu_expedited();
> else
> synchronize_rcu_normal();
> return;
> }
> ...
> <snip>
>
> rcu_gp_is_expidited() is true, so invoke "expedited" version.
>
> I see some concerns in preferring an expedited version as a global
> replacement. First of all it is related to latency sensitive workloads
> because in order to expedite a grace period it sends out IPIs on all
> online CPUs to force them to report a quiescent-state asap. I have not
> investigated yet how it affects such workloads.
>
> Therefore, in your case, you also see a performance boost of your app sets.
IIUC the patch shouldn't affect the case? The only difference in A vs
B is rcu_normal_wake_from_gp (both have rcu_expedited=1).
Thanks,
Kalesh
>
> Thank you for looking at it!
>
> --
> Uladzislau Rezki
Powered by blists - more mailing lists