linux-kernel - Re: RCU vs NOHZ

Message-ID: <CAEXW_YTN7mnQSN2eJCysLsZOq+8JEOV6pvgw3LDTT=0mnkC2SA@mail.gmail.com>
Date:   Fri, 16 Sep 2022 14:11:10 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Boqun Feng <boqun.feng@...il.com>,
        Frederic Weisbecker <fweisbec@...il.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: RCU vs NOHZ

Hi Peter,

On Fri, Sep 16, 2022 at 5:20 AM Peter Zijlstra <peterz@...radead.org> wrote:
[...]
> > It wasn't enabled for ChromeOS.
> >
> > When fully enabled, it gave them the energy-efficiency advantages Joel
> > described.  And then Joel described some additional call_rcu_lazy()
> > changes that provided even better energy efficiency.  Though I believe
> > that the application should also be changed to avoid incessantly opening
> > and closing that file while the device is idle, as this would remove
> > -all- RCU work when nearly idle.  But some of the other call_rcu_lazy()
> > use cases would likely remain.
>
> So I'm thinking the scheme I outlined gets you most if not all of what
> lazy would get you without having to add the lazy thing. A CPU is never
> refused deep idle when it passes off the callbacks.
>
> The NOHZ thing is a nice hook for 'this-cpu-wants-to-go-idle-long-term'
> and do our utmost bestest to move work away from it. You *want* to break
> affinity at this point.
>
> If you hate on the global, push it to a per rcu_node offload list until
> the whole node is idle and then push it up the next rcu_node level until
> you reach the top.
>
> Then when the top rcu_node is full idle; you can insta progress the QS
> state and run the callbacks and go idle.

In my opinion, the speed brakes have to be applied before the GP and
other threads are even awakened. The issue Android and ChromeOS
observe is that even a single CB queued every few jiffies can cause
work that could otherwise be delayed and batched to be scheduled in.
I am not sure your suggestions above address that. Do they?
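
To make the batching argument concrete, here is a toy model, not RCU
code. It assumes a wakeup is deferred by a fixed number of jiffies after
the first pending callback and then handles everything queued by then.
The function name, the arrival pattern, and the delay values are all
made up for illustration.

```python
def count_wakeups(arrival_times, delay):
    """Count wakeups when each wakeup is deferred `delay` jiffies after the
    first pending callback and then handles everything queued by then."""
    wakeups = 0
    deadline = None  # time at which the pending batch gets processed
    for t in sorted(arrival_times):
        if deadline is None or t > deadline:
            # Nothing pending (the previous batch already ran): start a new batch.
            wakeups += 1
            deadline = t + delay
    return wakeups


# Hypothetical load: one callback every 4 jiffies for 1000 jiffies.
arrivals = range(0, 1000, 4)
print("no delay:   ", count_wakeups(arrivals, 0))
print("64 jiffies: ", count_wakeups(arrivals, 64))
```

With no delay every callback costs its own wakeup; with a delay, the
wakeup count drops roughly by the number of callbacks that fit in one
delay window. Real RCU behavior depends on much more than this.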

Try this experiment on your ADL system (for fun). Boot to the login
screen on any distro and, before logging in, run turbostat over ssh
and observe the PC8 residency percentages. Now increase the
jiffies_till_first_fqs boot parameter value to 64 or so and try again.
You may be surprised how much PC8 residency increases by delaying RCU
and batching callbacks (via the jiffies_till_first_fqs boot option).
Admittedly this is more amplified on ADL because of package C-states,
firmware and what not, and isn’t as much of a problem on Android; but
it still gives a nice power improvement there.
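
A minimal sketch for comparing the two runs of the experiment above,
assuming the turbostat output has been saved to a text file and contains
a Pkg%pc8 column. The helper names are hypothetical, and the sysfs path
is an assumption about where the rcutree parameter is exposed.

```python
from pathlib import Path

# Assumed sysfs location of the rcutree module parameter.
FQS_PARAM = Path("/sys/module/rcutree/parameters/jiffies_till_first_fqs")


def current_first_fqs():
    """Return the current jiffies_till_first_fqs value, or None if unavailable."""
    try:
        return int(FQS_PARAM.read_text().strip())
    except (OSError, ValueError):
        return None


def mean_residency(turbostat_text, column="Pkg%pc8"):
    """Average one column over all data rows of whitespace-separated output."""
    index = None
    values = []
    for line in turbostat_text.splitlines():
        fields = line.split()
        if column in fields:
            index = fields.index(column)  # header row (it may repeat)
            continue
        if index is None or len(fields) <= index:
            continue  # no header seen yet, or row lacks this column
        try:
            values.append(float(fields[index]))
        except ValueError:
            pass  # non-numeric field, skip it
    return sum(values) / len(values) if values else None
```

One way to use it: capture output with something like
"turbostat --quiet --show Pkg%pc8 --interval 5 --num_iterations 12",
once with the default setting and once after booting with
rcutree.jiffies_till_first_fqs=64, then compare the two means.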

thanks,

 - Joel