Message-ID: <82a58d5c-73e7-6e78-e72e-3e46a1a3afbc@joelfernandes.org>
Date: Sat, 17 Sep 2022 09:52:49 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Boqun Feng <boqun.feng@...il.com>,
Frederic Weisbecker <fweisbec@...il.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: RCU vs NOHZ
On 9/17/2022 9:35 AM, Peter Zijlstra wrote:
> On Fri, Sep 16, 2022 at 02:11:10PM -0400, Joel Fernandes wrote:
>> Hi Peter,
>>
>> On Fri, Sep 16, 2022 at 5:20 AM Peter Zijlstra <peterz@...radead.org> wrote:
>> [...]
>>>> It wasn't enabled for ChromeOS.
>>>>
>>>> When fully enabled, it gave them the energy-efficiency advantages Joel
>>>> described. And then Joel described some additional call_rcu_lazy()
>>>> changes that provided even better energy efficiency. Though I believe
>>>> that the application should also be changed to avoid incessantly opening
>>>> and closing that file while the device is idle, as this would remove
>>>> -all- RCU work when nearly idle. But some of the other call_rcu_lazy()
>>>> use cases would likely remain.
>>>
>>> So I'm thinking the scheme I outlined gets you most if not all of what
>>> lazy would get you without having to add the lazy thing. A CPU is never
>>> refused deep idle when it passes off the callbacks.
>>>
>>> The NOHZ thing is a nice hook for 'this-cpu-wants-to-go-idle-long-term'
>>> and do our utmost bestest to move work away from it. You *want* to break
>>> affinity at this point.
>>>
>>> If you hate on the global, push it to a per rcu_node offload list until
>>> the whole node is idle and then push it up the next rcu_node level until
>>> you reach the top.
>>>
>>> Then when the top rcu_node is full idle; you can insta progress the QS
>>> state and run the callbacks and go idle.
>>
>> In my opinion the speed brakes have to be applied before the GP and
>> other threads are even awakened. The issue Android and ChromeOS
>> observe is that even a single CB queued every few jiffies can cause
>> work that can be otherwise delayed / batched, to be scheduled in. I am
>> not sure if your suggestions above address that. Do they?
>
> Scheduled how? Are these callbacks doing queue_work() or something?
Way before the callback is even ready to execute, you can have rcuog, rcuop,
and rcu_preempt threads running to go through the grace period state machine.
> Anyway; the thinking is that by passing off the callbacks on NOHZ, the
> idle CPUs stay idle. By running the callbacks before going full idle,
> all work is done and you can stay idle longer.
But all CPUs being idle does not mean the grace period is over: a task (at
least on PREEMPT_RT) can block in the middle of an RCU read-side critical
section and then all CPUs go idle.
Other than that, a typical flow could look like:
1. CPU queues a callback.
2. CPU then goes idle.
3. Another CPU runs the RCU threads, waking up otherwise-idle CPUs.
4. The grace period completes and an RCU thread runs the callback.
>> Try this experiment on your ADL system (for fun). Boot to the login
>> screen on any distro,
>
> All my dev boxes are headless :-) I don't think the ADL even has X or
> wayland installed.
Ah, ok. Maybe something you have running (daemons, for example) is already
requesting RCU. Android folks had some logger requesting RCU all the time.
>> and before logging in, run turbostat over ssh
>> and observe PC8 percent residencies. Now increase
>> jiffies_till_first_fqs boot parameter value to 64 or so and try again.
>> You may be surprised how much the PC8 percentage increases by delaying RCU
>> and batching callbacks (via the jiffies boot option). Admittedly this is
>> more amplified on ADL because of package-C-states, firmware and what
>> not, and isn’t as much a problem on Android; but still gives a nice
>> power improvement there.
>
> I can try; but as of now turbostat doesn't seem to work on that thing at
> all. I think localyesconfig might've stripped a required bit. I'll poke
> at it later.
Cool! I believe Len Brown can help with that, or maybe there is another way
you can read the counters to figure out the PC8% and RAPL power.
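For reference, the experiment could look roughly like the sketch below. The
parameter spelling (rcutree.jiffies_till_first_fqs) and the turbostat column
name (Pkg%pc8) are assumptions from my setup and may differ across kernel
and turbostat versions; this is a config fragment, not a tested recipe.

```shell
# Sketch of the jiffies_till_first_fqs experiment (names are assumptions).
#
# 1. Add to the kernel command line (e.g. in the bootloader config) and
#    reboot:
#      rcutree.jiffies_till_first_fqs=64
#
# 2. Confirm the parameter took effect:
grep -o 'rcutree\.jiffies_till_first_fqs=[0-9]*' /proc/cmdline

# 3. Measure package C-state residency over an otherwise-idle interval:
turbostat --quiet --show Busy%,Pkg%pc8 sleep 30
```

Comparing the Pkg%pc8 column with and without the boot parameter should show
the batching effect described above.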
thanks,
- Joel