Message-ID: <82a58d5c-73e7-6e78-e72e-3e46a1a3afbc@joelfernandes.org>
Date: Sat, 17 Sep 2022 09:52:49 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Boqun Feng <boqun.feng@...il.com>,
Frederic Weisbecker <fweisbec@...il.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: RCU vs NOHZ
On 9/17/2022 9:35 AM, Peter Zijlstra wrote:
> On Fri, Sep 16, 2022 at 02:11:10PM -0400, Joel Fernandes wrote:
>> Hi Peter,
>>
>> On Fri, Sep 16, 2022 at 5:20 AM Peter Zijlstra <peterz@...radead.org> wrote:
>> [...]
>>>> It wasn't enabled for ChromeOS.
>>>>
>>>> When fully enabled, it gave them the energy-efficiency advantages Joel
>>>> described. And then Joel described some additional call_rcu_lazy()
>>>> changes that provided even better energy efficiency. Though I believe
>>>> that the application should also be changed to avoid incessantly opening
>>>> and closing that file while the device is idle, as this would remove
>>>> -all- RCU work when nearly idle. But some of the other call_rcu_lazy()
>>>> use cases would likely remain.
>>>
>>> So I'm thinking the scheme I outlined gets you most if not all of what
>>> lazy would get you without having to add the lazy thing. A CPU is never
>>> refused deep idle when it passes off the callbacks.
>>>
>>> The NOHZ thing is a nice hook for 'this-cpu-wants-to-go-idle-long-term'
>>> and do our utmost bestest to move work away from it. You *want* to break
>>> affinity at this point.
>>>
>>> If you hate on the global, push it to a per rcu_node offload list until
>>> the whole node is idle and then push it up the next rcu_node level until
>>> you reach the top.
>>>
>>> Then when the top rcu_node is full idle; you can insta progress the QS
>>> state and run the callbacks and go idle.
>>
>> In my opinion the speed brakes have to be applied before the GP and
>> other threads are even awakened. The issue Android and ChromeOS
>> observe is that even a single CB queued every few jiffies can cause
>> work that can be otherwise delayed / batched, to be scheduled in. I am
>> not sure if your suggestions above address that. Do they?
>
> Scheduled how? Are these callbacks doing queue_work() or something?
Way before the callback is even ready to execute, you can have rcuog, rcuop,
and rcu_preempt threads running to go through the grace period state machine.
> Anyway; the thinking is that by passing off the callbacks on NOHZ, the
> idle CPUs stay idle. By running the callbacks before going full idle,
> all work is done and you can stay idle longer.
But all CPUs being idle does not mean the grace period is over: a task (at
least on PREEMPT_RT) can block in the middle of an RCU read-side critical
section and then all CPUs go idle.
Other than that, a typical flow could look like:
1. CPU queues a callback.
2. CPU then goes idle.
3. Another CPU runs the RCU threads, waking up otherwise-idle CPUs.
4. The grace period completes and an RCU thread runs the callback.
>> Try this experiment on your ADL system (for fun). Boot to the login
>> screen on any distro,
>
> All my dev boxes are headless :-) I don't think the ADL even has X or
> wayland installed.
Ah, ok. Maybe something you have running (daemons, for example) is already
requesting RCU. Android folks had some logger requesting RCU all the time.
>> and before logging in, run turbostat over ssh
>> and observe PC8 percent residencies. Now increase
>> jiffies_till_first_fqs boot parameter value to 64 or so and try again.
>> You may be surprised how much the PC8 percentage increases by delaying RCU
>> and batching callbacks (via the jiffies boot option). Admittedly this is
>> more amplified on ADL because of package-C-states, firmware and what
>> not, and isn’t as much a problem on Android; but still gives a nice
>> power improvement there.
>
> I can try; but as of now turbostat doesn't seem to work on that thing at
> all. I think localyesconfig might've stripped a required bit. I'll poke
> at it later.
Cool! I believe Len Brown can help with that, or maybe there is another way
you can read the counters to figure out the PC8% and RAPL power.
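For reference, the experiment could look roughly like the sketch below. The
parameter spelling (rcutree.jiffies_till_first_fqs) and the turbostat column
name (Pkg%pc8) are assumptions from my setup and may differ across kernel
and turbostat versions; this is a config fragment, not a tested recipe.

```shell
# Sketch of the jiffies_till_first_fqs experiment (names are assumptions).
#
# 1. Add to the kernel command line (e.g. in the bootloader config) and
#    reboot:
#      rcutree.jiffies_till_first_fqs=64
#
# 2. Confirm the parameter took effect:
grep -o 'rcutree\.jiffies_till_first_fqs=[0-9]*' /proc/cmdline

# 3. Measure package C-state residency over an otherwise-idle interval:
turbostat --quiet --show Busy%,Pkg%pc8 sleep 30
```

Comparing the Pkg%pc8 column with and without the boot parameter should show
the batching effect described above.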
thanks,
- Joel