Message-ID: <20200928074020.GB2611@hirez.programming.kicks-ass.net>
Date: Mon, 28 Sep 2020 09:40:20 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: psodagud@...eaurora.org
Cc: Steven Rostedt <rostedt@...dmis.org>, tglx@...utronix.de,
qais.yousef@....com, mingo@...nel.org, cai@....pw,
tyhicks@...onical.com, arnd@...db.de, rameezmustafa@...eaurora.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] measure latency of cpu hotplug path
On Sun, Sep 27, 2020 at 07:41:45PM -0700, psodagud@...eaurora.org wrote:
> On 2020-09-24 07:58, Steven Rostedt wrote:
> > On Thu, 24 Sep 2020 10:34:14 +0200
> > peterz@...radead.org wrote:
> >
> > > On Wed, Sep 23, 2020 at 04:37:44PM -0700, Prasad Sodagudi wrote:
> > > > These are changes related to the cpu hotplug path, and I would like
> > > > to seek upstream review. These patches have been in the Qualcomm
> > > > downstream kernel for quite a long time. The first patch sets RT
> > > > priority for the hotplug task and the second patch adds cpuhp trace
> > > > events.
> > > >
> > > > 1) cpu-hotplug: Always use real time scheduling when hotplugging a CPU
> > > > 2) cpu/hotplug: Add cpuhp_latency trace event
> > >
> > > Why? Hotplug is a known super slow path. If you care about hotplug
> > > latency you're doing it wrong.
> Hi Peter,
>
> [PATCH 1/2] cpu/hotplug: Add cpuhp_latency trace event -
> 1) Tracing of cpuhp operations is important to find whether upstream
> changes or out-of-tree modules (or firmware changes) caused a latency
> regression.
This is a contradiction in terms; it is impossible to have a latency
regression if you don't care about latency in this super slow path to
begin with.
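
For reference, I assume the proposed event looks something like the
sketch below (field names are guesses from the patch title, untested):

	TRACE_EVENT(cpuhp_latency,

		TP_PROTO(unsigned int cpu, u64 usecs),

		TP_ARGS(cpu, usecs),

		TP_STRUCT__entry(
			__field(unsigned int,	cpu)
			__field(u64,		usecs)
		),

		TP_fast_assign(
			__entry->cpu   = cpu;
			__entry->usecs = usecs;
		),

		TP_printk("cpu=%u latency=%lluus",
			  __entry->cpu,
			  (unsigned long long)__entry->usecs)
	);
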
> 2) Secondary cpus are hotplugged out during device suspend and
> hotplugged in during resume.
Indeed they are.
> 3) firmware changes (psci call handling in firmware) need to be tested
> for their impact, right?
Firmware is firmware, it's broken by design and we can't fix it if it's
broken. The only sane solution is not having firmware :-)
> 4) cpu hotplug framework (CPUHP_AP_ONLINE_DYN) dynamic callbacks may
> impact hotplug latency.
Again, nobody cares.
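
For context, such a callback is registered roughly like this ("foo" is
a made-up example subsystem, untested); every registered callback adds
to the per-CPU bringup/teardown time:

	static int foo_cpu_online(unsigned int cpu)
	{
		/* per-CPU setup work; runs during every bringup */
		return 0;
	}

	static int foo_cpu_offline(unsigned int cpu)
	{
		/* undo the setup; runs during every teardown */
		return 0;
	}

	/* returns a dynamically allocated state number (>= 0) on success */
	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "foo:online",
				foo_cpu_online, foo_cpu_offline);
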
> [PATCH 2/2] cpu-hotplug: Always use real time scheduling when
> hotplugging a CPU -
>
> While stress testing CPU hotplug operations under full system load,
> the following problem is observed.
>
> CPU hotplug operations take place in preemptible context. This leaves
> the hotplugging thread at the mercy of overall system load and CPU
> availability. If the hotplugging thread does not get an opportunity to
> execute after it has already begun a hotplug operation, CPUs can end
> up stuck in a quasi-online state. In the worst case a CPU can be stuck
> in a state where the migration thread is parked while another task is
> executing and changing affinity in a loop. This combination can result
> in unbounded execution time for the running task until the hotplugging
> thread gets the chance to run and complete the hotplug operation.
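
Presumably the patch wraps the operation in something like this
(untested sketch, not necessarily the actual patch):

	struct sched_param rt_param   = { .sched_priority = MAX_RT_PRIO - 1 };
	struct sched_param fair_param = { .sched_priority = 0 };

	/* boost the hotplugging thread so system load can't starve it */
	sched_setscheduler_nocheck(current, SCHED_FIFO, &rt_param);
	ret = _cpu_down(cpu, 0, target);
	/* drop back to a normal task once the machinery has run */
	sched_setscheduler_nocheck(current, SCHED_NORMAL, &fair_param);
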
How is that not an administration problem?
Also, you shouldn't be able to change your affinity _to_ a CPU that's
going down. One of the very first steps in hotplug ensures that.
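
sched_cpu_deactivate() clears the CPU from cpu_active_mask early in the
teardown, and changing affinity validates against that mask; roughly
(simplified from __set_cpus_allowed_ptr()):

	if (!cpumask_intersects(new_mask, cpu_active_mask))
		return -EINVAL;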