Message-ID: <20200928074020.GB2611@hirez.programming.kicks-ass.net>
Date: Mon, 28 Sep 2020 09:40:20 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: psodagud@...eaurora.org
Cc: Steven Rostedt <rostedt@...dmis.org>, tglx@...utronix.de,
qais.yousef@....com, mingo@...nel.org, cai@....pw,
tyhicks@...onical.com, arnd@...db.de, rameezmustafa@...eaurora.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] measure latency of cpu hotplug path
On Sun, Sep 27, 2020 at 07:41:45PM -0700, psodagud@...eaurora.org wrote:
> On 2020-09-24 07:58, Steven Rostedt wrote:
> > On Thu, 24 Sep 2020 10:34:14 +0200
> > peterz@...radead.org wrote:
> >
> > > On Wed, Sep 23, 2020 at 04:37:44PM -0700, Prasad Sodagudi wrote:
> > > > These are changes related to the cpu hotplug path, and I would like
> > > > to seek upstream review. These patches have been in the Qualcomm
> > > > downstream kernel for quite a long time. The first patch sets RT
> > > > priority for the hotplug task and the second patch adds cpuhp trace
> > > > events.
> > > >
> > > > 1) cpu-hotplug: Always use real time scheduling when hotplugging a CPU
> > > > 2) cpu/hotplug: Add cpuhp_latency trace event
> > >
> > > Why? Hotplug is a known super slow path. If you care about hotplug
> > > latency you're doing it wrong.
> Hi Peter,
>
> [PATCH 1/2] cpu/hotplug: Add cpuhp_latency trace event -
> 1) Tracing of cpuhp operations is important to find whether upstream
> changes or out-of-tree modules (or firmware changes) caused a latency
> regression.
This is a contradiction in terms; it is impossible to have a latency
regression if you don't care about latency in this super slow path to
begin with.
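
For reference, I assume the proposed event looks something like the
sketch below (field names are guesses from the patch title, untested):

	TRACE_EVENT(cpuhp_latency,

		TP_PROTO(unsigned int cpu, u64 usecs),

		TP_ARGS(cpu, usecs),

		TP_STRUCT__entry(
			__field(unsigned int,	cpu)
			__field(u64,		usecs)
		),

		TP_fast_assign(
			__entry->cpu   = cpu;
			__entry->usecs = usecs;
		),

		TP_printk("cpu=%u latency=%lluus",
			  __entry->cpu,
			  (unsigned long long)__entry->usecs)
	);
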
> 2) Secondary cpus are hotplugged out during device suspend and
> hotplugged in during resume.
Indeed they are.
> 3) firmware changes (psci call handling in firmware) need to be tested
> for their impact, right?
Firmware is firmware, it's broken by design and we can't fix it if it's
broken. The only sane solution is not having firmware :-)
> 4) cpu hotplug framework (CPUHP_AP_ONLINE_DYN) dynamic callbacks may
> impact hotplug latency.
Again, nobody cares.
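
For context, such a callback is registered roughly like this ("foo" is
a made-up example subsystem, untested); every registered callback adds
to the per-CPU bringup/teardown time:

	static int foo_cpu_online(unsigned int cpu)
	{
		/* per-CPU setup work; runs during every bringup */
		return 0;
	}

	static int foo_cpu_offline(unsigned int cpu)
	{
		/* undo the setup; runs during every teardown */
		return 0;
	}

	/* returns a dynamically allocated state number (>= 0) on success */
	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "foo:online",
				foo_cpu_online, foo_cpu_offline);
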
> [PATCH 2/2] cpu-hotplug: Always use real time scheduling when
> hotplugging a CPU -
>
> While stress testing CPU hotplug operations under full system load,
> the following problem is observed.
>
> CPU hotplug operations take place in preemptible context. This leaves
> the hotplugging thread at the mercy of overall system load and CPU
> availability. If the hotplugging thread does not get an opportunity to
> execute after it has already begun a hotplug operation, CPUs can end
> up stuck in a quasi-online state. In the worst case a CPU can be stuck
> in a state where the migration thread is parked while another task is
> executing and changing affinity in a loop. This combination can result
> in unbounded execution time for the running task until the hotplugging
> thread gets the chance to run and complete the hotplug operation.
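
Presumably the patch wraps the operation in something like this
(untested sketch, not necessarily the actual patch):

	struct sched_param rt_param   = { .sched_priority = MAX_RT_PRIO - 1 };
	struct sched_param fair_param = { .sched_priority = 0 };

	/* boost the hotplugging thread so system load can't starve it */
	sched_setscheduler_nocheck(current, SCHED_FIFO, &rt_param);
	ret = _cpu_down(cpu, 0, target);
	/* drop back to a normal task once the machinery has run */
	sched_setscheduler_nocheck(current, SCHED_NORMAL, &fair_param);
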
How is that not an administration problem?
Also, you shouldn't be able to change your affinity _to_ a CPU that's
going down. One of the very first steps in hotplug ensures that.
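
sched_cpu_deactivate() clears the CPU from cpu_active_mask early in the
teardown, and changing affinity validates against that mask; roughly
(simplified from __set_cpus_allowed_ptr()):

	if (!cpumask_intersects(new_mask, cpu_active_mask))
		return -EINVAL;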