linux-kernel - Re: [PATCH] perf: Optimize perf_pmu_migrate_context()

Message-ID: <db6f2ad0-fe47-4574-9290-a1be5f349368@paulmck-laptop>
Date:   Mon, 3 Apr 2023 15:51:27 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ravi Bangoria <ravi.bangoria@....com>,
        Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Jiri Olsa <jolsa@...nel.org>
Subject: Re: [PATCH] perf: Optimize perf_pmu_migrate_context()

On Tue, Apr 04, 2023 at 12:07:30AM +0200, Thomas Gleixner wrote:
> On Mon, Apr 03 2023 at 11:08, Peter Zijlstra wrote:
> > Thomas reported that offlining CPUs spends a lot of time in
> > synchronize_rcu() as called from perf_pmu_migrate_context() even though
> > he's not actually using uncore events.
> 
> That happens when offlining CPUs from a socket > 0 in the same order in
> which those CPUs were brought up. On socket 0 this is not observable
> unless the bogus CPU0 offlining hack is enabled.
> 
> If the offlining happens in the reverse order then all is shiny.
> 
> The reason is that the first online CPU on a socket gets the uncore
> events assigned and when it is offlined then those are moved to the next
> online CPU in the same socket.
> 
> On a SKL-X with 56 threads per socket this results in a whopping _1_
> second delay per thread (except for the last one which shuts down the
> per socket uncore events with no delay because there are no users) due
> to 62 pointless synchronize_rcu() invocations, each of which takes
> ~16ms on a HZ=250 kernel.
> 
> Which in turn is interesting because that machine is completely idle
> other than running the offline muck...
> 
> > Turns out, the thing is unconditionally waiting for RCU, even if there's
> > no actual events to migrate.
> >
> > Fixes: 0cda4c023132 ("perf: Introduce perf_pmu_migrate_context()")
> > Reported-by: Thomas Gleixner <tglx@...utronix.de>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> > Tested-by: Thomas Gleixner <tglx@...utronix.de>
> 
> Reviewed-by: Thomas Gleixner <tglx@...utronix.de>

Yow!  ;-)

Assuming that all the events run under RCU protection, as in preemption
disabled:

Reviewed-by: Paul E. McKenney <paulmck@...nel.org>
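
For readers following along, here is a rough user-space sketch of the idea
discussed above: detach the events first, and only pay for the grace-period
wait if something was actually detached. This is not the patch under
review and not the kernel's code. Every toy_* name is hypothetical,
toy_synchronize_rcu() merely counts calls instead of waiting, and the
delay helper just multiplies out the numbers Thomas quoted (62 waits at
roughly 16ms each).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not kernel types. */
struct toy_event {
	struct toy_event *next;
};

struct toy_ctx {
	struct toy_event *events;	/* singly linked list of events */
};

static unsigned int toy_grace_periods;	/* number of simulated waits */

/* Stand-in for synchronize_rcu(): records the call, does not block. */
static void toy_synchronize_rcu(void)
{
	toy_grace_periods++;
}

/*
 * Move all events from src to dst. The grace-period wait is skipped
 * when src had no events. Returns true if a wait was performed.
 */
static bool toy_migrate(struct toy_ctx *src, struct toy_ctx *dst)
{
	struct toy_event *list;

	assert(src && dst);

	/* Detach everything from the source context. */
	list = src->events;
	src->events = NULL;

	if (!list)
		return false;	/* nothing to migrate, so nothing to wait for */

	/* Assumption: readers may still reference the old placement. */
	toy_synchronize_rcu();

	/* Push each detached event onto the destination list. */
	while (list) {
		struct toy_event *next = list->next;

		list->next = dst->events;
		dst->events = list;
		list = next;
	}
	return true;
}

/* Back-of-the-envelope stall estimate in milliseconds. */
static unsigned int toy_offline_delay_ms(unsigned int waits,
					 unsigned int ms_per_wait)
{
	return waits * ms_per_wait;
}
```

With the figures from the report, toy_offline_delay_ms(62, 16) comes to
992, which matches the roughly one second per offlined thread. In this
sketch an empty source context returns before the wait, so that cost
is not incurred.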