Message-ID: <20251025122659.GA2352457@noisy.programming.kicks-ass.net>
Date: Sat, 25 Oct 2025 14:26:59 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Xie Yuanbin <qq570070308@...il.com>
Cc: linux@...linux.org.uk, mathieu.desnoyers@...icios.com,
	paulmck@...nel.org, pjw@...nel.org, palmer@...belt.com,
	aou@...s.berkeley.edu, alex@...ti.fr, hca@...ux.ibm.com,
	gor@...ux.ibm.com, agordeev@...ux.ibm.com,
	borntraeger@...ux.ibm.com, svens@...ux.ibm.com, davem@...emloft.net,
	andreas@...sler.com, tglx@...utronix.de, mingo@...hat.com,
	bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
	luto@...nel.org, acme@...nel.org, namhyung@...nel.org,
	mark.rutland@....com, alexander.shishkin@...ux.intel.com,
	jolsa@...nel.org, irogers@...gle.com, adrian.hunter@...el.com,
	anna-maria@...utronix.de, frederic@...nel.org,
	juri.lelli@...hat.com, vincent.guittot@...aro.org,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, vschneid@...hat.com, thuth@...hat.com,
	riel@...riel.com, akpm@...ux-foundation.org, david@...hat.com,
	lorenzo.stoakes@...cle.com, segher@...nel.crashing.org,
	ryan.roberts@....com, max.kellermann@...os.com, urezki@...il.com,
	nysal@...ux.ibm.com, x86@...nel.org,
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
	linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org,
	sparclinux@...r.kernel.org, linux-perf-users@...r.kernel.org,
	will@...nel.org
Subject: Re: [PATCH 0/3] Optimize code generation during context switching
On Sat, Oct 25, 2025 at 02:26:25AM +0800, Xie Yuanbin wrote:
> The purpose of this series of patches is to optimize the performance of
> context switching. It does not change the code logic, but only modifies
> the inline attributes of some functions.
> 
> The original reason for writing this patch is that, when debugging a
> scheduling performance problem, I discovered that the finish_task_switch
> function was not inlined, even at the -O2 optimization level. This may
> affect performance for the following reasons:
Not sure what compiler you're running, but it is inlined in the one
random build I just checked.
> 1. It is in the context switching code, and is called frequently.
> 2. Because of modern CPU mitigations for vulnerabilities, the instruction
> pipeline and caches may be flushed inside switch_mm, and branch
> mispredictions and cache misses may increase. finish_task_switch runs
> right after that, so this may cause greater performance degradation.
That patch really is one of the ugliest things I've seen in a while; and
you have no performance numbers included or any other justification for
any of this ugly.
> 3. The __schedule function has the __sched attribute, which places it in
> the ".sched.text" section, while finish_task_switch does not, so the two
> can end up far apart in the binary, aggravating the above performance
> degradation.
How? If it doesn't get inlined it will be a direct call, in which case
the prefetcher should have no trouble.
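
For reference, the attributes under discussion can be sketched in plain C.
This is only an illustration, assuming GCC or Clang on an ELF target. The
demo_* names are hypothetical stand-ins and are not the kernel's definitions
of __sched, __always_inline or finish_task_switch. The callee left without
any attribute may or may not be inlined depending on compiler version and
flags, and if it is emitted out of line it is reached through a direct call
from the function placed in the separate section.

```c
/* Hypothetical stand-ins for the kernel's annotations (assumption). */
#define demo_sched        __attribute__((__section__(".sched.text")))
#define demo_force_inline inline __attribute__((__always_inline__))
#define demo_noinline     __attribute__((__noinline__))

/* Counts how often either "finish" helper has run. */
static int demo_finish_calls;

/*
 * Variant A: no inline hint. The compiler decides whether to inline it.
 * If it stays out of line, it lands in the default text section, not in
 * ".sched.text", and the caller reaches it with a direct call.
 */
static void demo_finish_plain(void)
{
	demo_finish_calls++;
}

/*
 * Variant B: forced inline. The body is always merged into the caller,
 * so it ends up in whatever section the caller lives in.
 */
static demo_force_inline void demo_finish_inlined(void)
{
	demo_finish_calls++;
}

/*
 * Caller placed in its own section, loosely mirroring how a scheduler
 * entry point could be annotated. Marked noinline so that it stays a
 * separate symbol in ".sched.text".
 */
demo_sched demo_noinline int demo_schedule(int use_forced_inline)
{
	if (use_forced_inline)
		demo_finish_inlined();
	else
		demo_finish_plain();

	/* Return the running total so callers can observe the effect. */
	return demo_finish_calls;
}
```

Whether variant A was inlined in a given build can be checked by compiling
the file to an object and looking at its symbols and disassembly, for example
with nm and objdump -d. That is the kind of check that would back up, or
refute, the claim for a particular compiler.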