Message-ID: <20260204121145.3951995-1-realwujing@gmail.com>
Date: Wed, 4 Feb 2026 07:11:41 -0500
From: Qiliang Yuan <realwujing@...il.com>
To: vincent.guittot@...aro.org,
christian.loehle@....com
Cc: bsegall@...gle.com,
dietmar.eggemann@....com,
juri.lelli@...hat.com,
linux-kernel@...r.kernel.org,
mgorman@...e.de,
mingo@...hat.com,
peterz@...radead.org,
realwujing@...il.com,
rostedt@...dmis.org,
vschneid@...hat.com,
yuanql9@...natelecom.cn
Subject: Re: [PATCH v2 RSEND] sched/fair: Optimize EAS energy calculation complexity from O(N) to O(1) inside inner loop
Hi Christian, Vincent,
Thank you for the detailed feedback.
On Mon, Feb 02, 2026 at 10:48:04AM +0000, Christian Loehle wrote:
> Which is still O(n), I think the title is misleading.
On Tue, Feb 03, 2026 at 06:16:27PM +0100, Vincent Guittot wrote:
> Ok, but the whole feec() remains O(n)
You are absolutely right. While the per-candidate CPU energy estimation was
optimized, the overall complexity of find_energy_efficient_cpu() remains
O(N). I've renamed the patch in v3 to "Optimize EAS by reducing redundant
performance domain scans" to more accurately reflect the scope of the
improvement.
On Mon, Feb 02, 2026 at 10:48:04AM +0000, Christian Loehle wrote:
> I don't think this is actually true. EAS doesn't really work with a large
> number of PDs because of the expensive wakeup path.
> I don't think there's an EAS system out there where this would actually make
> a measurable impact.
On Tue, Feb 03, 2026 at 06:16:27PM +0100, Vincent Guittot wrote:
> Could you add some figures to highlight the statement above ?
In v3, I've further optimized the path by consolidating the 'pd_max_util' and
'pd_busy_time' calculations into the same loop that finds the
'max_spare_cap_cpu'. This reduces the total number of full PD scans from three
down to one per performance domain.
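As a rough illustration of the consolidation described above, here is a
user-space sketch of gathering all three quantities in one pass over a
performance domain. It is not the v3 patch itself: 'struct cpu_stat',
'struct pd_summary' and pd_scan_once() are hypothetical names, and the
model deliberately ignores the waking task's contribution, uclamp and
capacity pressure, which the real feec() path has to account for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-CPU snapshot (not the kernel's struct rq). */
struct cpu_stat {
	unsigned long util;      /* current utilization */
	unsigned long capacity;  /* capacity available to CFS tasks */
};

/* Everything one scan of a performance domain is assumed to produce. */
struct pd_summary {
	unsigned long busy_time;  /* sum of per-CPU util, each capped at capacity */
	unsigned long max_util;   /* highest capped per-CPU util in the PD */
	long max_spare_cap;       /* largest (capacity - util), -1 if PD is empty */
	int max_spare_cap_cpu;    /* index of that CPU, -1 if PD is empty */
};

/* Single pass: accumulate busy time, max util and max spare capacity. */
static struct pd_summary pd_scan_once(const struct cpu_stat *cpus, size_t n)
{
	struct pd_summary s = { 0, 0, -1, -1 };
	size_t i;

	assert(cpus != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Cap util at capacity so spare capacity never goes negative. */
		unsigned long util = cpus[i].util < cpus[i].capacity ?
				     cpus[i].util : cpus[i].capacity;
		long spare = (long)(cpus[i].capacity - util);

		s.busy_time += util;
		if (util > s.max_util)
			s.max_util = util;
		if (spare > s.max_spare_cap) {
			s.max_spare_cap = spare;
			s.max_spare_cap_cpu = (int)i;
		}
	}

	return s;
}
```

The walk is still linear in the number of CPUs in the domain; the only
saving modelled here is that each CPU's data is read once instead of
three times.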
I agree that the impact on current mobile systems with 2-3 PDs is likely to
be small. However, as topologies grow and the wake-up path becomes more
sensitive to cache misses, avoiding repeated scans of the same per-CPU
runqueue utilization data is a worthwhile constant-factor improvement. I do
not have figures yet; I'm investigating synthetic benchmarks on systems with
higher core counts to provide concrete numbers.
I've sent out v3 which includes these further logic consolidations.
v3 link: https://lore.kernel.org/all/20260204120509.3950227-1-realwujing@gmail.com/
Thanks,
Qiliang