Message-ID: <SJ2PR11MB7670E05E066CCC16AFEA16A18DAC2@SJ2PR11MB7670.namprd11.prod.outlook.com>
Date: Tue, 1 Apr 2025 15:03:28 +0000
From: "King, Colin" <colin.king@...el.com>
To: Christian Loehle <christian.loehle@....com>, Bart Van Assche
<bvanassche@....org>, Jens Axboe <axboe@...nel.dk>, "Rafael J. Wysocki"
<rafael@...nel.org>, Daniel Lezcano <daniel.lezcano@...aro.org>,
"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] cpuidle: psd: add power sleep demotion prevention for
fast I/O devices
Hi,
Reply at end..
> -----Original Message-----
> From: Christian Loehle <christian.loehle@....com>
> Sent: 26 March 2025 16:27
> To: King, Colin <colin.king@...el.com>; Bart Van Assche
> <bvanassche@....org>; Jens Axboe <axboe@...nel.dk>; Rafael J. Wysocki
> <rafael@...nel.org>; Daniel Lezcano <daniel.lezcano@...aro.org>; linux-
> block@...r.kernel.org; linux-pm@...r.kernel.org
> Cc: linux-kernel@...r.kernel.org
> Subject: Re: [PATCH] cpuidle: psd: add power sleep demotion prevention for
> fast I/O devices
>
> On 3/26/25 15:04, King, Colin wrote:
> > Hi,
> >
> >> -----Original Message-----
> >> From: Bart Van Assche <bvanassche@....org>
> >> Sent: 23 March 2025 12:36
> >> To: King, Colin <colin.king@...el.com>; Christian Loehle
> >> <christian.loehle@....com>; Jens Axboe <axboe@...nel.dk>; Rafael J.
> >> Wysocki <rafael@...nel.org>; Daniel Lezcano
> >> <daniel.lezcano@...aro.org>; linux-block@...r.kernel.org;
> >> linux-pm@...r.kernel.org
> >> Cc: linux-kernel@...r.kernel.org
> >> Subject: Re: [PATCH] cpuidle: psd: add power sleep demotion
> >> prevention for fast I/O devices
> >>
> >> On 3/17/25 3:03 AM, King, Colin wrote:
> >>> This code is optional, one can enable it or disable it via the
> >>> config option. Also, even when it is built-in one can disable it by
> >>> writing 0 to the
> >> sysfs file
> >>> /sys/devices/system/cpu/cpuidle/psd_cpu_lat_timeout_ms
> >>
> >> I'm not sure we need even more configuration knobs in sysfs.
> >
> > It's useful for enabling / disabling the functionality, as well as some form of
> tuning for slower I/O devices, so I think it is justifiable.
> >
> >> How are users
> >> expected to find this configuration option? How should they decide
> >> whether to enable or to disable it?
> >
> > I can send a V2 with some documentation if that's required.
> >
> >>
> >> Please take a look at this proposal and let me know whether this
> >> would solve the issue that you are looking into: "[LSF/MM/BPF Topic]
> Energy- Efficient I/O"
> >> (https://lore.kernel.org/linux-block/ad1018b6-7c0b-4d70-
> >> b845-c869287d3cf3@....org/). The only disadvantage of this approach
> >> compared to the cpuidle patch is that it requires RPM (runtime power
> >> management) to be enabled. Maybe I should look into modifying the
> >> approach such that it does not rely on RPM.
> >
> > I've had a look, the scope of my patch is a bit wider. If my patch
> > gets accepted I'm going to also look at putting the psd call into
> > other devices (such as network devices) to also stop deep states while
> > these devices are busy. Since the code is very lightweight I was hoping this
> was going to be relatively easy and simple to use in various devices in the
> future.
>
> IMO this needs to be a lot more fine-grained then, both in terms of which
> devices or even IO is affected (Surely some IO is fine with at least *some*
> latency) but also how aggressive we are in blocking.
> Just looking at some common latency/residency of idle states out there I don't
> think it's reasonable to force polling for a 3-10ms (rounding up with the jiffie)
> period.
The current workaround used by a customer is to disable C6/C6P entirely, which leaves
all the CPUs in a non-low-power state all the time. The opt-in mechanism provided in
this patch delivers nearly the same performance while re-enabling deeper
C-states once the I/O has completed.
As I mentioned earlier, jiffies are used because they are low-touch and very fast, with negligible
impact on the I/O paths. Finer-grained timing is a far more expensive operation and
imposes significant overhead on very fast I/O devices.
Also, this is a user-configurable, tunable choice. Users can opt in if they are willing
to pay the extra CPU overhead for a bit more I/O performance; if they don't want it, they
don't need to enable it.
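For reference, the opt-in/tuning described above goes through the sysfs file mentioned earlier in the thread. This is a configuration sketch assuming the patch is applied; the path is taken from the patch description, and the non-zero value shown is an arbitrary example.

```shell
# Disable power sleep demotion entirely (writing 0 turns it off):
echo 0 | sudo tee /sys/devices/system/cpu/cpuidle/psd_cpu_lat_timeout_ms

# Or raise the timeout (in milliseconds) to suit slower I/O devices:
echo 10 | sudo tee /sys/devices/system/cpu/cpuidle/psd_cpu_lat_timeout_ms
```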
> Playing devil's advocate if the system is under some thermal/power pressure
> we might actually reduce throughput by burning so much power on this.
> This seems like the stuff that is easily convincing because it improves
> throughput and then taking care of power afterwards is really hard. :/
>
The current situation is that users trying to get maximum bandwidth disable C6/C6P,
so they are already keeping the system busy. This solution at least saves power when I/O is idle.
Colin