Message-ID: <CAJZ5v0i5-8eO6T_-Sr-K=3Up89+_qtJW7NSjDknJSkk3Nhu8BQ@mail.gmail.com>
Date: Wed, 29 Oct 2025 21:29:58 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, linux-kernel@...r.kernel.org, 
	linux-arch@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, 
	linux-pm@...r.kernel.org, bpf@...r.kernel.org, arnd@...db.de, 
	catalin.marinas@....com, will@...nel.org, peterz@...radead.org, 
	akpm@...ux-foundation.org, mark.rutland@....com, harisokn@...zon.com, 
	cl@...two.org, ast@...nel.org, daniel.lezcano@...aro.org, memxor@...il.com, 
	zhenglifeng1@...wei.com, xueshuai@...ux.alibaba.com, 
	joao.m.martins@...cle.com, boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [RESEND PATCH v7 7/7] cpuidle/poll_state: Poll via smp_cond_load_relaxed_timeout()

On Wed, Oct 29, 2025 at 8:13 PM Ankur Arora <ankur.a.arora@...cle.com> wrote:
>
>
> Rafael J. Wysocki <rafael@...nel.org> writes:
>
> > On Wed, Oct 29, 2025 at 5:42 AM Ankur Arora <ankur.a.arora@...cle.com> wrote:
> >>
> >>
> >> Rafael J. Wysocki <rafael@...nel.org> writes:
> >>
> >> > On Tue, Oct 28, 2025 at 6:32 AM Ankur Arora <ankur.a.arora@...cle.com> wrote:
> >> >>
> >> >> The inner loop in poll_idle() polls over the thread_info flags,
> >> >> waiting to see if the thread has TIF_NEED_RESCHED set. The loop
> >> >> exits once the condition is met, or if the poll time limit has
> >> >> been exceeded.
> >> >>
> >> >> To minimize the number of instructions executed in each iteration,
> >> >> the time check is done only intermittently (once every
> >> >> POLL_IDLE_RELAX_COUNT iterations). In addition, each loop iteration
> >> >> executes cpu_relax() which on certain platforms provides a hint to
> >> >> the pipeline that the loop busy-waits, allowing the processor to
> >> >> reduce power consumption.
> >> >>
> >> >> This is close to what smp_cond_load_relaxed_timeout() provides. So,
> >> >> restructure the loop and fold the loop condition and the timeout check
> >> >> into smp_cond_load_relaxed_timeout().
> >> >
> >> > Well, it is close, but is it close enough?
> >>
> >> I guess that's the question.
> >>
> >> >> Cc: "Rafael J. Wysocki" <rafael@...nel.org>
> >> >> Cc: Daniel Lezcano <daniel.lezcano@...aro.org>
> >> >> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
> >> >> ---
> >> >>  drivers/cpuidle/poll_state.c | 29 ++++++++---------------------
> >> >>  1 file changed, 8 insertions(+), 21 deletions(-)
> >> >>
> >> >> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> >> >> index 9b6d90a72601..dc7f4b424fec 100644
> >> >> --- a/drivers/cpuidle/poll_state.c
> >> >> +++ b/drivers/cpuidle/poll_state.c
> >> >> @@ -8,35 +8,22 @@
> >> >>  #include <linux/sched/clock.h>
> >> >>  #include <linux/sched/idle.h>
> >> >>
> >> >> -#define POLL_IDLE_RELAX_COUNT  200
> >> >> -
> >> >>  static int __cpuidle poll_idle(struct cpuidle_device *dev,
> >> >>                                struct cpuidle_driver *drv, int index)
> >> >>  {
> >> >> -       u64 time_start;
> >> >> -
> >> >> -       time_start = local_clock_noinstr();
> >> >> +       u64 time_end;
> >> >> +       u32 flags = 0;
> >> >>
> >> >>         dev->poll_time_limit = false;
> >> >>
> >> >> +       time_end = local_clock_noinstr() + cpuidle_poll_time(drv, dev);
> >> >
> >> > Is there any particular reason for doing this unconditionally?  If
> >> > not, then it looks like an arbitrary unrelated change to me.
> >>
> >> Agreed. Will fix.
> >>
> >> >> +
> >> >>         raw_local_irq_enable();
> >> >>         if (!current_set_polling_and_test()) {
> >> >> -               unsigned int loop_count = 0;
> >> >> -               u64 limit;
> >> >> -
> >> >> -               limit = cpuidle_poll_time(drv, dev);
> >> >> -
> >> >> -               while (!need_resched()) {
> >> >> -                       cpu_relax();
> >> >> -                       if (loop_count++ < POLL_IDLE_RELAX_COUNT)
> >> >> -                               continue;
> >> >> -
> >> >> -                       loop_count = 0;
> >> >> -                       if (local_clock_noinstr() - time_start > limit) {
> >> >> -                               dev->poll_time_limit = true;
> >> >> -                               break;
> >> >> -                       }
> >> >> -               }
> >> >> +               flags = smp_cond_load_relaxed_timeout(&current_thread_info()->flags,
> >> >> +                                                     (VAL & _TIF_NEED_RESCHED),
> >> >> +                                                     (local_clock_noinstr() >= time_end));
> >> >
> >> > So my understanding of this is that it reduces duplication with some
> >> > other places doing similar things.  Fair enough.
> >> >
> >> > However, since there is "timeout" in the name, I'd expect it to take
> >> > the timeout as an argument.
> >>
> >> The early versions did have a timeout but that complicated the
> >> implementation significantly. And the current users, poll_idle() and
> >> rqspinlock, don't need a precise timeout.
> >>
> >> smp_cond_load_relaxed_timed(), smp_cond_load_relaxed_timecheck()?
> >>
> >> The problem with all the suffixes I can think of is that they make
> >> the interface itself non-obvious.
> >>
> >> Possibly something with the sense of "bail out" might work.
> >
> > It basically has two conditions, one of which is checked in every step
> > of the internal loop and the other one is checked every
> > SMP_TIMEOUT_POLL_COUNT steps of it.  That isn't particularly
> > straightforward IMV.
>
> Right. And that's similar to what poll_idle() does.

My point is that the macro in its current form is not particularly
straightforward.

The code in poll_idle() does what it needs to do.
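
For reference, the generic fallback of such a macro has roughly this
shape (a sketch only, reconstructed from the description above; the
SMP_TIMEOUT_POLL_COUNT batching and the cond_expr/time_check_expr
arguments follow the series, but the actual code may differ in detail):

#define smp_cond_load_relaxed_timeout(ptr, cond_expr, time_check_expr) \
({ \
        typeof(ptr) __PTR = (ptr); \
        __unqual_scalar_typeof(*ptr) VAL; \
        u32 __n = 0; \
 \
        for (;;) { \
                /* the wait condition is re-read on every iteration */ \
                VAL = READ_ONCE(*__PTR); \
                if (cond_expr) \
                        break; \
                cpu_relax(); \
                /* time check only every SMP_TIMEOUT_POLL_COUNT iters */ \
                if (++__n < SMP_TIMEOUT_POLL_COUNT) \
                        continue; \
                if (time_check_expr) \
                        break; \
                __n = 0; \
        } \
        (typeof(*ptr))VAL; \
})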

> > Honestly, I prefer the existing code.  It is much easier to follow and
> > I don't see why the new code would be better.  Sorry.
>
> I don't think there's any problem with the current code. However, I'd like
> to add support for poll_idle() on arm64 (and maybe other platforms) where
> instead of spinning in a cpu_relax() loop, you wait on a cacheline.
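
(For context, arm64's existing smp_cond_load_relaxed() already does the
wait-on-cacheline part -- roughly, paraphrasing
arch/arm64/include/asm/barrier.h from memory, so details may not match
the tree exactly:

#define smp_cond_load_relaxed(ptr, cond_expr) \
({ \
        typeof(ptr) __PTR = (ptr); \
        __unqual_scalar_typeof(*ptr) VAL; \
        for (;;) { \
                VAL = READ_ONCE(*__PTR); \
                if (cond_expr) \
                        break; \
                /* LDXR/WFE: sleep until the cacheline is touched */ \
                __cmpwait_relaxed(__PTR, VAL); \
        } \
        (typeof(*ptr))VAL; \
})

The open question is how to bound that wait in time, which is what the
_timeout() variant would add.)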

Well, there is MWAIT on x86, but it is not used here.  It just takes
too much time to wake up from.  There are "fast" variants of that too,
but they have been designed with user space in mind, so they are
somewhat cumbersome for kernel use.

> And that's what using something like smp_cond_load_relaxed_timeout()
> would enable.
>
> Something like the series here:
>   https://lore.kernel.org/lkml/87wmaljd81.fsf@oracle.com/
>
> (Sorry, should have mentioned this in the commit message.)

I'm not sure how you can combine that with a proper timeout.  The
timeout is needed because you want to break out of this when it starts
to take too much time, so you can go back to the idle loop and maybe
select a better idle state.
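
One hypothetical way to combine the two on arm64 (a sketch only, not
necessarily what the linked series does): drop the spin loop entirely
and re-evaluate the deadline expression each time the waiter wakes up,
assuming something (e.g. the timer event stream) wakes the WFE at a
bounded interval:

#define smp_cond_load_relaxed_timeout(ptr, cond_expr, time_check_expr) \
({ \
        typeof(ptr) __PTR = (ptr); \
        __unqual_scalar_typeof(*ptr) VAL; \
        for (;;) { \
                VAL = READ_ONCE(*__PTR); \
                if (cond_expr) \
                        break; \
                /* checked on every wakeup of the cacheline wait */ \
                if (time_check_expr) \
                        break; \
                __cmpwait_relaxed(__PTR, VAL); \
        } \
        (typeof(*ptr))VAL; \
})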
