Message-ID: <CAJZ5v0izSBR0_DeH5HVnSLFGRfV9WoSzbu9Mh5yvvuyrvw7fLg@mail.gmail.com>
Date: Tue, 4 Nov 2025 19:07:56 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, linux-kernel@...r.kernel.org, 
	linux-arch@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, 
	linux-pm@...r.kernel.org, bpf@...r.kernel.org, arnd@...db.de, 
	catalin.marinas@....com, will@...nel.org, peterz@...radead.org, 
	akpm@...ux-foundation.org, mark.rutland@....com, harisokn@...zon.com, 
	cl@...two.org, ast@...nel.org, daniel.lezcano@...aro.org, memxor@...il.com, 
	zhenglifeng1@...wei.com, xueshuai@...ux.alibaba.com, 
	joao.m.martins@...cle.com, boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [RESEND PATCH v7 7/7] cpuidle/poll_state: Poll via smp_cond_load_relaxed_timeout()

On Wed, Oct 29, 2025 at 10:01 PM Ankur Arora <ankur.a.arora@...cle.com> wrote:
>
>
> Rafael J. Wysocki <rafael@...nel.org> writes:
>
> > On Wed, Oct 29, 2025 at 8:13 PM Ankur Arora <ankur.a.arora@...cle.com> wrote:
> >>
> >>
> >> Rafael J. Wysocki <rafael@...nel.org> writes:
> >>
> >> > On Wed, Oct 29, 2025 at 5:42 AM Ankur Arora <ankur.a.arora@...cle.com> wrote:
> >> >>
> >> >>
> >> >> Rafael J. Wysocki <rafael@...nel.org> writes:
> >> >>
> >> >> > On Tue, Oct 28, 2025 at 6:32 AM Ankur Arora <ankur.a.arora@...cle.com> wrote:
> >> >> >>
> >> >> >> The inner loop in poll_idle() polls over the thread_info flags,
> >> >> >> waiting to see if the thread has TIF_NEED_RESCHED set. The loop
> >> >> >> exits once the condition is met, or if the poll time limit has
> >> >> >> been exceeded.
> >> >> >>
> >> >> >> To minimize the number of instructions executed in each iteration,
> >> >> >> the time check is done only intermittently (once every
> >> >> >> POLL_IDLE_RELAX_COUNT iterations). In addition, each loop iteration
> >> >> >> executes cpu_relax() which on certain platforms provides a hint to
> >> >> >> the pipeline that the loop busy-waits, allowing the processor to
> >> >> >> reduce power consumption.
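The loop described above can be sketched in user space as follows. This sketch assumes C11 relaxed atomics and POSIX clock_gettime() in place of the kernel's thread_info flags, need_resched() and local_clock_noinstr(). FLAG_NEED_RESCHED, RELAX_COUNT, now_ns() and poll_flags() are hypothetical names, and relax() is only a compiler barrier, not a real cpu_relax().

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for _TIF_NEED_RESCHED and POLL_IDLE_RELAX_COUNT. */
#define FLAG_NEED_RESCHED (1u << 3)
#define RELAX_COUNT 200

/* Monotonic nanosecond clock, standing in for local_clock_noinstr(). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Placeholder for cpu_relax(): only a compiler barrier here. */
static inline void relax(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Poll *flags until FLAG_NEED_RESCHED is set or limit_ns has elapsed.
 * Returns true if the time limit was hit (cf. dev->poll_time_limit).
 */
static bool poll_flags(_Atomic uint32_t *flags, uint64_t limit_ns)
{
	uint64_t start = now_ns();
	unsigned int loop_count = 0;

	while (!(atomic_load_explicit(flags, memory_order_relaxed) &
		 FLAG_NEED_RESCHED)) {
		relax();
		if (loop_count++ < RELAX_COUNT)
			continue;	/* cheap iteration: no clock read */

		loop_count = 0;
		if (now_ns() - start > limit_ns)
			return true;	/* poll time limit exceeded */
	}
	return false;			/* flag observed set */
}
```

Skipping the clock read on most iterations keeps each iteration cheap. The trade-off is that the limit may be detected up to RELAX_COUNT iterations late.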
> >> >> >>
> >> >> >> This is close to what smp_cond_load_relaxed_timeout() provides. So,
> >> >> >> restructure the loop and fold the loop condition and the timeout check
> >> >> >> in smp_cond_load_relaxed_timeout().
> >> >> >
> >> >> > Well, it is close, but is it close enough?
> >> >>
> >> >> I guess that's the question.
> >> >>
> >> >> >> Cc: "Rafael J. Wysocki" <rafael@...nel.org>
> >> >> >> Cc: Daniel Lezcano <daniel.lezcano@...aro.org>
> >> >> >> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
> >> >> >> ---
> >> >> >>  drivers/cpuidle/poll_state.c | 29 ++++++++---------------------
> >> >> >>  1 file changed, 8 insertions(+), 21 deletions(-)
> >> >> >>
> >> >> >> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> >> >> >> index 9b6d90a72601..dc7f4b424fec 100644
> >> >> >> --- a/drivers/cpuidle/poll_state.c
> >> >> >> +++ b/drivers/cpuidle/poll_state.c
> >> >> >> @@ -8,35 +8,22 @@
> >> >> >>  #include <linux/sched/clock.h>
> >> >> >>  #include <linux/sched/idle.h>
> >> >> >>
> >> >> >> -#define POLL_IDLE_RELAX_COUNT  200
> >> >> >> -
> >> >> >>  static int __cpuidle poll_idle(struct cpuidle_device *dev,
> >> >> >>                                struct cpuidle_driver *drv, int index)
> >> >> >>  {
> >> >> >> -       u64 time_start;
> >> >> >> -
> >> >> >> -       time_start = local_clock_noinstr();
> >> >> >> +       u64 time_end;
> >> >> >> +       u32 flags = 0;
> >> >> >>
> >> >> >>         dev->poll_time_limit = false;
> >> >> >>
> >> >> >> +       time_end = local_clock_noinstr() + cpuidle_poll_time(drv, dev);
> >> >> >
> >> >> > Is there any particular reason for doing this unconditionally?  If
> >> >> > not, then it looks like an arbitrary unrelated change to me.
> >> >>
> >> >> Agreed. Will fix.
> >> >>
> >> >> >> +
> >> >> >>         raw_local_irq_enable();
> >> >> >>         if (!current_set_polling_and_test()) {
> >> >> >> -               unsigned int loop_count = 0;
> >> >> >> -               u64 limit;
> >> >> >> -
> >> >> >> -               limit = cpuidle_poll_time(drv, dev);
> >> >> >> -
> >> >> >> -               while (!need_resched()) {
> >> >> >> -                       cpu_relax();
> >> >> >> -                       if (loop_count++ < POLL_IDLE_RELAX_COUNT)
> >> >> >> -                               continue;
> >> >> >> -
> >> >> >> -                       loop_count = 0;
> >> >> >> -                       if (local_clock_noinstr() - time_start > limit) {
> >> >> >> -                               dev->poll_time_limit = true;
> >> >> >> -                               break;
> >> >> >> -                       }
> >> >> >> -               }
> >> >> >> +               flags = smp_cond_load_relaxed_timeout(&current_thread_info()->flags,
> >> >> >> +                                                     (VAL & _TIF_NEED_RESCHED),
> >> >> >> +                                                     (local_clock_noinstr() >= time_end));
> >> >> >
> >> >> > So my understanding of this is that it reduces duplication with some
> >> >> > other places doing similar things.  Fair enough.
> >> >> >
> >> >> > However, since there is "timeout" in the name, I'd expect it to take
> >> >> > the timeout as an argument.
> >> >>
> >> >> The early versions did have a timeout, but that complicated the
> >> >> implementation significantly. And the current users, poll_idle() and
> >> >> rqspinlock, don't need a precise timeout.
> >> >>
> >> >> smp_cond_load_relaxed_timed(), smp_cond_load_relaxed_timecheck()?
> >> >>
> >> >> The problem with all suffixes I can think of is that it makes the
> >> >> interface itself nonobvious.
> >> >>
> >> >> Possibly something with the sense of bail out might work.
> >> >
> >> > It basically has two conditions, one of which is checked in every step
> >> > of the internal loop and the other one is checked every
> >> > SMP_TIMEOUT_POLL_COUNT steps of it.  That isn't particularly
> >> > straightforward IMV.
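The two-condition structure described here can be sketched in user space as a GNU C statement-expression macro. This is not the kernel's smp_cond_load_relaxed_timeout(). It assumes C11 relaxed atomics, a uint32_t flags word, and a hypothetical TIMEOUT_POLL_COUNT standing in for SMP_TIMEOUT_POLL_COUNT. The names sketch_cond_load_relaxed_timeout(), poll_until() and FLAG_NEED_RESCHED are illustrative only.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for _TIF_NEED_RESCHED and SMP_TIMEOUT_POLL_COUNT. */
#define FLAG_NEED_RESCHED (1u << 3)
#define TIMEOUT_POLL_COUNT 200

/* Monotonic nanosecond clock, standing in for local_clock_noinstr(). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * cond_expr may use VAL (the value just loaded) and is tested on every
 * iteration; time_expr is evaluated only once every TIMEOUT_POLL_COUNT
 * iterations. atomic_signal_fence() is just a compiler-barrier
 * placeholder for cpu_relax(). The macro yields the last value loaded,
 * so the caller re-tests it to tell "condition met" from "timed out".
 */
#define sketch_cond_load_relaxed_timeout(ptr, cond_expr, time_expr)	\
({									\
	__typeof__(ptr) cl_p = (ptr);					\
	uint32_t VAL;							\
	unsigned int cl_n = 0;						\
									\
	for (;;) {							\
		VAL = atomic_load_explicit(cl_p, memory_order_relaxed); \
		if (cond_expr)						\
			break;						\
		atomic_signal_fence(memory_order_seq_cst);		\
		if (++cl_n < TIMEOUT_POLL_COUNT)			\
			continue;					\
		cl_n = 0;						\
		if (time_expr)						\
			break;						\
	}								\
	VAL;								\
})

/* Caller shaped like the patched poll_idle(): deadline computed once. */
static uint32_t poll_until(_Atomic uint32_t *flags, uint64_t budget_ns)
{
	uint64_t time_end = now_ns() + budget_ns;

	return sketch_cond_load_relaxed_timeout(flags,
						(VAL & FLAG_NEED_RESCHED),
						(now_ns() >= time_end));
}
```

This shows why the interface can read as non-obvious. The "timeout" argument is an arbitrary expression polled at a coarser cadence, not a duration.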
> >>
> >> Right. And that's similar to what poll_idle() does.
> >
> > My point is that the macro in its current form is not particularly
> > straightforward.
> >
> > The code in poll_idle() does what it needs to do.
> >
> >> > Honestly, I prefer the existing code.  It is much easier to follow and
> >> > I don't see why the new code would be better.  Sorry.
> >>
> >> I don't think there's any problem with the current code. However, I'd like
> >> to add support for poll_idle() on arm64 (and maybe other platforms) where
> >> instead of spinning in a cpu_relax() loop, you wait on a cacheline.
> >
> > Well, there is MWAIT on x86, but it is not used here.  It just takes
> > too much time to wake up from.  There are "fast" variants of that too,
> > but they have been designed with user space in mind, so somewhat
> > cumbersome for kernel use.
> >
> >> And that's what using something like smp_cond_load_relaxed_timeout()
> >> would enable.
> >>
> >> Something like the series here:
> >>   https://lore.kernel.org/lkml/87wmaljd81.fsf@oracle.com/
> >>
> >> (Sorry, should have mentioned this in the commit message.)
> >
> > I'm not sure how you can combine that with a proper timeout.
>
> Would taking the timeout as a separate argument work?
>
>   flags = smp_cond_load_relaxed_timeout(&current_thread_info()->flags,
>                                          (VAL & _TIF_NEED_RESCHED),
>                                          local_clock_noinstr(), time_end);
>
> Or you are thinking of something on different lines from the smp_cond_load
> kind of interface?

I would like it to be something along the lines of

arch_busy_wait_for_need_resched(time_limit);
dev->poll_time_limit = !need_resched();

and I don't care much about how exactly this is done in the arch code,
so long as it does what it says.
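A generic fallback for the interface proposed above might look like the following user-space sketch. It assumes time_limit is a duration in nanoseconds, as cpuidle_poll_time() is used in the original loop. thread_flags, need_resched_sim(), generic_busy_wait_for_need_resched() and poll_idle_sim() are hypothetical stand-ins, not kernel APIs.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define FLAG_NEED_RESCHED (1u << 3)	/* hypothetical _TIF_NEED_RESCHED */

/* Simulated flags word (stand-in for current_thread_info()->flags). */
static _Atomic uint32_t thread_flags;

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static bool need_resched_sim(void)
{
	return atomic_load_explicit(&thread_flags, memory_order_relaxed) &
	       FLAG_NEED_RESCHED;
}

/*
 * Hypothetical generic version of arch_busy_wait_for_need_resched():
 * spin until the flag is set or time_limit_ns has elapsed. An arch
 * could substitute a cheaper wait, as long as it returns by roughly
 * the same deadline.
 */
static void generic_busy_wait_for_need_resched(uint64_t time_limit_ns)
{
	uint64_t start = now_ns();

	while (!need_resched_sim()) {
		if (now_ns() - start > time_limit_ns)
			break;
	}
}

/* Caller side, mirroring the two lines proposed for poll_idle(). */
static bool poll_idle_sim(uint64_t time_limit_ns)
{
	generic_busy_wait_for_need_resched(time_limit_ns);
	return !need_resched_sim();	/* cf. dev->poll_time_limit */
}
```

Because the caller derives the time-limit result from the flag state, the arch helper does not need to report why it returned.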

> > The timeout is needed because you want to break out of this when it starts
> > to take too much time, so you can go back to the idle loop and maybe
> > select a better idle state.
>
> Agreed. And that will happen with the version in the patch:
>
>      flags = smp_cond_load_relaxed_timeout(&current_thread_info()->flags,
>                                             (VAL & _TIF_NEED_RESCHED),
>                                             (local_clock_noinstr() >= time_end));
>
> It's just that, with waited mode on arm64, the timeout might be delayed
> depending on the granularity of the event stream.

That's fine.  cpuidle_poll_time() is not exact anyway.
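The event-stream granularity point can be made concrete with a rough user-space simulation. It assumes that sleeping for one tick stands in for a WFE wait that the arm64 event stream ends at the latest after one period. It does not model early wakeups caused by writes to the monitored cacheline. wait_flags_ticked(), tick_ns and FLAG_NEED_RESCHED are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

#define FLAG_NEED_RESCHED (1u << 3)	/* hypothetical _TIF_NEED_RESCHED */

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Simulated "waited mode": instead of spinning, each step sleeps for one
 * tick (tick_ns < 1 s), standing in for a wait that the next event-stream
 * tick ends. The deadline is examined only after a wakeup, so the return
 * can overshoot time_end by up to about one tick. Returns the last value
 * loaded; the caller re-tests it.
 */
static uint32_t wait_flags_ticked(_Atomic uint32_t *flags, uint64_t time_end,
				  long tick_ns)
{
	struct timespec tick = { .tv_sec = 0, .tv_nsec = tick_ns };
	uint32_t val;

	for (;;) {
		val = atomic_load_explicit(flags, memory_order_relaxed);
		if (val & FLAG_NEED_RESCHED)
			break;
		nanosleep(&tick, NULL);		/* "wait for event" */
		if (now_ns() >= time_end)
			break;
	}
	return val;
}
```

Since the overshoot is bounded by roughly one tick, it stays small relative to a polling budget that is itself approximate.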
