linux-kernel - Re: [PATCH v2 2/2] iopoll: Do not use timekeeping in read_poll_timeout

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPDyKFoTJFoDtSdPcXXQN_zi+TCQwr3UjLYu5jMCq_1sCnnG3Q@mail.gmail.com>
Date:   Thu, 11 May 2023 12:26:44 +0200
From:   Ulf Hansson <ulf.hansson@...aro.org>
To:     Geert Uytterhoeven <geert+renesas@...der.be>
Cc:     Stephen Boyd <sboyd@...nel.org>,
        Tomasz Figa <tomasz.figa@...il.com>,
        Sylwester Nawrocki <s.nawrocki@...sung.com>,
        Will Deacon <will@...nel.org>, Arnd Bergmann <arnd@...db.de>,
        Wolfram Sang <wsa+renesas@...g-engineering.com>,
        Dejin Zheng <zhengdejin5@...il.com>,
        Kai-Heng Feng <kai.heng.feng@...onical.com>,
        Nicholas Piggin <npiggin@...il.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Russell King <linux@...linux.org.uk>,
        John Stultz <jstultz@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Tony Lindgren <tony@...mide.com>,
        Krzysztof Kozlowski <krzk@...nel.org>,
        Tero Kristo <tero.kristo@...ux.intel.com>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        linux-arm-kernel@...ts.infradead.org,
        linux-renesas-soc@...r.kernel.org, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/2] iopoll: Do not use timekeeping in read_poll_timeout_atomic()

On Wed, 10 May 2023 at 15:23, Geert Uytterhoeven
<geert+renesas@...der.be> wrote:
>
> read_poll_timeout_atomic() uses ktime_get() to implement the timeout
> feature, just like its non-atomic counterpart.  However, there are
> several issues with this, due to its use in atomic contexts:
>
>   1. When called in the s2ram path (as typically done by clock or PM
>      domain drivers), timekeeping may be suspended, triggering the
>      WARN_ON(timekeeping_suspended) in ktime_get():
>
>         WARNING: CPU: 0 PID: 654 at kernel/time/timekeeping.c:843 ktime_get+0x28/0x78
>
>      Calling ktime_get_mono_fast_ns() instead of ktime_get() would get
>      rid of that warning.  However, that would break timeout handling,
>      as (at least on systems with an ARM architectured timer), the time
>      returned by ktime_get_mono_fast_ns() does not advance while
>      timekeeping is suspended.
>      Interestingly, (on the same ARM systems) the time returned by
>      ktime_get() does advance while timekeeping is suspended, despite
>      the warning.

Interesting, looks like we should spend some time to further
investigate this behaviour.

>
>   2. Depending on the actual clock source, and especially before a
>      high-resolution clocksource (e.g. the ARM architectured timer)
>      becomes available, time may not advance in atomic contexts, thus
>      breaking timeout handling.
>
> Fix this by abandoning the idea that one can rely on timekeeping to
> implement timeout handling in all atomic contexts, and switch from a
> global time-based to a locally-estimated timeout handling.  In most
> (all?) cases the timeout condition is exceptional and an error
> condition, hence any additional delays due to underestimating wall clock
> time are irrelevant.

I wonder if this isn't an oversimplification of the situation. Don't
we have timeout-error-conditions that we expected to happen quite
frequently?

If so, in these cases, we really don't want to continue looping longer
than actually needed, as then we will remain in the atomic context
longer than necessary.

I guess some information about how big these additional delays could
be, would help to understand better. Of course, it's not entirely easy
to get that data, but did you run some tests to see how this changes?

>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@...der.be>
> ---
> Alternatively, one could use a mixed approach (use both
> ktime_get_mono_fast_ns() and a local (under)estimate, and timeout on the
> earliest occasion), but I think that would complicate things without
> much gain.

Another option could be to provide two different polling APIs for the
atomic use-case.

One that keeps using ktime, which is more accurate and generally
favourable - and another, along the lines of what you propose, that
should be used by those that can't rely on timekeeping.

>
> v2:
>   - New.
> ---
>  include/linux/iopoll.h | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
> index 0417360a6db9b0d6..bb2e1d9117e96679 100644
> --- a/include/linux/iopoll.h
> +++ b/include/linux/iopoll.h
> @@ -81,22 +81,30 @@
>                                         delay_before_read, args...) \
>  ({ \
>         u64 __timeout_us = (timeout_us); \
> +       s64 __left_ns = __timeout_us * NSEC_PER_USEC; \
>         unsigned long __delay_us = (delay_us); \
> -       ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
> -       if (delay_before_read && __delay_us) \
> +       u64 __delay_ns = __delay_us * NSEC_PER_USEC; \
> +       if (delay_before_read && __delay_us) { \
>                 udelay(__delay_us); \
> +               if (__timeout_us) \
> +                       __left_ns -= __delay_ns; \
> +       } \
>         for (;;) { \
>                 (val) = op(args); \
>                 if (cond) \
>                         break; \
> -               if (__timeout_us && \
> -                   ktime_compare(ktime_get(), __timeout) > 0) { \
> +               if (__timeout_us && __left_ns < 0) { \
>                         (val) = op(args); \
>                         break; \
>                 } \
> -               if (__delay_us) \
> +               if (__delay_us) { \
>                         udelay(__delay_us); \
> +                       if (__timeout_us) \
> +                               __left_ns -= __delay_ns; \
> +               } \
>                 cpu_relax(); \
> +               if (__timeout_us) \
> +                       __left_ns--; \
>         } \
>         (cond) ? 0 : -ETIMEDOUT; \
>  })
> --
> 2.34.1
>

Kind regards
Uffe