Message-ID: <CAEXW_YRm6VuZjG4u2tkAyihCT4Jd194VP3ZMK=foErtVw_cLcg@mail.gmail.com>
Date:   Mon, 26 Aug 2019 21:40:42 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     LKML <linux-kernel@...r.kernel.org>
Cc:     Frederic Weisbecker <fweisbec@...il.com>,
        Jonathan Corbet <corbet@....net>,
        Josh Triplett <josh@...htriplett.org>,
        kernel-team <kernel-team@...roid.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Mauro Carvalho Chehab <mchehab+samsung@...nel.org>,
        "Paul E. McKenney" <paulmck@...nel.org>, rcu <rcu@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC v1 2/2] rcu/tree: Remove dynticks_nmi_nesting counter

On Mon, Aug 26, 2019 at 9:34 PM Joel Fernandes (Google)
<joel@...lfernandes.org> wrote:
>
> The dynticks_nmi_nesting counter serves 4 purposes:
>

And actually, I meant 3 purposes ;-) :-P

thanks,

 - Joel


>       (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect the
>           first interrupt nesting level.
>
>       (b) We need to detect half-interrupts until we are sure they're not an
>           issue. However, the comparison against DYNTICK_IRQ_NONIDLE is
>           replaced with a comparison against 0.
>
>       (c) When a quiescent state report is needed from a nohz_full CPU,
>           the nesting counter detects that we are in a first-level interrupt.
>
> For (a), we can just use dynticks_nesting == 1 to determine this. Only the
> outermost interrupt that interrupted an RCU-idle state can set it to 1.
>
> For (b), this warning condition has not occurred for several kernel
> releases.  We still keep the warning, but change it to use
> in_interrupt() instead of the nesting counter. In a later year, we can
> remove the warning.
>
> For (c), the nesting check is not really necessary, since forced_tick would
> have been set to true in the outermost interrupt, so nested interrupts and
> NMIs will check forced_tick anyway and bail.
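
(Side note for anyone reading along: here is a minimal, purely
illustrative sketch of the combined-counter idea behind (a) above. It is
NOT the actual tree.c code -- the per-CPU plumbing, the ->dynticks
atomic and the exit path are all omitted, and the helper names are made
up for illustration only.)

	#include <stdbool.h>	/* only needed for a userspace build of this sketch */

	static long nesting;	/* stands in for rcu_data.dynticks_nesting */

	/* Sketch of interrupt/NMI entry accounting. */
	static void sketch_irq_enter(bool cpu_was_in_eqs)
	{
		/*
		 * The outermost entry from an RCU-idle extended quiescent
		 * state adds 1; every other entry adds 2.
		 */
		nesting += cpu_was_in_eqs ? 1 : 2;
	}

	/*
	 * Sketch of the rcu_is_cpu_rrupt_from_idle() check.  It is only
	 * meaningful when called from within an interrupt handler (the
	 * scheduling-clock interrupt in the real code).
	 */
	static bool sketch_rrupt_from_idle(void)
	{
		/*
		 * Only the outermost interrupt that interrupted an RCU-idle
		 * state can observe the value 1 here, which is what replaces
		 * the old dynticks_nmi_nesting == 1 check.
		 */
		return nesting == 1;
	}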
>
> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> ---
>  .../Data-Structures/Data-Structures.rst       | 31 +++------
>  Documentation/RCU/stallwarn.txt               |  6 +-
>  kernel/rcu/tree.c                             | 64 +++++++------------
>  kernel/rcu/tree.h                             |  4 +-
>  kernel/rcu/tree_stall.h                       |  4 +-
>  5 files changed, 41 insertions(+), 68 deletions(-)
>
> diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> index 4a48e20a46f2..a5a907f434a1 100644
> --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> @@ -936,10 +936,9 @@ This portion of the rcu_data structure is declared as follows:
>  ::
>
>       1   long dynticks_nesting;
> -     2   long dynticks_nmi_nesting;
> -     3   atomic_t dynticks;
> -     4   bool rcu_need_heavy_qs;
> -     5   bool rcu_urgent_qs;
> +     2   atomic_t dynticks;
> +     3   bool rcu_need_heavy_qs;
> +     4   bool rcu_urgent_qs;
>
>  These fields in the rcu_data structure maintain the per-CPU dyntick-idle
>  state for the corresponding CPU. The fields may be accessed only from
> @@ -948,26 +947,14 @@ the corresponding CPU (and from tracing) unless otherwise stated.
>  The ``->dynticks_nesting`` field counts the nesting depth of process
>  execution, so that in normal circumstances this counter has value zero
>  or one. NMIs, irqs, and tracers are counted by the
> -``->dynticks_nmi_nesting`` field. Because NMIs cannot be masked, changes
> +``->dynticks_nesting`` field as well. Because NMIs cannot be masked, changes
>  to this variable have to be undertaken carefully using an algorithm
>  provided by Andy Lutomirski. The initial transition from idle adds one,
>  and nested transitions add two, so that a nesting level of five is
> -represented by a ``->dynticks_nmi_nesting`` value of nine. This counter
> +represented by a ``->dynticks_nesting`` value of nine. This counter
>  can therefore be thought of as counting the number of reasons why this
> -CPU cannot be permitted to enter dyntick-idle mode, aside from
> -process-level transitions.
> -
> -However, it turns out that when running in non-idle kernel context, the
> -Linux kernel is fully capable of entering interrupt handlers that never
> -exit and perhaps also vice versa. Therefore, whenever the
> -``->dynticks_nesting`` field is incremented up from zero, the
> -``->dynticks_nmi_nesting`` field is set to a large positive number, and
> -whenever the ``->dynticks_nesting`` field is decremented down to zero,
> -the the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that
> -the number of misnested interrupts is not sufficient to overflow the
> -counter, this approach corrects the ``->dynticks_nmi_nesting`` field
> -every time the corresponding CPU enters the idle loop from process
> -context.
> +CPU cannot be permitted to enter dyntick-idle mode. It counts both the
> +process-level and interrupt transitions.
>
>  The ``->dynticks`` field counts the corresponding CPU's transitions to
>  and from either dyntick-idle or user mode, so that this counter has an
> @@ -1000,7 +987,9 @@ code.
>  +-----------------------------------------------------------------------+
>  | Because this would fail in the presence of interrupts whose handlers  |
>  | never return and of handlers that manage to return from a made-up     |
> -| interrupt.                                                            |
> +| interrupt. NOTE: The counters have now been combined; however,        |
> +| a temporary warning has been left to make sure this condition never   |
> +| occurs.                                                               |
>  +-----------------------------------------------------------------------+
>
>  Additional fields are present for some special-purpose builds, and are
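
(A quick arithmetic check of the "nesting level of five is represented
by a value of nine" example in the hunk above: the initial transition
from idle adds one and each of the four further nested transitions adds
two, so 1 + 4 * 2 = 9.)
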
> diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
> index f48f4621ccbc..585f73009a56 100644
> --- a/Documentation/RCU/stallwarn.txt
> +++ b/Documentation/RCU/stallwarn.txt
> @@ -173,8 +173,8 @@ For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
>  it will print a message similar to the following:
>
>         INFO: rcu_sched detected stalls on CPUs/tasks:
> -       2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
> -       16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0
> +       2-...: (3 GPs behind) idle=06c/0 softirq=1453/1455 fqs=0
> +       16-...: (0 ticks this GP) idle=81c/0 softirq=764/764 fqs=0
>         (detected by 32, t=2603 jiffies, g=7075, q=625)
>
>  This message indicates that CPU 32 detected that CPUs 2 and 16 were both
> @@ -225,7 +225,7 @@ an estimate of the total number of RCU callbacks queued across all CPUs
>  In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
>  for each CPU:
>
> -       0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D
> +       0: (64628 ticks this GP) idle=dd5/3fffffffffffffff softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D
>
>  The "last_accelerate:" prints the low-order 16 bits (in hex) of the
>  jiffies counter when this CPU last invoked rcu_try_advance_all_cbs()
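
(For anyone decoding these lines: the idle= field is now
"->dynticks snapshot / ->dynticks_nesting"; the trailing
->dynticks_nmi_nesting value from the old three-part format is gone,
matching the print_cpu_stall_info() change in tree_stall.h below.)
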
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 255cd6835526..1465a3e406f8 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -81,7 +81,6 @@
>
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
>         .dynticks_nesting = 1,
> -       .dynticks_nmi_nesting = 0,
>         .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
>  };
>  struct rcu_state rcu_state = {
> @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>         /* Check for counter underflows */
>         RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
>                          "RCU dynticks_nesting counter underflow!");
> -       RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> -                        "RCU dynticks_nmi_nesting counter underflow/zero!");
>
> -       /* Are we at first interrupt nesting level? */
> -       if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> -               return false;
> -
> -       /* Does CPU appear to be idle from an RCU standpoint? */
> -       return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> +       /* Are we the outermost interrupt that arrived when RCU was idle? */
> +       return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
>  }
>
>  #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch ... */
> @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
>         struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>
>         /* Entering usermode/idle from interrupt is not handled. These would
> -        * mean usermode upcalls or idle entry happened from interrupts. But,
> -        * reset the counter if we warn.
> +        * mean usermode upcalls or idle entry happened from interrupts. Remove
> +        * the warning by 2020.
>          */
> -       if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> -               WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> +       WARN_ON_ONCE(in_interrupt());
>
>         WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
>                      rdp->dynticks_nesting == 0);
> @@ -627,9 +619,8 @@ void rcu_user_enter(void)
>
>  /*
>   * If we are returning from the outermost NMI handler that interrupted an
> - * RCU-idle period, update rdp->dynticks and rdp->dynticks_nmi_nesting
> - * to let the RCU grace-period handling know that the CPU is back to
> - * being RCU-idle.
> + * RCU-idle period, update rdp->dynticks to let the RCU grace-period handling
> + * know that the CPU is back to being RCU-idle.
>   *
>   * If you add or remove a call to rcu_nmi_exit_common(), be sure to test
>   * with CONFIG_RCU_EQS_DEBUG=y.
> @@ -639,16 +630,13 @@ static __always_inline void rcu_nmi_exit_common(bool irq)
>         struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>
>         /*
> -        * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
> +        * Check for ->dynticks_nesting underflow and bad ->dynticks.
>          * (We are exiting an NMI handler, so RCU better be paying attention
>          * to us!)
>          */
>         WARN_ON_ONCE(rdp->dynticks_nesting <= 0);
> -       WARN_ON_ONCE(rdp->dynticks_nmi_nesting <= 0);
>         WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
>
> -       WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */
> -                  rdp->dynticks_nmi_nesting - 1);
>         /*
>          * If the nesting level is not 1, the CPU wasn't RCU-idle, so
>          * leave it in non-RCU-idle state.
> @@ -750,11 +738,10 @@ static void rcu_eqs_exit(bool user)
>         WRITE_ONCE(rdp->dynticks_nesting, 1);
>
>         /* Exiting usermode/idle from interrupt is not handled. These would
> -        * mean usermode upcalls or idle exit happened from interrupts. But,
> -        * reset the counter if we warn.
> +        * mean usermode upcalls or idle exit happened from interrupts. Remove
> +        * the warning by 2020.
>          */
> -       if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> -               WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> +       WARN_ON_ONCE(in_interrupt());
>  }
>
>  /**
> @@ -795,14 +782,13 @@ void rcu_user_exit(void)
>   * rcu_nmi_enter_common - inform RCU of entry to NMI context
>   * @irq: Is this call from rcu_irq_enter?
>   *
> - * If the CPU was idle from RCU's viewpoint, update rdp->dynticks and
> - * rdp->dynticks_nmi_nesting to let the RCU grace-period handling know
> - * that the CPU is active.  This implementation permits nested NMIs, as
> - * long as the nesting level does not overflow an int.  (You will probably
> - * run out of stack space first.)
> + * If the CPU was idle from RCU's viewpoint, update rdp->dynticks to let the
> + * RCU grace-period handling know that the CPU is active.  This implementation
> + * permits nested NMIs, as long as the nesting level does not overflow a long.
> + * (You will probably run out of stack space first.)
>   *
> - * If you add or remove a call to rcu_nmi_enter_common(), be sure to test
> - * with CONFIG_RCU_EQS_DEBUG=y.
> + * If you add or remove a call to rcu_nmi_enter_common(), be sure to test with
> + * CONFIG_RCU_EQS_DEBUG=y.
>   */
>  static __always_inline void rcu_nmi_enter_common(bool irq)
>  {
> @@ -811,15 +797,16 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>
>         /* Complain about underflow. */
>         WARN_ON_ONCE(rdp->dynticks_nesting < 0);
> -       WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
>
>         /*
>          * If idle from RCU viewpoint, atomically increment ->dynticks
> -        * to mark non-idle and increment ->dynticks_nmi_nesting by one.
> -        * Otherwise, increment ->dynticks_nmi_nesting by two.  This means
> -        * if ->dynticks_nmi_nesting is equal to one, we are guaranteed
> +        * to mark non-idle and increment ->dynticks_nesting by one.
> +        * Otherwise, increment ->dynticks_nesting by two.  This means
> +        * if ->dynticks_nesting is equal to one, we are guaranteed
>          * to be in the outermost NMI handler that interrupted an RCU-idle
> -        * period (observation due to Andy Lutomirski).
> +        * period (observation due to Andy Lutomirski). An exception is an
> +        * interrupt that arrives in non-idle kernel context; in that case
> +        * we are the outermost interrupt but still increment by 2, which is OK.
>          */
>         if (rcu_dynticks_curr_cpu_in_eqs()) {
>
> @@ -832,8 +819,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>                         rcu_cleanup_after_idle();
>
>                 incby = 1;
> -       } else if (tick_nohz_full_cpu(rdp->cpu) &&
> -                  !rdp->dynticks_nmi_nesting && rdp->rcu_urgent_qs &&
> +       } else if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_urgent_qs &&
>                    !rdp->rcu_forced_tick) {
>                 rdp->rcu_forced_tick = true;
>                 tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> @@ -846,8 +832,6 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>         WRITE_ONCE(rdp->dynticks_nesting, /* Prevent store tearing. */
>                    rdp->dynticks_nesting + incby);
>
> -       WRITE_ONCE(rdp->dynticks_nmi_nesting, /* Prevent store tearing. */
> -                  rdp->dynticks_nmi_nesting + 1);
>         barrier();
>  }
>
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 055c31781d3a..ad7d3e31c5cf 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -176,8 +176,8 @@ struct rcu_data {
>
>         /* 3) dynticks interface. */
>         int dynticks_snap;              /* Per-GP tracking for dynticks. */
> -       long dynticks_nesting;          /* Track process nesting level. */
> -       long dynticks_nmi_nesting;      /* Track irq/NMI nesting level. */
> +       long dynticks_nesting;          /* Track process-level and irq/NMI
> +                                        * (non-idle) nesting level. */
>         atomic_t dynticks;              /* Even value for idle, else odd. */
>         bool rcu_need_heavy_qs;         /* GP old, so heavy quiescent state! */
>         bool rcu_urgent_qs;             /* GP old need light quiescent state. */
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 841ab43f3e60..0676460107d0 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -313,7 +313,7 @@ static void print_cpu_stall_info(int cpu)
>         }
>         print_cpu_stall_fast_no_hz(fast_no_hz, cpu);
>         delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq);
> -       pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%#lx softirq=%u/%u fqs=%ld %s\n",
> +       pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld softirq=%u/%u fqs=%ld %s\n",
>                cpu,
>                "O."[!!cpu_online(cpu)],
>                "o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)],
> @@ -323,7 +323,7 @@ static void print_cpu_stall_info(int cpu)
>                                 "!."[!delta],
>                ticks_value, ticks_title,
>                rcu_dynticks_snap(rdp) & 0xfff,
> -              rdp->dynticks_nesting, rdp->dynticks_nmi_nesting,
> +              rdp->dynticks_nesting,
>                rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
>                READ_ONCE(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
>                fast_no_hz);
> --
> 2.23.0.187.g17f5b7556c-goog
>
