Message-ID: <CAEXW_YT-vJmXgWPQ_1J34iTb+ZhrAgN7c-HPz7kW17HmvKzJ3A@mail.gmail.com>
Date: Fri, 8 Apr 2022 13:14:35 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Kalesh Singh <kaleshsingh@...gle.com>,
Suren Baghdasaryan <surenb@...gle.com>,
kernel-team <kernel-team@...roid.com>, Tejun Heo <tj@...nel.org>,
Tim Murray <timmurray@...gle.com>, Wei Wang <wvw@...gle.com>,
Kyle Lin <kylelin@...gle.com>,
Chunwei Lu <chunweilu@...gle.com>,
Lulu Wang <luluw@...gle.com>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <quic_neeraju@...cinc.com>,
Josh Triplett <josh@...htriplett.org>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
rcu <rcu@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] EXP rcu: Move expedited grace period (GP) work to RT kthread_worker
On Fri, Apr 8, 2022 at 11:34 AM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Fri, Apr 08, 2022 at 10:41:26AM -0400, Joel Fernandes wrote:
> > On Fri, Apr 8, 2022 at 10:34 AM Paul E. McKenney <paulmck@...nel.org> wrote:
> > >
> > > On Fri, Apr 08, 2022 at 06:42:42AM -0400, Joel Fernandes wrote:
> > > > On Fri, Apr 8, 2022 at 12:57 AM Kalesh Singh <kaleshsingh@...gle.com> wrote:
> > > > >
> > > > [...]
> > > > > @@ -334,15 +334,13 @@ static bool exp_funnel_lock(unsigned long s)
> > > > > * Select the CPUs within the specified rcu_node that the upcoming
> > > > > * expedited grace period needs to wait for.
> > > > > */
> > > > > -static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> > > > > +static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp)
> > > > > {
> > > > > int cpu;
> > > > > unsigned long flags;
> > > > > unsigned long mask_ofl_test;
> > > > > unsigned long mask_ofl_ipi;
> > > > > int ret;
> > > > > - struct rcu_exp_work *rewp =
> > > > > - container_of(wp, struct rcu_exp_work, rew_work);
> > > > > struct rcu_node *rnp = container_of(rewp, struct rcu_node, rew);
> > > > >
> > > > > raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > @@ -417,13 +415,119 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> > > > > rcu_report_exp_cpu_mult(rnp, mask_ofl_test, false);
> > > > > }
> > > > >
> > > > > +static void rcu_exp_sel_wait_wake(unsigned long s);
> > > > > +
> > > > > +#ifdef CONFIG_RCU_EXP_KTHREAD
> > > >
> > > > Just my 2c:
> > > >
> > > > Honestly, I am not sure the benefits of duplicating the code to use
> > > > normal workqueues outweigh the drawbacks (namely code complexity and
> > > > code duplication, which can in turn cause more bugs and maintenance
> > > > headaches down the line). The code is harder to read, and adding more
> > > > 30-character function names does not help.
> > > >
> > > > For something as important as expedited GPs, I can't imagine a
> > > > scenario where an RT kthread worker would cause "issues". If it does
> > > > cause issues, that's what the -rc cycles and the stable releases are
> > > > for. I would rather trust the process than take a one-foot-in-the-door
> > > > approach.
> > > >
> > > > So please, can we just keep it simple?
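[ To illustrate the duplication I am referring to above: with the common
logic factored into __sync_rcu_exp_select_node_cpus(), both build variants
end up with near-identical thin wrappers, something along the lines of the
sketch below. This is my paraphrase of the approach, not a verbatim quote
of the patch, and it assumes the kthread_worker flavor takes a
struct kthread_work as its callback argument. ]

#ifdef CONFIG_RCU_EXP_KTHREAD
/* kthread_worker flavor: unwrap the kthread_work, then delegate. */
static void sync_rcu_exp_select_node_cpus(struct kthread_work *wp)
{
	struct rcu_exp_work *rewp =
		container_of(wp, struct rcu_exp_work, rew_work);

	__sync_rcu_exp_select_node_cpus(rewp);
}
#else /* !CONFIG_RCU_EXP_KTHREAD */
/* workqueue flavor: identical apart from the handle type. */
static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
{
	struct rcu_exp_work *rewp =
		container_of(wp, struct rcu_exp_work, rew_work);

	__sync_rcu_exp_select_node_cpus(rewp);
}
#endif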
> > >
> > > Yes and no.
> > >
> > > This is a bug fix, but only for those systems that are expecting real-time
> > > response from synchronize_rcu_expedited(). As far as I know, this is only
> > > Android. The rest of the systems are just fine with the current behavior.
> >
> > As far as you know, but are you sure?
>
> None of us are sure. We are balancing risks and potential benefits.
Right.
> > > In addition, this bug fix introduces significant risks, especially in
> > > terms of performance for throughput-oriented workloads.
> >
> > Could you explain what the risk is? That's the part I did not follow.
> > How can giving the synchronize_rcu_expedited() work higher priority
> > introduce throughput issues?
>
> Status quo has synchronize_rcu_expedited() workqueues running as
> SCHED_OTHER.
Yeah, so if we go by this, you are saying RCU_BOOST likely does not
work correctly with the status quo, right? I do not see anything in the
commit message indicating that this is an Android-only issue; let me
know what I am missing.
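[ For reference, the status quo farms the per-node CPU selection out to a
plain workqueue and then flushes it, roughly as in the sketch below
(simplified from memory of kernel/rcu/tree_exp.h, with details such as the
CPU-selection logic elided), so those workers get whatever service
SCHED_OTHER happens to give them: ]

	rcu_for_each_leaf_node(rnp) {
		/* Hand the per-node selection to a regular workqueue worker. */
		INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
		queue_work_on(rnp->grplo, rcu_par_gp_wq, &rnp->rew.rew_work);
		rnp->exp_need_flush = true;
	}

	/* Wait for the SCHED_OTHER workers to finish the selection. */
	rcu_for_each_leaf_node(rnp)
		if (rnp->exp_need_flush)
			flush_work(&rnp->rew.rew_work);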
> The users affected by this will instead have these running
> as SCHED_FIFO. That changes scheduling. For users not explicitly
> needing low-latency synchronize_rcu_expedited(), this change is very
> unlikely to be for the better. And historically, unnecessarily running
> portions of RCU at real-time priorities has been a change for the worse.
> As in greatly increased context-switch rates and consequently degraded
> performance. Please note that this is not a theoretical statement: Real
> users have really been burned by too much SCHED_FIFO in RCU kthreads in
> the past.
Android has also suffered from too much SCHED_FIFO in the past. I
remember the display thread 'surfaceflinger' had to be dropped from RT
priority at one point.
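[ To be concrete about the mechanism being debated: as I read the patch,
it creates a dedicated kthread_worker for the expedited GP machinery and
promotes it to SCHED_FIFO at kthread_prio, roughly as sketched below. The
names and error handling are from my reading of the posted patch and may
not match it exactly: ]

	struct sched_param param = { .sched_priority = kthread_prio };

	/* Dedicated worker for expedited GP work, promoted to RT priority. */
	rcu_exp_gp_kworker = kthread_create_worker(0, "rcu_exp_gp_kthread_worker");
	if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) {
		pr_err("Failed to create rcu_exp_gp_kthread_worker!\n");
		return;
	}
	sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, &param);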
> > > So yes, let's do this bug fix (with appropriate adjustment), but let's
> > > also avoid exposing the non-Android workloads to risks from the inevitable
> > > unintended consequences. ;-)
> >
> > I would argue the risk is also adding code complexity and more bugs
> > without a clear rationale for why it is being done. There's always risk
> > with any change, but that's what the -rc cycles and stable kernels help
> > catch. I think we should not add more code complexity if the concern is
> > only theoretical.
> >
> > There's also another possible risk: a hidden problem here that the
> > non-Android folks probably haven't noticed or been able to debug. I
> > would rather just do the right thing.
> >
> > Just my 2c,
>
> Sorry, but my answer is still "no".
>
> Your suggested change risks visiting unacceptable performance
> degradation on a very large number of innocent users for whom current
> synchronize_rcu_expedited() latency is plenty good enough.
I believe the process will catch any such risk, but it is your call! ;-)
Thanks,
- Joel