Message-ID: <20220413180709.GN4285@paulmck-ThinkPad-P17-Gen-1>
Date:   Wed, 13 Apr 2022 11:07:09 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     Hillf Danton <hdanton@...a.com>,
        Kalesh Singh <kaleshsingh@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] EXP rcu: Move expedited grace period (GP) work to RT
 kthread_worker

On Wed, Apr 13, 2022 at 01:21:20PM -0400, Joel Fernandes wrote:
> Hi Paul,
> 
> 
> On Wed, Apr 13, 2022 at 8:07 AM Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > On Wed, Apr 13, 2022 at 07:37:11PM +0800, Hillf Danton wrote:
> > > On Sat, 9 Apr 2022 08:56:12 -0700 Paul E. McKenney wrote:
> > > > On Sat, Apr 09, 2022 at 03:17:40PM +0800, Hillf Danton wrote:
> > > > > On Fri, 8 Apr 2022 10:53:53 -0700 Kalesh Singh wrote
> > > > > > Thanks for the discussion everyone.
> > > > > >
> > > > > > We didn't fully switch to kthread workers, to avoid changing the
> > > > > > behavior for users that don't need low-latency expedited GPs. Another
> > > > > > (and perhaps more important) reason is that kthread_worker offers
> > > > > > less concurrency than workqueues, which Paul reported can pose
> > > > > > issues on systems with a large number of CPUs.
> > > > >
> > > > > Wait a second ... what issues were reported wrt concurrency, given the
> > > > > output of "grep -nr workqueue block mm drivers"?
> > > > >
> > > > > Feel free to post a URL link to the issues.
> > > >
> > > > The issues can be easily seen by inspecting kthread_queue_work() and
> > > > the functions that it invokes.  In contrast, normal workqueues use
> > > > per-CPU mechanisms to avoid contention, as can equally easily be seen
> > > > by inspecting queue_work_on() and the functions that it invokes.
> > >
> > > The worker from kthread_create_worker() roughly matches an unbound
> > > workqueue, which can get every CPU overloaded, so the difference in
> > > implementation details between a kthread worker and a WQ worker (either
> > > bound or unbound) can be safely ignored if the kthread method works,
> > > given that priority is barely a cure for concurrency issues.
> >
> > Please look again, this time taking lock contention into account,
> > keeping in mind that systems with several hundred CPUs are reasonably
> > common and that systems with more than a thousand CPUs are not unheard of.
> 
> You are talking about lock contention in the kthread_worker infra,
> which unbound WQ does not suffer from, right? I don't think the worker
> lock contention will be an issue unless several
> synchronize_rcu_expedited() calls are trying to queue work at the same
> time. Did I miss something? Considering that synchronize_rcu_expedited()
> can block in the normal case (blocking is a pretty heavy operation
> involving the scheduler and load balancers), I don't see how
> contending on the worker infra locks can be an issue. If it were
> call_rcu(), then I could see contention mattering, since that executes
> much more often.

Think in terms of a system with 1536 CPUs (which IBM would be extremely
happy to sell you, last I checked).  With the default CONFIG_RCU_FANOUT_LEAF
of 16, that gives 96 leaf rcu_node structures.
Keeping that in mind, take another look at that code.

And in the past there have been real systems with 256 leaf rcu_node
structures.
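
To make the contrast concrete, here is a rough sketch, simplified from
kernel/kthread.c and kernel/workqueue.c (the function and field names are
real, but the bodies are abbreviated and not compilable as-is, and exact
details vary by kernel version):

	/* kthread_worker: every queuer on a given worker serializes
	 * on that worker's ONE lock. */
	bool kthread_queue_work(struct kthread_worker *worker,
				struct kthread_work *work)
	{
		unsigned long flags;
		bool ret = false;

		raw_spin_lock_irqsave(&worker->lock, flags);
		if (!queuing_blocked(worker, work)) {
			kthread_insert_work(worker, work, &worker->work_list);
			ret = true;
		}
		raw_spin_unlock_irqrestore(&worker->lock, flags);
		return ret;
	}

	/* workqueue: queue_work_on() resolves to a per-CPU pool, so
	 * concurrent queuers on different CPUs take DIFFERENT locks. */
	static void __queue_work(int cpu, struct workqueue_struct *wq,
				 struct work_struct *work)
	{
		struct pool_workqueue *pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);

		raw_spin_lock(&pwq->pool->lock);
		insert_work(pwq, work, &pwq->pool->worklist, 0);
		raw_spin_unlock(&pwq->pool->lock);
	}

With 96 leaf rcu_node structures queuing expedited-GP work at about the
same time, the kthread_worker variant funnels all 96 queue operations
through the single worker->lock.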

> I think the argument about too many things being RT is stronger though.

Fair enough.  Except that this could be dealt with by conditionally
setting SCHED_FIFO.  But the lock contention would remain.
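
Something along these lines would do it (hypothetical sketch, not the
actual patch; the "rcu_exp_rt" parameter name and the priority value are
made up for illustration):

	/* Create the expedited-GP worker, promoting it to SCHED_FIFO
	 * only if the (illustrative) rcu_exp_rt boot parameter is set. */
	static bool rcu_exp_rt;
	module_param(rcu_exp_rt, bool, 0444);

	static struct kthread_worker *rcu_start_exp_worker(void)
	{
		struct sched_param sp = { .sched_priority = 1 };
		struct kthread_worker *w;

		w = kthread_create_worker(0, "rcu_exp_gp");
		if (IS_ERR(w))
			return NULL;
		if (rcu_exp_rt)
			sched_setscheduler_nocheck(w->task, SCHED_FIFO, &sp);
		return w;
	}

Either way, every queue operation still goes through worker->lock.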

							Thanx, Paul

> Thanks,
> 
> Joel
> 
> 
> >
> >
> >                                                         Thanx, Paul
> >
> > > Hillf
> > > >
> > > > Please do feel free to take a look.
> > > >
> > > > If taking a look does not convince you, please construct some in-kernel
> > > > benchmarks to test the scalability of these two mechanisms.  Please note
> > > > that some care will be required to make sure that you are doing a valid
> > > > apples-to-apples comparison.
> > > >
> > > >                                                     Thanx, Paul
> > > >
