Date:   Thu, 8 Mar 2018 09:28:18 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Lai Jiangshan <jiangshanlai+lkml@...il.com>
Cc:     Tejun Heo <tj@...nel.org>, torvalds@...ux-foundation.org,
        Jann Horn <jannh@...gle.com>, bcrl@...ck.org,
        viro@...iv.linux.org.uk,
        Kent Overstreet <kent.overstreet@...il.com>,
        security@...nel.org, LKML <linux-kernel@...r.kernel.org>,
        kernel-team@...com
Subject: Re: [PATCH 7/7] RCU, workqueue: Implement rcu_work

On Thu, Mar 08, 2018 at 08:29:53AM +0800, Lai Jiangshan wrote:
> On Wed, Mar 7, 2018 at 10:54 PM, Paul E. McKenney
> <paulmck@...ux.vnet.ibm.com> wrote:
> > On Wed, Mar 07, 2018 at 10:49:49AM +0800, Lai Jiangshan wrote:
> >> On Wed, Mar 7, 2018 at 1:33 AM, Tejun Heo <tj@...nel.org> wrote:
> >>
> >> > +/**
> >> > + * queue_rcu_work_on - queue work on specific CPU after a RCU grace period
> >> > + * @cpu: CPU number to execute work on
> >> > + * @wq: workqueue to use
> >> > + * @rwork: work to queue
> >>
> >> For many people, "RCU grace period" is clear enough, but not ALL.
> >>
> >> So please make it a little clearer that it queues the work only after
> >> a *normal* RCU grace period; it supports just that one RCU variant.
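> >>
> >> For example (a wording sketch only, not the exact text I'd insist on):
> >>
> >>  * queue_rcu_work_on - queue work on specific CPU after a normal
> >>  *                     RCU grace period, i.e. the call_rcu() flavor;
> >>  *                     no other RCU variant is supported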
> >>
> >>
> >> > + *
> >> > + * Return: %false if @work was already on a queue, %true otherwise.
> >> > + */
> >>
> >> I'm afraid this will be a hard-to-use API.
> >>
> >> The user has no plan B when it returns false, especially when the
> >> user expects the work to be called at least once again after an RCU
> >> grace period.
> >>
> >> And the error-prone part is that, like the other queue_work()
> >> functions, its return value is often ignored, which makes the problem
> >> worse.
> >>
> >> So, since workqueue.c provides this API, it should handle this
> >> problem, for example by calling call_rcu() again in this case.  But
> >> everything then becomes much more complex: synchronization is needed
> >> for "calling call_rcu() again", and allowing the work item to be
> >> invoked twice after the last queue_rcu_work() is not workqueue style.
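> >>
> >> For example (hypothetical, with made-up names), an updater that
> >> expects "the work runs at least once after *this* update plus a
> >> grace period" has no good recovery path:
> >>
> >>	remove_object(obj);	/* the updater's change */
> >>	if (!queue_rcu_work(system_wq, &reclaim_rwork)) {
> >>		/*
> >>		 * Already pending: that pending invocation may run
> >>		 * after a grace period that began *before* this
> >>		 * removal, and the only fallback is to open-code
> >>		 * call_rcu() + queue_work() by hand.
> >>		 */
> >>	}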
> >
> > I confess that I had not thought of this aspect, but doesn't the
> > rcu_barrier() in v2 of this patchset guarantee that it has passed
> > the RCU portion of the overall wait time?  Given that, what I am
> > missing is how this differs from flush_work() on a normal work item.
> >
> > So could you please show me the sequence of events that leads to this
> > problem?  (Again, against v2 of this patch, which replaces the
> > synchronize_rcu() with rcu_barrier().)
> 
> I mentioned a subtle use case that a user would think is supported,
> since the comment doesn't disallow it.
> 
> It is clear that the user expects
>    the work must be called at least once after the API returns

I would have said "before the API returns" rather than "after the
API returns".  What am I missing here?

>    the work must be called after an RCU grace period
> 
> But in the case where the user expects the work to be called at least
> once again after "queue_rcu_work() + an RCU grace period", the API
> cannot meet that expectation if the work is already queued.  Although
> the user can detect this from the return value of queue_rcu_work(),
> the user can hardly make that expectation hold by adding appropriate
> code.

Beyond a certain point, I need to defer to Tejun on this, but I thought
that a false return meant that the work executed before the flush call.
Here I am assuming that the caller knows that the queue_rcu_work()
already happened and that another queue_rcu_work() won't be invoked on
that same work item until the flush completes.  Without this sort of
assumption, I agree that there could be problems, but without these
assumptions it seems to me that you would have exactly the same problems
with the non-RCU APIs.
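
For example, here is the sort of usage I have in mind (a sketch only,
with made-up names, using the flush_rcu_work() from this patchset):

	update_data();				/* made-up updater */
	queue_rcu_work(system_wq, &rwork);	/* may return false */
	flush_rcu_work(&rwork);			/* waits for GP + work */
	/* By here, the pending work item has executed and completed. */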

What am I missing?

						Thanx, Paul

> >> Some would argue that delayed_work has the same problem when the
> >> user expects the work to be called at least once again after a period
> >> of time.  But a time interval is easy to detect: the user can check
> >> the time and call queue_delayed_work() again when needed, which is
> >> also a frequent design pattern.  And for RCU it is hard to use this
> >> design pattern, since it is hard to detect a (new) RCU grace period
> >> without using call_rcu().
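> >>
> >> Roughly this pattern (hypothetical names):
> >>
> >>	static void my_work_fn(struct work_struct *work)
> >>	{
> >>		unsigned long target = READ_ONCE(next_deadline);
> >>
> >>		if (time_before(jiffies, target)) {
> >>			/* A newer request moved the deadline: requeue. */
> >>			queue_delayed_work(system_wq, &my_dwork,
> >>					   target - jiffies);
> >>			return;
> >>		}
> >>		do_the_work();
> >>	}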
> >>
> >> I would not provide this API.  This is not a NACK; I'm just trying
> >> to express my thinking about the API.  I'd rather RCU be changed so
> >> that RCU callbacks become sleepable.  But a complete cleanup across
> >> the whole source tree would be needed first for compatibility, an
> >> even more complex job.
> >
> > One downside of allowing RCU callback functions to sleep is that
> > one poorly written callback can block a bunch of other ones.
> > One advantage of Tejun's approach is that such delays only affect
> > the workqueues, which are already designed to handle such delays.
> >
> >                                                 Thanx, Paul
> >
> >> > +bool queue_rcu_work_on(int cpu, struct workqueue_struct *wq,
> >> > +                      struct rcu_work *rwork)
> >> > +{
> >> > +       struct work_struct *work = &rwork->work;
> >> > +
> >> > +       if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
> >> > +               rwork->wq = wq;
> >> > +               rwork->cpu = cpu;
> >> > +               call_rcu(&rwork->rcu, rcu_work_rcufn);
> >> > +               return true;
> >> > +       }
> >> > +
> >> > +       return false;
> >> > +}
> >> > +EXPORT_SYMBOL(queue_rcu_work_on);
> >> > +
> >>
> >
> 
