[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <05074f73-9d84-41e8-8368-51311a794636@paulmck-laptop>
Date: Sun, 9 Jul 2023 21:45:25 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Hou Tao <houtao@...weicloud.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Tejun Heo <tj@...nel.org>, rcu@...r.kernel.org,
Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Kernel Team <kernel-team@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
David Vernet <void@...ifault.com>
Subject: Re: [PATCH v4 bpf-next 09/14] bpf: Allow reuse from
waiting_for_gp_ttrace list.
On Sat, Jul 08, 2023 at 03:03:40PM +0800, Hou Tao wrote:
> Hi,
>
> On 7/8/2023 1:47 AM, Paul E. McKenney wrote:
> > On Fri, Jul 07, 2023 at 09:11:22AM -0700, Alexei Starovoitov wrote:
> >> On Thu, Jul 6, 2023 at 9:37 PM Hou Tao <houtao@...weicloud.com> wrote:
> SNIP
> >>> I guess you're assuming that alloc_bulk() from irq_work
> >>> is running within rcu_tasks_trace critical section,
> >>> so __free_rcu_tasks_trace() callback will execute after
> >>> irq work completed?
> >>> I don't think that's the case.
> >>> Yes. The following is my original thoughts. Correct me if I was wrong:
> >>>
> >>> 1. llist_del_first() must be running concurrently with llist_del_all().
> >>> If llist_del_first() runs after llist_del_all(), it will return NULL
> >>> directly.
> >>> 2. call_rcu_tasks_trace() must happen after llist_del_all(), else the
> >>> elements in free_by_rcu_ttrace will not be freed back to slab.
> >>> 3. call_rcu_tasks_trace() will wait for one tasks trace RCU grace period
> >>> to call __free_rcu_tasks_trace()
> >>> 4. llist_del_first() in running in an context with irq-disabled, so the
> >>> tasks trace RCU grace period will wait for the end of llist_del_first()
> >>>
> >>> It seems you thought step 4) is not true, right ?
> >> Yes. I think so. For two reasons:
> >>
> >> 1.
> >> I believe irq disabled region isn't considered equivalent
> >> to rcu_read_lock_trace() region.
> >>
> >> Paul,
> >> could you clarify ?
> > You are correct, Alexei. Unlike vanilla RCU, RCU Tasks Trace does not
> > count irq-disabled regions of code as readers.
>
> I see. But I still have one question: considering that in current
> implementation one Tasks Trace RCU grace period implies one vanilla RCU
> grace period (aka rcu_trace_implies_rcu_gp), so in my naive
> understanding of RCU, does that mean __free_rcu_tasks_trace() will be
> invoked after the expiration of current Task Trace RCU grace period,
> right ? And does it also mean __free_rcu_tasks_trace() will be invoked
> after the expiration of current vanilla RCU grace period, right ? If
> these two conditions above are true, does it mean
> __free_rcu_tasks_trace() will wait for the irq-disabled code reigion ?
First, good show digging into the code!
However, this is guaranteed only if rcu_trace_implies_rcu_gp(), which
right now happens to return the constant true. In other words, that is
an accident of the current implementation. To rely on this, you must
check the return value of rcu_trace_implies_rcu_gp() and then have some
alternative code that does not rely on that synchronize_rcu().
> > But why not just put an rcu_read_lock_trace() and a matching
> > rcu_read_unlock_trace() within that irq-disabled region of code?
> >
> > For completeness, if it were not for CONFIG_TASKS_TRACE_RCU_READ_MB,
> > Hou Tao would be correct from a strict current-implementation
> > viewpoint. The reason is that, given the current implementation in
> > CONFIG_TASKS_TRACE_RCU_READ_MB=n kernels, a task must either block or
> > take an IPI in order for the grace-period machinery to realize that this
> > task is done with all prior readers.
>
> Thanks for the detailed explanation.
>
> > However, we need to account for the possibility of IPI-free
> > implementations, for example, if the real-time guys decide to start
> > making heavy use of BPF sleepable programs. They would then insist on
> > getting rid of those IPIs for CONFIG_PREEMPT_RT=y kernels. At which
> > point, irq-disabled regions of code will absolutely not act as
> > RCU tasks trace readers.
> >
> > Again, why not just put an rcu_read_lock_trace() and a matching
> > rcu_read_unlock_trace() within that irq-disabled region of code?
> >
> > Or maybe there is a better workaround.
>
> Yes. I think we could use rcu_read_{lock,unlock}_trace to fix the ABA
> problem for free_by_rcu_ttrace.
That sounds good to me!
Thanx, Paul
> >> 2.
> >> Even if 1 is incorrect, in RT llist_del_first() from alloc_bulk()
> >> runs "in a per-CPU thread in preemptible context."
> >> See irq_work_run_list.
> > Agreed, under RT, "interrupt handlers" often run in task context.
>
> Yes, I missed that. I misread alloc_bulk(), and it seems it only does
> inc_active() for c->free_llist.
> > Thanx, Paul
>
Powered by blists - more mailing lists