Date: Fri, 7 Jul 2023 18:22:13 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: paulmck@...nel.org
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, Hou Tao <houtao@...weicloud.com>, 
	Tejun Heo <tj@...nel.org>, rcu@...r.kernel.org, 
	Network Development <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>, 
	Kernel Team <kernel-team@...com>, Daniel Borkmann <daniel@...earbox.net>, 
	Andrii Nakryiko <andrii@...nel.org>, David Vernet <void@...ifault.com>
Subject: Re: [PATCH v4 bpf-next 09/14] bpf: Allow reuse from
 waiting_for_gp_ttrace list.

On Fri, Jul 7, 2023 at 1:47 PM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Fri, Jul 07, 2023 at 09:11:22AM -0700, Alexei Starovoitov wrote:
> > On Thu, Jul 6, 2023 at 9:37 PM Hou Tao <houtao@...weicloud.com> wrote:
> > >
> > > Hi,
> > >
> > > On 7/7/2023 12:16 PM, Alexei Starovoitov wrote:
> > > > On Thu, Jul 6, 2023 at 8:39 PM Hou Tao <houtao@...weicloud.com> wrote:
> > > >> Hi,
> > > >>
> > > >> On 7/7/2023 10:12 AM, Alexei Starovoitov wrote:
> > > >>> On Thu, Jul 6, 2023 at 7:07 PM Hou Tao <houtao@...weicloud.com> wrote:
> > > >>>> Hi,
> > > >>>>
> > > >>>> On 7/6/2023 11:34 AM, Alexei Starovoitov wrote:
> > > >>>>
> > > SNIP
> > > >>> and it's not just waiting_for_gp_ttrace. free_by_rcu_ttrace is similar.
> > > >> I think free_by_rcu_ttrace is different, because the reuse is only
> > > >> possible after one tasks trace RCU grace period as shown below, and the
> > > >> concurrent llist_del_first() must have been completed when the head is
> > > >> reused and re-added into free_by_rcu_ttrace again.
> > > >>
> > > >> // c0->free_by_rcu_ttrace
> > > >> A -> B -> C -> nil
> > > >>
> > > >> P1:
> > > >> alloc_bulk()
> > > >>     llist_del_first(&c->free_by_rcu_ttrace)
> > > >>         entry = A
> > > >>         next = B
> > > >>
> > > >> P2:
> > > >> do_call_rcu_ttrace()
> > > >>     // c->free_by_rcu_ttrace->first = NULL
> > > >>     llist_del_all(&c->free_by_rcu_ttrace)
> > > >>         move to c->waiting_for_gp_ttrace
> > > >>
> > > >> P1:
> > > >> llist_del_first()
> > > >>     return NULL
> > > >>
> > > > >> // A is only reusable after one tasks trace RCU grace period
> > > >> // llist_del_first() must have been completed
> > > > "must have been completed" ?
> > > >
> > > > I guess you're assuming that alloc_bulk() from irq_work
> > > > is running within rcu_tasks_trace critical section,
> > > > so __free_rcu_tasks_trace() callback will execute after
> > > > irq work completed?
> > > > I don't think that's the case.
> > >
> > > > Yes. The following was my original reasoning. Correct me if I am wrong:
> > >
> > > 1. llist_del_first() must be running concurrently with llist_del_all().
> > > If llist_del_first() runs after llist_del_all(), it will return NULL
> > > directly.
> > > 2. call_rcu_tasks_trace() must happen after llist_del_all(), else the
> > > elements in free_by_rcu_ttrace will not be freed back to slab.
> > > 3. call_rcu_tasks_trace() will wait for one tasks trace RCU grace period
> > > to call __free_rcu_tasks_trace()
> > > > 4. llist_del_first() is running in a context with irqs disabled, so the
> > > > tasks trace RCU grace period will wait for the end of llist_del_first().
> > >
> > > > It seems you think step 4 is not true, right?
> >
> > Yes. I think so. For two reasons:
> >
> > 1.
> > > I believe an irq-disabled region isn't considered equivalent
> > > to an rcu_read_lock_trace() region.
> >
> > Paul,
> > > could you clarify?
>
> You are correct, Alexei.  Unlike vanilla RCU, RCU Tasks Trace does not
> count irq-disabled regions of code as readers.
>
> But why not just put an rcu_read_lock_trace() and a matching
> rcu_read_unlock_trace() within that irq-disabled region of code?
>
> For completeness, if it were not for CONFIG_TASKS_TRACE_RCU_READ_MB,
> Hou Tao would be correct from a strict current-implementation
> viewpoint.  The reason is that, given the current implementation in
> CONFIG_TASKS_TRACE_RCU_READ_MB=n kernels, a task must either block or
> take an IPI in order for the grace-period machinery to realize that this
> task is done with all prior readers.
>
> However, we need to account for the possibility of IPI-free
> implementations, for example, if the real-time guys decide to start
> making heavy use of BPF sleepable programs.  They would then insist on
> getting rid of those IPIs for CONFIG_PREEMPT_RT=y kernels.  At which
> point, irq-disabled regions of code will absolutely not act as
> RCU tasks trace readers.
>
> Again, why not just put an rcu_read_lock_trace() and a matching
> rcu_read_unlock_trace() within that irq-disabled region of code?

If I remember correctly, the general guidance is to always add an
explicit marker wherever code acts as an RCU reader, instead of relying
on implementation details. So the suggestion to add the markers rather
than rely on IRQ disabling aligns with that.
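
To make the suggestion concrete, here is a rough userspace model of the
pattern, not kernel code. Only the names rcu_read_lock_trace(),
rcu_read_unlock_trace() and llist_del_first() come from the thread; their
bodies below are simplified single-threaded stand-ins, and
local_irq_save_model(), local_irq_restore_model(), gp_must_wait() and
pop_with_explicit_reader() are hypothetical helpers invented for this
sketch. The point it illustrates is that the modeled grace period looks
only at the explicit reader marker and never at the irq state.

```c
/* Userspace model only. The real primitives live in the kernel; every
 * body below is an illustrative stand-in, not the kernel implementation.
 */
#include <assert.h>
#include <stddef.h>

struct llist_node { struct llist_node *next; };
struct llist_head { struct llist_node *first; };

static int trace_reader_depth; /* models the explicit reader nesting count */
static int irqs_off;           /* models the local irq-disabled state */

/* Observations recorded while inside the critical region. */
static int seen_irqs_off;
static int seen_gp_blocked;

/* Stand-ins for the explicit RCU Tasks Trace reader markers. */
static void rcu_read_lock_trace(void) { trace_reader_depth++; }

static void rcu_read_unlock_trace(void)
{
	assert(trace_reader_depth > 0); /* unlock must pair with a lock */
	trace_reader_depth--;
}

/* Hypothetical stand-ins for disabling and restoring local irqs. */
static void local_irq_save_model(void) { irqs_off = 1; }
static void local_irq_restore_model(void) { irqs_off = 0; }

/* Single-threaded stand-in for llist_del_first(): pop the head node. */
static struct llist_node *llist_del_first(struct llist_head *head)
{
	struct llist_node *entry = head->first;

	if (entry)
		head->first = entry->next;
	return entry;
}

/* Modeled grace period: waits only for explicit readers and
 * deliberately ignores irqs_off.
 */
static int gp_must_wait(void) { return trace_reader_depth > 0; }

/* The suggested pattern: mark the reader explicitly inside the
 * irq-disabled region instead of relying on irqs being off.
 */
struct llist_node *pop_with_explicit_reader(struct llist_head *head)
{
	struct llist_node *node;

	local_irq_save_model();
	rcu_read_lock_trace();
	seen_irqs_off = irqs_off;
	seen_gp_blocked = gp_must_wait();
	node = llist_del_first(head);
	rcu_read_unlock_trace();
	local_irq_restore_model();
	return node;
}
```

In this model, dropping the two marker calls would leave gp_must_wait()
returning 0 even while irqs are off, which is the behavior Paul describes
for RCU Tasks Trace.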

Thanks.
