Message-ID: <20221118183113.ftsafchmurs7copl@MacBook-Pro-5.local>
Date: Fri, 18 Nov 2022 10:31:13 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: David Vernet <void@...ifault.com>
Cc: John Fastabend <john.fastabend@...il.com>, bpf@...r.kernel.org,
ast@...nel.org, andrii@...nel.org, daniel@...earbox.net,
martin.lau@...ux.dev, memxor@...il.com, yhs@...com,
song@...nel.org, sdf@...gle.com, kpsingh@...nel.org,
jolsa@...nel.org, haoluo@...gle.com, tj@...nel.org,
kernel-team@...com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v7 0/3] Support storing struct task_struct
objects as kptrs

On Fri, Nov 18, 2022 at 09:08:12AM -0600, David Vernet wrote:
> On Thu, Nov 17, 2022 at 10:04:27PM -0800, John Fastabend wrote:
>
> [...]
>
> > > > And the last thing I was checking is that because KF_SLEEPABLE is
> > > > not set, this should be blocked from running on sleepable progs,
> > > > which would break the call_rcu in the destructor. Maybe a small
> > > > nit, not sure it's worth it, but it might be nice to annotate the
> > > > helper description with a note, "will not work on sleepable progs"
> > > > or something to that effect.
> > >
> > > KF_SLEEPABLE is used to indicate whether the kfunc _itself_ may sleep,
> > > not whether the calling program can be sleepable. call_rcu() doesn't
> > > block, so no need to mark the kfunc as KF_SLEEPABLE. The key is that if
> > > a kfunc is sleepable, non-sleepable programs are not able to call it
> > > (and this is enforced in the verifier).
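
[For context, the sleepable flag is declared when the kfunc is registered.
A minimal kernel-side sketch follows; the kfunc and set names are made up
for illustration and are not taken from this patch set:]

#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

/* Illustrative kfunc body; it may sleep, hence KF_SLEEPABLE below. */
__used noinline
void bpf_example_might_sleep(void)
{
	might_sleep();
}

BTF_SET8_START(example_kfunc_ids)
BTF_ID_FLAGS(func, bpf_example_might_sleep, KF_SLEEPABLE)
BTF_SET8_END(example_kfunc_ids)

static const struct btf_kfunc_id_set example_kfunc_set = {
	.owner = THIS_MODULE,
	.set   = &example_kfunc_ids,
};

static int __init example_kfuncs_init(void)
{
	/* Once registered, the verifier rejects calls to the KF_SLEEPABLE
	 * kfunc from non-sleepable tracing programs.
	 */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
					 &example_kfunc_set);
}
late_initcall(example_kfuncs_init);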
> >
> > OK, but should these helpers be allowed in sleepable progs? I think
> > not. What stops this (using your helpers):
> >
> > cpu0                                  cpu1
> > ----
> > v = insert_lookup_task(task)
> > kptr = bpf_kptr_xchg(&v->task, NULL);
> > if (!kptr)
> >      return 0;
> >                                       map_delete_elem()
> >                                         put_task()
> >                                           rcu_call
> > do_something_might_sleep()
> >                                             put_task_struct
> >                                               ... free
The free won't happen here, because the kptr on cpu0 holds the refcnt.
The BPF side never directly frees a kptr; it only incs/decs the refcnt
via kfuncs (a sketch of that acquire/release pattern follows the diagram).
> > kptr->[free'd memory]
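
[To make Alexei's point concrete, here is a minimal BPF-side sketch of
the inc/dec-via-kfuncs pattern. The task_value/task_map layout and the
program name are hypothetical; bpf_task_acquire()/bpf_task_release() are
the kfuncs from this series, with declarations assumed from the patches:]

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Kfuncs from this series; declarations assumed from the patches. */
struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;

/* Hypothetical map layout: a hash map whose value holds a task kptr. */
struct task_value {
	struct task_struct __kptr_ref *task;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct task_value);
} task_map SEC(".maps");

SEC("tp_btf/task_newtask")
int BPF_PROG(store_task, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired, *old;
	struct task_value *v;
	int key = 0;

	v = bpf_map_lookup_elem(&task_map, &key);
	if (!v)
		return 0;

	/* Bump the refcnt via the acquire kfunc... */
	acquired = bpf_task_acquire(task);

	/* ...and hand the reference to the map. Whatever kptr was stored
	 * before comes back out and must be released (refcnt dec); the
	 * program never frees anything directly.
	 */
	old = bpf_kptr_xchg(&v->task, acquired);
	if (old)
		bpf_task_release(old);

	return 0;
}

char LICENSE[] SEC("license") = "GPL";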
> >
> > The insert_lookup_task will bump the refcnt via the acquire on map
> > insert. But the lookup doesn't do anything to the refcnt, and the
A lookup from the map doesn't touch the kptrs in the value. Just reading
v->kptr yields a PTR_UNTRUSTED pointer with probe_mem protection (see
the sketch after John's question below).
> > map_delete_elem will delete it. We have a check for spin_lock
> > types to stop them from being in sleepable progs. Did I miss a
> > similar check for these?
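
[To illustrate Alexei's PTR_UNTRUSTED point, a sketch of the
lookup-then-read path, reusing the hypothetical task_value/task_map
definitions from the sketch above:]

SEC("tp_btf/task_newtask")
int BPF_PROG(peek_task, struct task_struct *task, u64 clone_flags)
{
	struct task_value *v;
	struct task_struct *p;
	int key = 0;

	v = bpf_map_lookup_elem(&task_map, &key);
	if (!v)
		return 0;

	/* A plain load of the kptr field doesn't touch the refcnt. The
	 * verifier marks p as PTR_UNTRUSTED and emits probe_mem loads
	 * for dereferences, so a racing free can't fault the kernel --
	 * but the pointee may be stale.
	 */
	p = v->task;
	if (p)
		bpf_printk("stored task pid %d", p->pid);

	return 0;
}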
>
> So, in your example above, bpf_kptr_xchg(&v->task, NULL) will atomically
> xchg the kptr from the map, and so the map_delete_elem() call would fail
> with (something like) -ENOENT. In general, the semantics are similar to
> std::unique_ptr::swap() in C++.
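
[A sketch of those swap semantics, again reusing the hypothetical map and
kfunc declarations from the sketch above. On success, the program owns
the sole program-side reference and must release it:]

SEC("tp_btf/task_newtask")
int BPF_PROG(take_task, struct task_struct *task, u64 clone_flags)
{
	struct task_value *v;
	struct task_struct *kptr;
	int key = 0;

	v = bpf_map_lookup_elem(&task_map, &key);
	if (!v)
		return 0;

	/* Atomic take: either this program gets the stored kptr and a
	 * concurrent deleter sees NULL, or vice versa -- never both.
	 */
	kptr = bpf_kptr_xchg(&v->task, NULL);
	if (!kptr)
		return 0;

	bpf_task_release(kptr);
	return 0;
}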
>
> FWIW, I think KF_KPTR_GET kfuncs are the more complex / racy kfuncs to
> reason about. The reason is that we're passing a pointer to the map
> value containing a kptr directly to the kfunc (in an attempt to
> acquire an additional reference if a kptr was already present in the
> map) rather than doing an xchg which atomically gets us the unique
> pointer if nobody else xchgs it in first. So with KF_KPTR_GET, someone
> else could come along and delete the kptr from the map while the kfunc
> is trying to acquire that additional reference. The race looks something
> like this:
>
> cpu0                                  cpu1
> ----
> v = insert_lookup_task(task)
> kptr = bpf_task_kptr_get(&v->task);
>                                       map_delete_elem()
>                                         put_task()
>                                           rcu_call
>                                             put_task_struct
>                                               ... free
> if (!kptr)
>         /* In this race example, this path will be taken. */
>         return 0;
>
> The difference is that here, we're not doing an atomic xchg of the kptr
> out of the map. Instead, we're passing a pointer to the map value
> containing the kptr directly to bpf_task_kptr_get(), which itself tries
> to acquire an additional reference on the task to return to the program
> as a kptr. This is still safe, however, as bpf_task_kptr_get() uses RCU
> and refcount_inc_not_zero() to ensure that it can't hit a UAF, and that
> it won't return a dying task to the caller:
>
> /**
>  * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
>  * kptr acquired by this kfunc which is not subsequently stored in a map, must
>  * be released by calling bpf_task_release().
>  * @pp: A pointer to a task kptr on which a reference is being acquired.
>  */
> __used noinline
> struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
> {
> 	struct task_struct *p;
>
> 	rcu_read_lock();
> 	p = READ_ONCE(*pp);
>
> 	/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> 	 * cpu1 could remove the element from the map here, and invoke
> 	 * put_task_struct_rcu_user(). We're in an RCU read region
> 	 * though, so the task won't be freed until at the very
> 	 * earliest, the rcu_read_unlock() below.
> 	 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> 	 */
>
> 	if (p && !refcount_inc_not_zero(&p->rcu_users))
> 		/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> 		 * refcount_inc_not_zero() will return false, as cpu1
> 		 * deleted the element from the map and dropped its last
> 		 * refcount. So we just return NULL as the task will be
> 		 * deleted once an RCU gp has elapsed.
> 		 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> 		 */
> 		p = NULL;
> 	rcu_read_unlock();
>
> 	return p;
> }
>
> Let me know if that makes sense. This stuff is tricky, and I plan to
> document it clearly and thoroughly on the kptr docs page once this
> patch set lands.
All correct. Probably worth adding this comment directly in bpf_task_kptr_get.