lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <668c82787f16_d77208e0@john.notmuch>
Date: Mon, 08 Jul 2024 17:21:12 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>, 
 martin.lau@...nel.org
Cc: bpf@...r.kernel.org, 
 netdev@...r.kernel.org, 
 Daniel Borkmann <daniel@...earbox.net>, 
 Pedro Pinto <xten@...c.io>, 
 Hyunwoo Kim <v4bel@...ori.io>, 
 Wongi Lee <qwerty@...ori.io>
Subject: RE: [PATCH bpf 1/2] bpf: Fix too early release of tcx_entry

Daniel Borkmann wrote:
> Pedro Pinto and later independently also Hyunwoo Kim and Wongi Lee reported
> an issue that the tcx_entry can be released too early leading to a use
> after free (UAF) when an active old-style ingress or clsact qdisc with a
> shared tc block is later replaced by another ingress or clsact instance.
> 
> Essentially, the sequence to trigger the UAF (one example) can be as follows:
> 
>   1. A network namespace is created
>   2. An ingress qdisc is created. This allocates a tcx_entry, and
>      &tcx_entry->miniq is stored in the qdisc's miniqp->p_miniq. At the
>      same time, a tcf block with index 1 is created.
>   3. chain0 is attached to the tcf block. chain0 must be connected to
>      the block linked to the ingress qdisc to later reach the function
>      tcf_chain0_head_change_cb_del() which triggers the UAF.
>   4. Create and graft a clsact qdisc. This causes the ingress qdisc
>      created in step 1 to be removed, thus freeing the previously linked
>      tcx_entry:
> 
>      rtnetlink_rcv_msg()
>        => tc_modify_qdisc()
>          => qdisc_create()
>            => clsact_init() [a]
>          => qdisc_graft()
>            => qdisc_destroy()
>              => __qdisc_destroy()
>                => ingress_destroy() [b]
>                  => tcx_entry_free()
>                    => kfree_rcu() // tcx_entry freed
> 
>   5. Finally, the network namespace is closed. This registers the
>      cleanup_net worker, and during the process of releasing the
>      remaining clsact qdisc, it accesses the tcx_entry that was
>      already freed in step 4, causing the UAF to occur:
> 
>      cleanup_net()
>        => ops_exit_list()
>          => default_device_exit_batch()
>            => unregister_netdevice_many()
>              => unregister_netdevice_many_notify()
>                => dev_shutdown()
>                  => qdisc_put()
>                    => clsact_destroy() [c]
>                      => tcf_block_put_ext()
>                        => tcf_chain0_head_change_cb_del()
>                          => tcf_chain_head_change_item()
>                            => clsact_chain_head_change()
>                              => mini_qdisc_pair_swap() // UAF
> 
> There are also other variants, the gist is to add an ingress (or clsact)
> qdisc with a specific shared block, then to replace that qdisc, waiting
> for the tcx_entry kfree_rcu() to be executed and subsequently accessing
> the current active qdisc's miniq one way or another.
> 
> The correct fix is to turn the miniq_active boolean into a counter. What
> can be observed, at step 2 above, the counter transitions from 0->1, at
> step [a] from 1->2 (in order for the miniq object to remain active during
> the replacement), then in [b] from 2->1 and finally [c] 1->0 with the
> eventual release. The reference counter in general ranges from [0,2] and
> it does not need to be atomic since all access to the counter is protected
> by the rtnl mutex. With this in place, there is no longer a UAF happening
> and the tcx_entry is freed at the correct time.
> 
> Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
> Reported-by: Pedro Pinto <xten@...c.io>
> Co-developed-by: Pedro Pinto <xten@...c.io>
> Signed-off-by: Pedro Pinto <xten@...c.io>
> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> Cc: Hyunwoo Kim <v4bel@...ori.io>
> Cc: Wongi Lee <qwerty@...ori.io>
> Cc: Martin KaFai Lau <martin.lau@...nel.org>
> ---

Acked-by: John Fastabend <john.fastabend@...il.com>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ