lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <583B9C8C.9040703@gmail.com>
Date:   Sun, 27 Nov 2016 18:55:08 -0800
From:   John Fastabend <john.fastabend@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>, davem@...emloft.net
Cc:     xiyou.wangcong@...il.com, roid@...lanox.com, ast@...com,
        hannes@...essinduktion.org, jiri@...lanox.com,
        netdev@...r.kernel.org
Subject: Re: [PATCH net] net, sched: respect rcu grace period on cls
 destruction

On 16-11-26 04:18 PM, Daniel Borkmann wrote:
> Roi reported a crash in flower where tp->root was NULL in ->classify()
> callbacks. Reason is that in ->destroy() tp->root is set to NULL via
> RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
> this doesn't respect RCU grace period for them, and as a result, still
> outstanding readers from tc_classify() will try to blindly dereference
> a NULL tp->root.
> 
> The tp->root object is strictly private to the classifier implementation
> and holds internal data the core such as tc_ctl_tfilter() doesn't know
> about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
> is only checked for NULL in ->get() callback, but nowhere else. This is
> misleading and seemed to be copied from old classifier code that was not
> cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
> fix NULL pointer dereference") moved tp->root initialization into ->init()
> routine, where before it was part of ->change(), so ->get() had to deal
> with tp->root being NULL back then, so that was indeed a valid case, after
> d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
> ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
> in packet classifiers"); but the NULLifying was reintroduced with the
> RCUification, but it's not correct for every classifier implementation.
> 
> In the cases that are fixed here with one exception of cls_cgroup, tp->root
> object is allocated and initialized inside ->init() callback, which is always
> performed at a point in time after we allocate a new tp, which means tp and
> thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
> Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
> handler, same for the tp which is kfree_rcu()'ed right when we return
> from ->destroy() in tcf_destroy(). This means, the head object's lifetime
> for such classifiers is always tied to the tp lifetime. The RCU callback
> invocation for the two kfree_rcu() could be out of order, but that's fine
> since both are independent.
> 
> Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
> means that 1) we don't need a useless NULL check in fast-path and, 2) that
> outstanding readers of that tp in tc_classify() can still execute under
> respect with RCU grace period as it is actually expected.
> 
> Things that haven't been touched here: cls_fw and cls_route. They each
> handle tp->root being NULL in ->classify() path for historic reasons, so
> their ->destroy() implementation can stay as is. If someone actually
> cares, they could get cleaned up at some point to avoid the test in fast
> path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
> !head should anyone actually be using/testing it, so it at least aligns with
> cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
> destruction (to a sleepable context) after RCU grace period as concurrent
> readers might still access it. (Note that in this case we need to hold module
> reference to keep work callback address intact, since we only wait on module
> unload for all call_rcu()s to finish.)
> 
> This fixes one race to bring RCU grace period guarantees back. Next step
> as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
> proto tp when all filters are gone") to get the order of unlinking the tp
> in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
> RCU_INIT_POINTER() before tcf_destroy() and let the notification for
> removal be done through the prior ->delete() callback. Both are independant
> issues. Once we have that right, we can then clean tp->root up for a number
> of classifiers by not making them RCU pointers, which requires a new callback
> (->uninit) that is triggered from tp's RCU callback, where we just kfree()
> tp->root from there.

Thanks looks good to me and appreciate the detailed commit message.

Acked-by: John Fastabend <john.r.fastabend@...el.com>

> 
> Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
> Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
> Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
> Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
> Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
> Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
> Reported-by: Roi Dayan <roid@...lanox.com>
> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> Cc: Cong Wang <xiyou.wangcong@...il.com>
> Cc: John Fastabend <john.fastabend@...il.com>
> Cc: Roi Dayan <roid@...lanox.com>
> Cc: Jiri Pirko <jiri@...lanox.com>
> ---

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ