Date: Thu, 17 Jun 2021 21:55:58 -0700
From: Martin KaFai Lau <kafai@...com>
To: Toke Høiland-Jørgensen <toke@...hat.com>
CC: <bpf@...r.kernel.org>, <netdev@...r.kernel.org>,
    Hangbin Liu <liuhangbin@...il.com>,
    Jesper Dangaard Brouer <brouer@...hat.com>,
    Magnus Karlsson <magnus.karlsson@...il.com>,
    "Paul E . McKenney" <paulmck@...nel.org>,
    Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH bpf-next v3 03/16] xdp: add proper __rcu annotations to redirect map entries

On Thu, Jun 17, 2021 at 11:27:35PM +0200, Toke Høiland-Jørgensen wrote:
> XDP_REDIRECT works by a three-step process: the bpf_redirect() and
> bpf_redirect_map() helpers will lookup the target of the redirect and store
> it (along with some other metadata) in a per-CPU struct bpf_redirect_info.
> Next, when the program returns the XDP_REDIRECT return code, the driver
> will call xdp_do_redirect() which will use the information thus stored to
> actually enqueue the frame into a bulk queue structure (that differs
> slightly by map type, but shares the same principle). Finally, before
> exiting its NAPI poll loop, the driver will call xdp_do_flush(), which will
> flush all the different bulk queues, thus completing the redirect.
>
> Pointers to the map entries will be kept around for this whole sequence of
> steps, protected by RCU. However, there is no top-level rcu_read_lock() in
> the core code; instead drivers add their own rcu_read_lock() around the XDP
> portions of the code, but somewhat inconsistently as Martin discovered[0].
> However, things still work because everything happens inside a single NAPI
> poll sequence, which means it's between a pair of calls to
> local_bh_disable()/local_bh_enable(). So Paul suggested[1] that we could
> document this intention by using rcu_dereference_check() with
> rcu_read_lock_bh_held() as a second parameter, thus allowing sparse and
> lockdep to verify that everything is done correctly.
>
> This patch does just that: we add an __rcu annotation to the map entry
> pointers and remove the various comments explaining the NAPI poll assurance
> strewn through devmap.c in favour of a longer explanation in filter.c. The
> goal is to have one coherent documentation of the entire flow, and rely on
> the RCU annotations as a "standard" way of communicating the flow in the
> map code (which can additionally be understood by sparse and lockdep).
>
> The RCU annotation replacements result in a fairly straight-forward
> replacement where READ_ONCE() becomes rcu_dereference_check(), WRITE_ONCE()
> becomes rcu_assign_pointer() and xchg() and cmpxchg() gets wrapped in the
> proper constructs to cast the pointer back and forth between __rcu and
> __kernel address space (for the benefit of sparse). The one complication is
> that xskmap has a few constructions where double-pointers are passed back
> and forth; these simply all gain __rcu annotations, and only the final
> reference/dereference to the inner-most pointer gets changed.
>
> With this, everything can be run through sparse without eliciting
> complaints, and lockdep can verify correctness even without the use of
> rcu_read_lock() in the drivers. Subsequent patches will clean these up from
> the drivers.
>
> [0] https://lore.kernel.org/bpf/20210415173551.7ma4slcbqeyiba2r@kafai-mbp.dhcp.thefacebook.com/
> [1] https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@...hat.com>

Acked-by: Martin KaFai Lau <kafai@...com>
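
Editorial illustration (not part of the original message): the three-step flow described in the quoted commit message can be sketched as a userspace toy model. Every name below (toy_target, toy_redirect_info, toy_redirect_map(), toy_do_redirect(), toy_do_flush()) is hypothetical and only mirrors the roles of bpf_redirect_map(), xdp_do_redirect() and xdp_do_flush(); it models a single CPU, involves no RCU, and is not kernel code.

```c
/* Userspace toy model of the three-step XDP_REDIRECT flow.
 * All identifiers are hypothetical; this is not the kernel implementation. */
#include <assert.h>
#include <stddef.h>

#define TOY_BULK_SIZE 16

struct toy_target {               /* stands in for a redirect map entry */
    int frames_sent;              /* frames delivered by a flush */
    int queued;                   /* frames waiting in the bulk queue */
    int bulk[TOY_BULK_SIZE];      /* bulk queue of frame ids */
};

struct toy_redirect_info {        /* stands in for per-CPU struct bpf_redirect_info */
    struct toy_target *tgt;
};

enum toy_action { TOY_PASS, TOY_REDIRECT };

static struct toy_redirect_info toy_ri;   /* assumption: exactly one "CPU" */

/* Step 1: the helper looks up the target and stores it for later. */
enum toy_action toy_redirect_map(struct toy_target *map, size_t len, size_t key)
{
    if (key >= len)
        return TOY_PASS;
    toy_ri.tgt = &map[key];
    return TOY_REDIRECT;
}

/* Step 2: the driver consumes the stored target and enqueues the frame.
 * Simplification: a full queue is reported as an error instead of flushed. */
int toy_do_redirect(int frame_id)
{
    struct toy_target *tgt = toy_ri.tgt;

    toy_ri.tgt = NULL;
    if (tgt == NULL || tgt->queued == TOY_BULK_SIZE)
        return -1;
    tgt->bulk[tgt->queued++] = frame_id;
    return 0;
}

/* Step 3: before leaving the poll loop, drain every bulk queue. */
void toy_do_flush(struct toy_target *map, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        map[i].frames_sent += map[i].queued;
        map[i].queued = 0;
    }
}
```

The point of the model is that the pointer stored in step 1 stays live until step 3, which is why the real map entries need RCU protection for the whole NAPI poll sequence rather than for a single helper call.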