Date:   Thu, 17 Jun 2021 21:55:58 -0700
From:   Martin KaFai Lau <kafai@...com>
To:     Toke Høiland-Jørgensen <toke@...hat.com>
CC:     <bpf@...r.kernel.org>, <netdev@...r.kernel.org>,
        Hangbin Liu <liuhangbin@...il.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Magnus Karlsson <magnus.karlsson@...il.com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH bpf-next v3 03/16] xdp: add proper __rcu annotations to
 redirect map entries

On Thu, Jun 17, 2021 at 11:27:35PM +0200, Toke Høiland-Jørgensen wrote:
> XDP_REDIRECT works by a three-step process: the bpf_redirect() and
> bpf_redirect_map() helpers will look up the target of the redirect and store
> it (along with some other metadata) in a per-CPU struct bpf_redirect_info.
> Next, when the program returns the XDP_REDIRECT return code, the driver
> will call xdp_do_redirect() which will use the information thus stored to
> actually enqueue the frame into a bulk queue structure (that differs
> slightly by map type, but shares the same principle). Finally, before
> exiting its NAPI poll loop, the driver will call xdp_do_flush(), which will
> flush all the different bulk queues, thus completing the redirect.
> 
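
The three steps described above can be modelled in userspace roughly as
follows. This is only a sketch of the control flow, not kernel code: every
name in it (struct target, sketch_redirect_map(), sketch_do_redirect(),
sketch_flush(), BULK_MAX) is a hypothetical stand-in, a single global
replaces the per-CPU struct bpf_redirect_info, and an array of ints replaces
the real frame and bulk queue types.

```c
#include <assert.h>
#include <stddef.h>

#define BULK_MAX 16	/* assumed bulk size, chosen for illustration only */

/* Hypothetical stand-in for a redirect map entry. */
struct target {
	int ifindex;
	int sent;	/* frames delivered to this target so far */
};

/* Models the per-CPU struct bpf_redirect_info (a plain global here). */
struct redirect_info {
	struct target *tgt;
};

/* Models a bulk queue; the real layout differs by map type. */
struct bulk_queue {
	struct target *tgt;
	int frames[BULK_MAX];
	size_t count;
};

enum { ACT_PASS, ACT_REDIRECT };

static struct redirect_info ri;
static struct bulk_queue bq;

/* Step 3: flush the bulk queue, completing the redirect. */
static void sketch_flush(void)
{
	if (bq.tgt)
		bq.tgt->sent += (int)bq.count;
	bq.tgt = NULL;
	bq.count = 0;
}

/* Step 1: the helper looks up the target and stores it for later. */
static int sketch_redirect_map(struct target **map, size_t nentries, size_t key)
{
	if (key >= nentries || !map[key])
		return ACT_PASS;
	ri.tgt = map[key];
	return ACT_REDIRECT;
}

/* Step 2: the driver consumes the stored target and enqueues the frame. */
static int sketch_do_redirect(int frame)
{
	struct target *tgt = ri.tgt;

	ri.tgt = NULL;
	if (!tgt)
		return -1;
	/* Flush first if the queue belongs to another target or is full. */
	if (bq.tgt != tgt || bq.count == BULK_MAX)
		sketch_flush();
	bq.tgt = tgt;
	bq.frames[bq.count++] = frame;
	return 0;
}
```

Note that the target pointer obtained in step 1 is still being used in
steps 2 and 3, which is why the map entry has to stay valid across the whole
sequence.
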
> Pointers to the map entries will be kept around for this whole sequence of
> steps, protected by RCU. However, there is no top-level rcu_read_lock() in
> the core code; instead drivers add their own rcu_read_lock() around the XDP
> portions of the code, but somewhat inconsistently as Martin discovered[0].
> However, things still work because everything happens inside a single NAPI
> poll sequence, which means it's between a pair of calls to
> local_bh_disable()/local_bh_enable(). So Paul suggested[1] that we could
> document this intention by using rcu_dereference_check() with
> rcu_read_lock_bh_held() as a second parameter, thus allowing sparse and
> lockdep to verify that everything is done correctly.
> 
> This patch does just that: we add an __rcu annotation to the map entry
> pointers and remove the various comments explaining the NAPI poll assurance
> strewn through devmap.c in favour of a longer explanation in filter.c. The
> goal is to have one coherent documentation of the entire flow, and rely on
> the RCU annotations as a "standard" way of communicating the flow in the
> map code (which can additionally be understood by sparse and lockdep).
> 
> The RCU annotation changes amount to a fairly straightforward
> replacement where READ_ONCE() becomes rcu_dereference_check(), WRITE_ONCE()
> becomes rcu_assign_pointer() and xchg() and cmpxchg() get wrapped in the
> proper constructs to cast the pointer back and forth between __rcu and
> __kernel address space (for the benefit of sparse). The one complication is
> that xskmap has a few constructions where double-pointers are passed back
> and forth; these simply all gain __rcu annotations, and only the final
> reference/dereference to the inner-most pointer gets changed.
> 
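
The shape of the replacement described above can be illustrated with a small
userspace model. This is a sketch under stated assumptions, not the kernel
implementation: C11 atomics stand in for the kernel primitives, an acquire
load is used as a conservative substitute for the dependency ordering that
rcu_dereference_check() relies on, a plain assert() stands in for the lockdep
check, and all names (struct entry, slot, bh_disabled, sketch_*) are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical map entry type. */
struct entry {
	int ifindex;
};

/* Stand-in for the condition rcu_read_lock_bh_held() would report. */
static bool bh_disabled;

/* Models a map slot that would carry the __rcu annotation in the kernel. */
static _Atomic(struct entry *) slot;

/* Where WRITE_ONCE() was used before: publish with release ordering. */
static void sketch_assign_pointer(struct entry *e)
{
	atomic_store_explicit(&slot, e, memory_order_release);
}

/*
 * Where READ_ONCE() was used before: load the pointer and check that the
 * caller is in the expected context. The kernel version warns via lockdep;
 * this model simply asserts.
 */
static struct entry *sketch_dereference_check(bool cond)
{
	assert(cond);
	return atomic_load_explicit(&slot, memory_order_acquire);
}

/* Models the wrapped xchg(): swap in a new entry, return the old one. */
static struct entry *sketch_xchg(struct entry *e)
{
	return atomic_exchange_explicit(&slot, e, memory_order_acq_rel);
}
```

A reader would call sketch_dereference_check(bh_disabled) from the path that
models the NAPI poll, so the context assumption is stated at the point of the
dereference rather than in a comment elsewhere.
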
> With this, everything can be run through sparse without eliciting
> complaints, and lockdep can verify correctness even without the use of
> rcu_read_lock() in the drivers. Subsequent patches will clean these up from
> the drivers.
> 
> [0] https://lore.kernel.org/bpf/20210415173551.7ma4slcbqeyiba2r@kafai-mbp.dhcp.thefacebook.com/
> [1] https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@...hat.com>
Acked-by: Martin KaFai Lau <kafai@...com>
