Message-ID: <20190131104754.GA31516@hirez.programming.kicks-ass.net>
Date: Thu, 31 Jan 2019 11:47:54 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Alexei Starovoitov <ast@...nel.org>
Cc: davem@...emloft.net, daniel@...earbox.net, jannh@...gle.com,
netdev@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH v6 bpf-next 1/9] bpf: introduce bpf_spin_lock
On Wed, Jan 30, 2019 at 09:05:38PM -0800, Alexei Starovoitov wrote:
> Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
> bpf programs serialize access to other variables.
>
> Example:
> struct hash_elem {
>         int cnt;
>         struct bpf_spin_lock lock;
> };
> struct hash_elem *val = bpf_map_lookup_elem(&hash_map, &key);
> if (val) {
>         bpf_spin_lock(&val->lock);
>         val->cnt++;
>         bpf_spin_unlock(&val->lock);
> }
>
> Restrictions and safety checks:
> - bpf_spin_lock is only allowed inside HASH and ARRAY maps.
> - BTF description of the map is mandatory for safety analysis.
> - a bpf program can take only one bpf_spin_lock at a time, since two or more
> could cause deadlocks.
> - only one 'struct bpf_spin_lock' is allowed per map element.
> This drastically simplifies the implementation yet still allows a bpf program
> to use any number of bpf_spin_locks.
> - while bpf_spin_lock is held, calls (either bpf2bpf or to helpers) are not allowed.
> - bpf program must bpf_spin_unlock() before return.
> - bpf program can access 'struct bpf_spin_lock' only via
> bpf_spin_lock()/bpf_spin_unlock() helpers.
> - load/store into 'struct bpf_spin_lock lock;' field is not allowed.
> - to use the bpf_spin_lock() helper, the BTF description of the map value must
> be a struct with a 'struct bpf_spin_lock anyname;' field at the top level.
> Nesting the lock inside another struct is not allowed (see the sketch after
> this list).
> - syscall map_lookup doesn't copy bpf_spin_lock field to user space.
> - syscall map_update and program map_update do not update bpf_spin_lock field.
> - bpf_spin_lock cannot be on the stack or inside a networking packet.
> bpf_spin_lock can only be inside a HASH or ARRAY map value.
> - bpf_spin_lock is available only to root, but to all program types.
> - bpf_spin_lock is not allowed in inner maps of map-in-map.
> - ld_abs is not allowed inside spin_lock-ed region.
> - tracing progs and socket filter progs cannot use bpf_spin_lock due to
> insufficient preemption checks.
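
A minimal sketch of the layout rule above (type names here are illustrative,
not taken from the patch):

        /* Allowed: 'struct bpf_spin_lock' at the top level of the map value. */
        struct good_value {
                struct bpf_spin_lock lock;
                int counter;
        };

        /* Not allowed: the lock nested inside another struct (and at most one
         * lock per map element) -- the verifier rejects such a layout.
         */
        struct inner {
                struct bpf_spin_lock lock;
        };
        struct bad_value {
                struct inner wrapped;   /* nested lock: rejected */
                int counter;
        };
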
>
> Implementation details:
> - cgroup-bpf class of programs can nest with xdp/tc programs.
> Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
> Other solutions to avoid nested bpf_spin_lock are possible, such as making
> sure that all networking progs run with softirqs disabled.
> spin_lock_irqsave is the simplest and doesn't add overhead to programs
> that don't use it.
> - arch_spinlock_t is used when it's implemented as a queued_spin_lock
> - archs can force their own arch_spinlock_t
> - on architectures where queued_spin_lock is not available and
> sizeof(arch_spinlock_t) != sizeof(__u32), a trivial lock is used.
> - the presence of bpf_spin_lock inside the map value could have been indicated
> via an extra flag during map_create, but specifying it via BTF is cleaner.
> It provides introspection for the map key/value and reduces user mistakes.
>
> Next steps:
> - allow bpf_spin_lock in other map types (like cgroup local storage)
> - introduce a BPF_F_LOCK flag for the bpf_map_update() syscall and helper
> to request the kernel to grab bpf_spin_lock before rewriting the value.
> That will serialize access to map elements (see the sketch below).
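
A rough user-space sketch of the BPF_F_LOCK idea outlined above (hypothetical
until such a flag actually lands; map_fd and key are assumed to exist, and
struct hash_elem is the example value type from the changelog):

        #include <linux/bpf.h>          /* struct bpf_spin_lock */
        #include <bpf/bpf.h>            /* bpf_map_update_elem() */

        struct hash_elem {              /* value layout from the example above */
                int cnt;
                struct bpf_spin_lock lock;
        };

        static int update_locked(int map_fd, const void *key)
        {
                struct hash_elem val = { .cnt = 0 };

                /* Ask the kernel to take the element's bpf_spin_lock before
                 * copying in the new value, serializing with programs that
                 * hold the lock.
                 */
                return bpf_map_update_elem(map_fd, key, &val, BPF_F_LOCK);
        }
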
>
> Signed-off-by: Alexei Starovoitov <ast@...nel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Small nit below.
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index a74972b07e74..29a8a4e62b8a 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -221,6 +221,86 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
> .arg2_type = ARG_CONST_SIZE,
> };
>
> +#if defined(CONFIG_QUEUED_SPINLOCKS) || defined(CONFIG_BPF_ARCH_SPINLOCK)
> +
> +static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
> +{
> + arch_spinlock_t *l = (void *)lock;
> + union {
> + __u32 val;
> + arch_spinlock_t lock;
> + } u = { .lock = __ARCH_SPIN_LOCK_UNLOCKED };
> +
> + compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0");
> + BUILD_BUG_ON(sizeof(*l) != sizeof(__u32));
> + BUILD_BUG_ON(sizeof(*lock) != sizeof(__u32));
> + arch_spin_lock(l);
> +}
> +
> +static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
> +{
> + arch_spinlock_t *l = (void *)lock;
> +
> + arch_spin_unlock(l);
> +}
> +
> +#else
> +
> +static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
> +{
> + atomic_t *l = (void *)lock;
> +
> + BUILD_BUG_ON(sizeof(*l) != sizeof(*lock));
> + do {
> + atomic_cond_read_relaxed(l, !VAL);
> + } while (atomic_xchg(l, 1));
> +}
> +
> +static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
> +{
> + atomic_t *l = (void *)lock;
> +
> + atomic_set_release(l, 0);
> +}
> +
> +#endif
> +
> +static DEFINE_PER_CPU(unsigned long, irqsave_flags);
> +
> +notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
> +{
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + __bpf_spin_lock(lock);
> + this_cpu_write(irqsave_flags, flags);
__this_cpu_write() -- IRQs are already disabled here, so the cheaper
non-preempt-safe variant is sufficient.
> + return 0;
> +}
> +
> +const struct bpf_func_proto bpf_spin_lock_proto = {
> + .func = bpf_spin_lock,
> + .gpl_only = false,
> + .ret_type = RET_VOID,
> + .arg1_type = ARG_PTR_TO_SPIN_LOCK,
> +};
> +
> +notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
> +{
> + unsigned long flags;
> +
> + flags = this_cpu_read(irqsave_flags);
__this_cpu_read() -- likewise, IRQs are still disabled at this point.
> + __bpf_spin_unlock(lock);
> + local_irq_restore(flags);
> + return 0;
> +}
> +
> +const struct bpf_func_proto bpf_spin_unlock_proto = {
> + .func = bpf_spin_unlock,
> + .gpl_only = false,
> + .ret_type = RET_VOID,
> + .arg1_type = ARG_PTR_TO_SPIN_LOCK,
> +};
> +
> #ifdef CONFIG_CGROUPS
> BPF_CALL_0(bpf_get_current_cgroup_id)
> {