Message-ID: <2c523078-cb24-e1ca-2439-27be37cfe90b@fb.com>
Date: Fri, 25 Jun 2021 12:46:03 -0700
From: Yonghong Song <yhs@...com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>,
<davem@...emloft.net>
CC: <daniel@...earbox.net>, <andrii@...nel.org>,
<netdev@...r.kernel.org>, <bpf@...r.kernel.org>,
<kernel-team@...com>
Subject: Re: [PATCH v3 bpf-next 2/8] bpf: Add map side support for bpf timers.
On 6/23/21 7:25 PM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@...nel.org>
>
> Restrict bpf timers to array, hash (both preallocated and kmalloced), and
> lru map types. Per-cpu maps with timers don't make sense, since 'struct
> bpf_timer' is part of the map value. bpf timers in per-cpu maps would mean that
> the number of timers depends on the number of possible cpus and the timers
> would not be accessible from all cpus. lpm map support can be added in the
> future. Timers in inner maps are supported.
>
> The bpf_map_update/delete_elem() helpers and sys_bpf commands cancel and free
> the bpf_timer in a given map element.
>
> Similar to 'struct bpf_spin_lock', BTF is required and is used to validate
> that the map element indeed contains 'struct bpf_timer'.
>
> Make check_and_init_map_value() init both bpf_spin_lock and bpf_timer when
> map element data is reused in preallocated htab and lru maps.
>
> Teach copy_map_value() to support both bpf_spin_lock and bpf_timer in a single
> map element. There can be one of each, but not more than one. Due to the 'one
> bpf_timer in one element' restriction, do not support timers in global data,
> since global data is a map with a single element, but from the bpf program side
> it is seen as many global variables, and a restriction to a single global timer
> would be odd. The sys_bpf map_freeze and sys_mmap syscalls are not allowed on
> maps with timers, since user space could corrupt the mmaped element and crash
> the kernel. Maps with timers cannot be read-only. Due to these restrictions,
> search for bpf_timer in the datasec BTF in case it was placed in global data,
> so that a clear error can be reported.
>
> The previous patch allowed 'struct bpf_timer' only as the first field in a map
> element. Relax this restriction.
>
> Refactor lru map to s/bpf_lru_push_free/htab_lru_push_free/ to cancel and free
> the timer when the lru map deletes an element as part of its eviction algorithm.
>
> Make sure that a bpf program cannot access 'struct bpf_timer' via direct load/store.
> Timer operations are done through helpers only.
> This is similar to 'struct bpf_spin_lock'.
>
> Signed-off-by: Alexei Starovoitov <ast@...nel.org>
I didn't find major issues. Only one minor comment below. I do see a race
during map_update where updated map elements will have the timer removed
but at the same time the timer might still be used after the update. But
the irq spinlock should handle this properly.
Acked-by: Yonghong Song <yhs@...com>
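
For context, from the bpf program side a map element carrying a timer would
be declared roughly like below (illustrative only; 'struct bpf_timer' comes
from the uapi header added earlier in this series, and the map/struct names
and libbpf BTF-defined map syntax are just assumptions for the example):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct map_value {
          int data;
          struct bpf_spin_lock lock;      /* at most one per element */
          struct bpf_timer timer;         /* at most one per element */
  };

  struct {
          __uint(type, BPF_MAP_TYPE_HASH);
          __uint(max_entries, 128);
          __type(key, int);
          __type(value, struct map_value);
  } hmap SEC(".maps");

The BTF for 'struct map_value' is what lets the verifier and map code
discover spin_lock_off/timer_off, which the checks in this patch rely on.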
> ---
> include/linux/bpf.h | 44 ++++++++++++-----
> include/linux/btf.h | 1 +
> kernel/bpf/arraymap.c | 22 +++++++++
> kernel/bpf/btf.c | 77 ++++++++++++++++++++++++------
> kernel/bpf/hashtab.c | 96 +++++++++++++++++++++++++++++++++-----
> kernel/bpf/local_storage.c | 4 +-
> kernel/bpf/map_in_map.c | 1 +
> kernel/bpf/syscall.c | 21 +++++++--
> kernel/bpf/verifier.c | 30 ++++++++++--
> 9 files changed, 251 insertions(+), 45 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 72da9d4d070c..90e6c6d1404c 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -198,24 +198,46 @@ static inline bool map_value_has_spin_lock(const struct bpf_map *map)
> return map->spin_lock_off >= 0;
> }
>
> -static inline void check_and_init_map_lock(struct bpf_map *map, void *dst)
> +static inline bool map_value_has_timer(const struct bpf_map *map)
> {
> - if (likely(!map_value_has_spin_lock(map)))
> - return;
> - *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
> - (struct bpf_spin_lock){};
> + return map->timer_off >= 0;
> }
>
> -/* copy everything but bpf_spin_lock */
> +static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> +{
> + if (unlikely(map_value_has_spin_lock(map)))
> + *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
> + (struct bpf_spin_lock){};
> + if (unlikely(map_value_has_timer(map)))
> + *(struct bpf_timer *)(dst + map->timer_off) =
> + (struct bpf_timer){};
> +}
> +
[...]
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 6f6681b07364..e85a5839ffe8 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -228,6 +228,28 @@ static struct htab_elem *get_htab_elem(struct bpf_htab *htab, int i)
> return (struct htab_elem *) (htab->elems + i * (u64)htab->elem_size);
> }
>
> +static void htab_free_prealloced_timers(struct bpf_htab *htab)
> +{
> + u32 num_entries = htab->map.max_entries;
> + int i;
> +
> + if (likely(!map_value_has_timer(&htab->map)))
> + return;
> + if (!htab_is_percpu(htab) && !htab_is_lru(htab))
Is !htab_is_percpu(htab) needed? I think we already checked that
if a map_value has a timer, the map can only be hash/lruhash/array?
> + /* htab has extra_elems */
> + num_entries += num_possible_cpus();
> +
> + for (i = 0; i < num_entries; i++) {
> + struct htab_elem *elem;
> +
> + elem = get_htab_elem(htab, i);
> + bpf_timer_cancel_and_free(elem->key +
> + round_up(htab->map.key_size, 8) +
> + htab->map.timer_off);
> + cond_resched();
> + }
> +}
> +
[...]