netdev - Re: [PATCH v3 bpf-next 2/8] bpf: Add map side support for bpf timers.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210629014944.ctdhv7jjooxpv636@ast-mbp.dhcp.thefacebook.com>
Date:   Mon, 28 Jun 2021 18:49:45 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Yonghong Song <yhs@...com>
Cc:     davem@...emloft.net, daniel@...earbox.net, andrii@...nel.org,
        netdev@...r.kernel.org, bpf@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH v3 bpf-next 2/8] bpf: Add map side support for bpf timers.

On Fri, Jun 25, 2021 at 12:46:03PM -0700, Yonghong Song wrote:
> 
> 
> On 6/23/21 7:25 PM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@...nel.org>
> > 
> > Restrict bpf timers to array, hash (both preallocated and kmalloced), and
> > lru map types. The per-cpu maps with timers don't make sense, since 'struct
> > bpf_timer' is a part of map value. bpf timers in per-cpu maps would mean that
> > the number of timers depends on number of possible cpus and timers would not be
> > accessible from all cpus. lpm map support can be added in the future.
> > The timers in inner maps are supported.
> > 
> > The bpf_map_update/delete_elem() helpers and sys_bpf commands cancel and free
> > bpf_timer in a given map element.
> > 
> > Similar to 'struct bpf_spin_lock' BTF is required and it is used to validate
> > that map element indeed contains 'struct bpf_timer'.
> > 
> > Make check_and_init_map_value() init both bpf_spin_lock and bpf_timer when
> > map element data is reused in preallocated htab and lru maps.
> > 
> > Teach copy_map_value() to support both bpf_spin_lock and bpf_timer in a single
> > map element. There could be one of each, but not more than one. Due to 'one
> > bpf_timer in one element' restriction do not support timers in global data,
> > since global data is a map of single element, but from bpf program side it's
> > seen as many global variables and restriction of single global timer would be
> > odd. The sys_bpf map_freeze and sys_mmap syscalls are not allowed on maps with
> > timers, since user space could have corrupted mmap element and crashed the
> > kernel. The maps with timers cannot be readonly. Due to these restrictions
> > search for bpf_timer in datasec BTF in case it was placed in the global data to
> > report clear error.
> > 
> > The previous patch allowed 'struct bpf_timer' as a first field in a map
> > element only. Relax this restriction.
> > 
> > Refactor lru map to s/bpf_lru_push_free/htab_lru_push_free/ to cancel and free
> > the timer when lru map deletes an element as a part of it eviction algorithm.
> > 
> > Make sure that bpf program cannot access 'struct bpf_timer' via direct load/store.
> > The timer operation are done through helpers only.
> > This is similar to 'struct bpf_spin_lock'.
> > 
> > Signed-off-by: Alexei Starovoitov <ast@...nel.org>
> 
> I didn't find major issues. Only one minor comment below. I do a race
> during map_update where updated map elements will have timer removed
> but at the same time the timer might still be used after update. But
> irq spinlock should handle this properly.

Right. It's safe from kernel pov. But non-atomic from bpf prog pov.
If one prog doing map_update_elem() it replaced the other fields
except timer, but the other fields are also not atomic.
So not a new concern for bpf authors.

> Acked-by: Yonghong Song <yhs@...com>
> 
> > ---
> >   include/linux/bpf.h        | 44 ++++++++++++-----
> >   include/linux/btf.h        |  1 +
> >   kernel/bpf/arraymap.c      | 22 +++++++++
> >   kernel/bpf/btf.c           | 77 ++++++++++++++++++++++++------
> >   kernel/bpf/hashtab.c       | 96 +++++++++++++++++++++++++++++++++-----
> >   kernel/bpf/local_storage.c |  4 +-
> >   kernel/bpf/map_in_map.c    |  1 +
> >   kernel/bpf/syscall.c       | 21 +++++++--
> >   kernel/bpf/verifier.c      | 30 ++++++++++--
> >   9 files changed, 251 insertions(+), 45 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 72da9d4d070c..90e6c6d1404c 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -198,24 +198,46 @@ static inline bool map_value_has_spin_lock(const struct bpf_map *map)
> >   	return map->spin_lock_off >= 0;
> >   }
> > -static inline void check_and_init_map_lock(struct bpf_map *map, void *dst)
> > +static inline bool map_value_has_timer(const struct bpf_map *map)
> >   {
> > -	if (likely(!map_value_has_spin_lock(map)))
> > -		return;
> > -	*(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
> > -		(struct bpf_spin_lock){};
> > +	return map->timer_off >= 0;
> >   }
> > -/* copy everything but bpf_spin_lock */
> > +static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> > +{
> > +	if (unlikely(map_value_has_spin_lock(map)))
> > +		*(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
> > +			(struct bpf_spin_lock){};
> > +	if (unlikely(map_value_has_timer(map)))
> > +		*(struct bpf_timer *)(dst + map->timer_off) =
> > +			(struct bpf_timer){};
> > +}
> > +
> [...]
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 6f6681b07364..e85a5839ffe8 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -228,6 +228,28 @@ static struct htab_elem *get_htab_elem(struct bpf_htab *htab, int i)
> >   	return (struct htab_elem *) (htab->elems + i * (u64)htab->elem_size);
> >   }
> > +static void htab_free_prealloced_timers(struct bpf_htab *htab)
> > +{
> > +	u32 num_entries = htab->map.max_entries;
> > +	int i;
> > +
> > +	if (likely(!map_value_has_timer(&htab->map)))
> > +		return;
> > +	if (!htab_is_percpu(htab) && !htab_is_lru(htab))
> 
> Is !htab_is_percpu(htab) needed? I think we already checked
> if map_value has timer it can only be hash/lruhash/array?

Technically not, but it's good to keep this part the same as
in prealloc_init(). I'll refactor these two checks into a small
helper function and will use in both places.