netdev - Re: [PATCH bpf-next v6 06/34] bpf: prepare for memcg-based memory accounting for bpf maps

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <41eb5e5b-e651-4cb3-a6ea-9ff6b8aa41fb@iogearbox.net>
Date:   Wed, 18 Nov 2020 01:06:17 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Roman Gushchin <guro@...com>, bpf@...r.kernel.org
Cc:     ast@...nel.org, netdev@...r.kernel.org, andrii@...nel.org,
        akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH bpf-next v6 06/34] bpf: prepare for memcg-based memory
 accounting for bpf maps

On 11/17/20 4:40 AM, Roman Gushchin wrote:
> In the absolute majority of cases if a process is making a kernel
> allocation, it's memory cgroup is getting charged.
> 
> Bpf maps can be updated from an interrupt context and in such
> case there is no process which can be charged. It makes the memory
> accounting of bpf maps non-trivial.
> 
> Fortunately, after commit 4127c6504f25 ("mm: kmem: enable kernel
> memcg accounting from interrupt contexts") and b87d8cefe43c
> ("mm, memcg: rework remote charging API to support nesting")
> it's finally possible.
> 
> To do it, a pointer to the memory cgroup of the process which created
> the map is saved, and this cgroup is getting charged for all
> allocations made from an interrupt context.
> 
> Allocations made from a process context will be accounted in a usual way.
> 
> Signed-off-by: Roman Gushchin <guro@...com>
> Acked-by: Song Liu <songliubraving@...com>
[...]
>   
> +#ifdef CONFIG_MEMCG_KMEM
> +static __always_inline int __bpf_map_update_elem(struct bpf_map *map, void *key,
> +						 void *value, u64 flags)
> +{
> +	struct mem_cgroup *old_memcg;
> +	bool in_interrupt;
> +	int ret;
> +
> +	/*
> +	 * If update from an interrupt context results in a memory allocation,
> +	 * the memory cgroup to charge can't be determined from the context
> +	 * of the current task. Instead, we charge the memory cgroup, which
> +	 * contained a process created the map.
> +	 */
> +	in_interrupt = in_interrupt();
> +	if (in_interrupt)
> +		old_memcg = set_active_memcg(map->memcg);
> +
> +	ret = map->ops->map_update_elem(map, key, value, flags);
> +
> +	if (in_interrupt)
> +		set_active_memcg(old_memcg);
> +
> +	return ret;

Hmm, this approach here won't work, see also commit 09772d92cd5a ("bpf: avoid
retpoline for lookup/update/delete calls on maps") which removes the indirect
call, so the __bpf_map_update_elem() and therefore the set_active_memcg() is
not invoked for the vast majority of cases.

> +}
> +#else
> +static __always_inline int __bpf_map_update_elem(struct bpf_map *map, void *key,
> +						 void *value, u64 flags)
> +{
> +	return map->ops->map_update_elem(map, key, value, flags);
> +}
> +#endif
> +
>   BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
>   	   void *, value, u64, flags)
>   {
>   	WARN_ON_ONCE(!rcu_read_lock_held());
> -	return map->ops->map_update_elem(map, key, value, flags);
> +
> +	return __bpf_map_update_elem(map, key, value, flags);
>   }
>   
>   const struct bpf_func_proto bpf_map_update_elem_proto = {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index f3fe9f53f93c..2d77fc2496da 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -31,6 +31,7 @@
>   #include <linux/poll.h>
>   #include <linux/bpf-netns.h>
>   #include <linux/rcupdate_trace.h>
> +#include <linux/memcontrol.h>
>   
>   #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
>   			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
> @@ -456,6 +457,27 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
>   		__release(&map_idr_lock);
>   }
>   
> +#ifdef CONFIG_MEMCG_KMEM
> +static void bpf_map_save_memcg(struct bpf_map *map)
> +{
> +	map->memcg = get_mem_cgroup_from_mm(current->mm);
> +}
> +
> +static void bpf_map_release_memcg(struct bpf_map *map)
> +{
> +	mem_cgroup_put(map->memcg);
> +}
> +
> +#else
> +static void bpf_map_save_memcg(struct bpf_map *map)
> +{
> +}
> +
> +static void bpf_map_release_memcg(struct bpf_map *map)
> +{
> +}
> +#endif
> +
>   /* called from workqueue */
>   static void bpf_map_free_deferred(struct work_struct *work)
>   {
> @@ -464,6 +486,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
>   
>   	bpf_map_charge_move(&mem, &map->memory);
>   	security_bpf_map_free(map);
> +	bpf_map_release_memcg(map);
>   	/* implementation dependent freeing */
>   	map->ops->map_free(map);
>   	bpf_map_charge_finish(&mem);
> @@ -875,6 +898,8 @@ static int map_create(union bpf_attr *attr)
>   	if (err)
>   		goto free_map_sec;
>   
> +	bpf_map_save_memcg(map);
> +
>   	err = bpf_map_new_fd(map, f_flags);
>   	if (err < 0) {
>   		/* failed to allocate fd.
>