lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200826023802.GA2490802@carbon.dhcp.thefacebook.com>
Date:   Tue, 25 Aug 2020 19:38:02 -0700
From:   Roman Gushchin <guro@...com>
To:     Shakeel Butt <shakeelb@...gle.com>
CC:     <bpf@...r.kernel.org>, netdev <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <kernel-team@...com>,
        LKML <linux-kernel@...r.kernel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Linux MM <linux-mm@...ck.org>
Subject: Re: [PATCH bpf-next v4 03/30] bpf: memcg-based memory accounting for
 bpf maps

On Tue, Aug 25, 2020 at 04:27:09PM -0700, Shakeel Butt wrote:
> On Fri, Aug 21, 2020 at 8:01 AM Roman Gushchin <guro@...com> wrote:
> >
> > This patch enables memcg-based memory accounting for memory allocated
> > by __bpf_map_area_alloc(), which is used by most map types for
> > large allocations.
> >
> > If a map is updated from an interrupt context, and the update
> > results in memory allocation, the memory cgroup can't be determined
> > from the context of the current process. To address this case,
> > bpf map preserves a pointer to the memory cgroup of the process,
> > which created the map. This memory cgroup is charged for allocations
> > from interrupt context.
> >
> > Following patches in the series will refine the accounting for
> > some map types.
> >
> > Signed-off-by: Roman Gushchin <guro@...com>
> > ---
> >  include/linux/bpf.h  |  4 ++++
> >  kernel/bpf/helpers.c | 37 ++++++++++++++++++++++++++++++++++++-
> >  kernel/bpf/syscall.c | 27 ++++++++++++++++++++++++++-
> >  3 files changed, 66 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index a9b7185a6b37..b5f178afde94 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -34,6 +34,7 @@ struct btf_type;
> >  struct exception_table_entry;
> >  struct seq_operations;
> >  struct bpf_iter_aux_info;
> > +struct mem_cgroup;
> >
> >  extern struct idr btf_idr;
> >  extern spinlock_t btf_idr_lock;
> > @@ -138,6 +139,9 @@ struct bpf_map {
> >         u32 btf_value_type_id;
> >         struct btf *btf;
> >         struct bpf_map_memory memory;
> > +#ifdef CONFIG_MEMCG_KMEM
> > +       struct mem_cgroup *memcg;
> > +#endif
> >         char name[BPF_OBJ_NAME_LEN];
> >         u32 btf_vmlinux_value_type_id;
> >         bool bypass_spec_v1;
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index be43ab3e619f..f8ce7bc7003f 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -14,6 +14,7 @@
> >  #include <linux/jiffies.h>
> >  #include <linux/pid_namespace.h>
> >  #include <linux/proc_ns.h>
> > +#include <linux/sched/mm.h>
> >
> >  #include "../../lib/kstrtox.h"
> >
> > @@ -41,11 +42,45 @@ const struct bpf_func_proto bpf_map_lookup_elem_proto = {
> >         .arg2_type      = ARG_PTR_TO_MAP_KEY,
> >  };
> >
> > +#ifdef CONFIG_MEMCG_KMEM
> > +static __always_inline int __bpf_map_update_elem(struct bpf_map *map, void *key,
> > +                                                void *value, u64 flags)
> > +{
> > +       struct mem_cgroup *old_memcg;
> > +       bool in_interrupt;
> > +       int ret;
> > +
> > +       /*
> > +        * If update from an interrupt context results in a memory allocation,
> > +        * the memory cgroup to charge can't be determined from the context
> > +        * of the current task. Instead, we charge the memory cgroup, which
> > +        * contained a process created the map.
> > +        */
> > +       in_interrupt = in_interrupt();
> > +       if (in_interrupt)
> > +               old_memcg = memalloc_use_memcg(map->memcg);
> > +
> 
> The memcg_kmem_bypass() will bypass all __GFP_ACCOUNT allocations even
> before looking at current->active_memcg, so, this patch will be a
> noop.

Good point. Looks like it's a good example of kmem accounting from an interrupt
context, which we've discussed on the Plumbers session.

It means we need some more work on the mm side.

Thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ