lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20220319173036.23352-1-laoar.shao@gmail.com>
Date:   Sat, 19 Mar 2022 17:30:22 +0000
From:   Yafang Shao <laoar.shao@...il.com>
To:     roman.gushchin@...ux.dev, ast@...nel.org, daniel@...earbox.net,
        andrii@...nel.org, kafai@...com, songliubraving@...com, yhs@...com,
        john.fastabend@...il.com, kpsingh@...nel.org, shuah@...nel.org
Cc:     netdev@...r.kernel.org, bpf@...r.kernel.org,
        linux-kselftest@...r.kernel.org, Yafang Shao <laoar.shao@...il.com>
Subject: [PATCH 00/14] bpf: Allow not to charge bpf memory 

After switching to memcg-based bpf memory accounting, the bpf memory is
charged to the loader's memcg by defaut, that causes unexpected issues for
us. For instance, the container of the loader-which loads the bpf programs
and pins them on bpffs-may restart after pinning the progs and maps. After
the restart, the pinned progs and maps won't belong to the new container
any more, while they actually belong to an offline memcg left by the
previous generation. That inconsistent behavior will make trouble for the
memory resource management for this container. 

The reason why these progs and maps have to be persistent across multiple
generations is that these progs and maps are also used by other processes
which are not in this container. IOW, they can't be removed when this
container is restarted. Take a specific example, bpf program for clsact
qdisc is loaded by a agent running in a container, which not only loads
bpf program but also processes the data generated by this program and do
some other maintainace things.

In order to keep the charging behavior consistent, we used to consider a
way to recharge these pinned maps and progs again after the container is
restarted, but after the discussion[1] with Roman, we decided to go
another direction that don't charge them to the container in the first
place. TL;DR about the mentioned disccussion: recharging is not a generic
solution and it may take too much risk.

This patchset is the solution of no charge. Two flags are introduced in
union bpf_attr, one for bpf map and another for bpf prog. The user who
doesn't want to charge to current memcg can use these two flags. These two
flags are only permitted for sys admin as these memory will be accounted to
the root memcg only.

Patches #1~#8 are for bpf map. Patches #9~#12 are for bpf prog. Patch #13
and #14 are for selftests and also the examples of how to use them.

[1]. https://lwn.net/Articles/887180/ 

Yafang Shao (14):
  bpf: Introduce no charge flag for bpf map
  bpf: Only sys admin can set no charge flag
  bpf: Enable no charge in map _CREATE_FLAG_MASK
  bpf: Introduce new parameter bpf_attr in bpf_map_area_alloc
  bpf: Allow no charge in bpf_map_area_alloc
  bpf: Allow no charge for allocation not at map creation time
  bpf: Allow no charge in map specific allocation
  bpf: Aggregate flags for BPF_PROG_LOAD command
  bpf: Add no charge flag for bpf prog
  bpf: Only sys admin can set no charge flag for bpf prog
  bpf: Set __GFP_ACCOUNT at the callsite of bpf_prog_alloc
  bpf: Allow no charge for bpf prog
  bpf: selftests: Add test case for BPF_F_NO_CHARTE
  bpf: selftests: Add test case for BPF_F_PROG_NO_CHARGE

 include/linux/bpf.h                           | 27 ++++++-
 include/uapi/linux/bpf.h                      | 21 +++--
 kernel/bpf/arraymap.c                         |  9 +--
 kernel/bpf/bloom_filter.c                     |  7 +-
 kernel/bpf/bpf_local_storage.c                |  8 +-
 kernel/bpf/bpf_struct_ops.c                   | 13 +--
 kernel/bpf/core.c                             | 20 +++--
 kernel/bpf/cpumap.c                           | 10 ++-
 kernel/bpf/devmap.c                           | 14 ++--
 kernel/bpf/hashtab.c                          | 14 ++--
 kernel/bpf/local_storage.c                    |  4 +-
 kernel/bpf/lpm_trie.c                         |  4 +-
 kernel/bpf/queue_stack_maps.c                 |  5 +-
 kernel/bpf/reuseport_array.c                  |  3 +-
 kernel/bpf/ringbuf.c                          | 19 ++---
 kernel/bpf/stackmap.c                         | 13 +--
 kernel/bpf/syscall.c                          | 40 +++++++---
 kernel/bpf/verifier.c                         |  2 +-
 net/core/filter.c                             |  6 +-
 net/core/sock_map.c                           |  8 +-
 net/xdp/xskmap.c                              |  9 ++-
 tools/include/uapi/linux/bpf.h                | 21 +++--
 .../selftests/bpf/map_tests/no_charg.c        | 79 +++++++++++++++++++
 .../selftests/bpf/prog_tests/no_charge.c      | 49 ++++++++++++
 24 files changed, 297 insertions(+), 108 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/map_tests/no_charg.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/no_charge.c

-- 
2.17.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ