[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9cdf860d-8370-95b5-1688-af03265cc874@fb.com>
Date: Fri, 25 Mar 2022 12:16:27 -0700
From: Yonghong Song <yhs@...com>
To: Hao Luo <haoluo@...gle.com>, Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>
Cc: KP Singh <kpsingh@...nel.org>, Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC bpf-next 0/2] Mmapable task local storage.
On 3/24/22 4:41 PM, Hao Luo wrote:
> Some map types support mmap operation, which allows userspace to
> communicate with BPF programs directly. Currently only arraymap
> and ringbuf have mmap implemented.
>
> However, in some use cases, when multiple program instances can
> run concurrently, global mmapable memory can cause race. In that
> case, userspace needs to provide necessary synchronizations to
> coordinate the usage of mapped global data. This can be a source
> of bottleneck.
I can see your use case here. Each calling process can get the
corresponding bpf program task local storage data through
mmap interface. As you mentioned, there is a tradeoff
between more memory vs. non-global synchronization.
I am thinking that another bpf_iter approach can retrieve
the similar result. We could implement a bpf_iter
for task local storage map, optionally it can provide
a tid to retrieve the data for that particular tid.
This way, user space needs an explicit syscall, but
does not need to allocate more memory than necessary.
WDYT?
>
> It would be great to have a mmapable local storage in that case.
> This patch adds that.
>
> Mmap isn't BPF syscall, so unpriv users can also use it to
> interact with maps.
>
> Currently the only way of allocating mmapable map area is using
> vmalloc() and it's only used at map allocation time. Vmalloc()
> may sleep, therefore it's not suitable for maps that may allocate
> memory in an atomic context such as local storage. Local storage
> uses kmalloc() with GFP_ATOMIC, which doesn't sleep. This patch
> uses kmalloc() with GFP_ATOMIC as well for mmapable map area.
>
> Allocating mmapable memory has requirment on page alignment. So we
> have to deliberately allocate more memory than necessary to obtain
> an address that has sdata->data aligned at page boundary. The
> calculations for mmapable allocation size, and the actual
> allocation/deallocation are packaged in three functions:
>
> - bpf_map_mmapable_alloc_size()
> - bpf_map_mmapable_kzalloc()
> - bpf_map_mmapable_kfree()
>
> BPF local storage uses them to provide generic mmap API:
>
> - bpf_local_storage_mmap()
>
> And task local storage adds the mmap callback:
>
> - task_storage_map_mmap()
>
> When application calls mmap on a task local storage, it gets its
> own local storage.
>
> Overall, mmapable local storage trades off memory with flexibility
> and efficiency. It brings memory fragmentation but can make programs
> stateless. Therefore useful in some cases.
>
> Hao Luo (2):
> bpf: Mmapable local storage.
> selftests/bpf: Test mmapable task local storage.
>
> include/linux/bpf.h | 4 +
> include/linux/bpf_local_storage.h | 5 +-
> kernel/bpf/bpf_local_storage.c | 73 +++++++++++++++++--
> kernel/bpf/bpf_task_storage.c | 40 ++++++++++
> kernel/bpf/syscall.c | 67 +++++++++++++++++
> .../bpf/prog_tests/task_local_storage.c | 38 ++++++++++
> .../bpf/progs/task_local_storage_mmapable.c | 38 ++++++++++
> 7 files changed, 257 insertions(+), 8 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage_mmapable.c
>
Powered by blists - more mailing lists