lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 16 Nov 2019 01:13:54 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Alexei Starovoitov <ast@...com>, Andrii Nakryiko <andriin@...com>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Cc:     "andrii.nakryiko@...il.com" <andrii.nakryiko@...il.com>,
        Kernel Team <Kernel-team@...com>,
        Johannes Weiner <hannes@...xchg.org>,
        Rik van Riel <riel@...riel.com>
Subject: Re: [PATCH v4 bpf-next 2/4] bpf: add mmap() support for
 BPF_MAP_TYPE_ARRAY

On 11/16/19 12:47 AM, Alexei Starovoitov wrote:
> On 11/15/19 3:44 PM, Daniel Borkmann wrote:
>> On 11/16/19 12:37 AM, Alexei Starovoitov wrote:
>>> On 11/15/19 3:31 PM, Daniel Borkmann wrote:
>>>> On 11/15/19 5:02 AM, Andrii Nakryiko wrote:
>>>>> Add ability to memory-map contents of BPF array map. This is extremely
>>>>> useful
>>>>> for working with BPF global data from userspace programs. It allows to
>>>>> avoid
>>>>> typical bpf_map_{lookup,update}_elem operations, improving both
>>>>> performance
>>>>> and usability.
>>>>>
>>>>> There had to be special considerations for map freezing, to avoid
>>>>> having
>>>>> writable memory view into a frozen map. To solve this issue, map
>>>>> freezing and
>>>>> mmap-ing is happening under mutex now:
>>>>>      - if map is already frozen, no writable mapping is allowed;
>>>>>      - if map has writable memory mappings active (accounted in
>>>>> map->writecnt),
>>>>>        map freezing will keep failing with -EBUSY;
>>>>>      - once number of writable memory mappings drops to zero, map
>>>>> freezing can be
>>>>>        performed again.
>>>>>
>>>>> Only non-per-CPU plain arrays are supported right now. Maps with
>>>>> spinlocks
>>>>> can't be memory mapped either.
>>>>>
>>>>> For BPF_F_MMAPABLE array, memory allocation has to be done through
>>>>> vmalloc()
>>>>> to be mmap()'able. We also need to make sure that array data memory is
>>>>> page-sized and page-aligned, so we over-allocate memory in such a way
>>>>> that
>>>>> struct bpf_array is at the end of a single page of memory with
>>>>> array->value
>>>>> being aligned with the start of the second page. On deallocation we
>>>>> need to
>>>>> accomodate this memory arrangement to free vmalloc()'ed memory
>>>>> correctly.
>>>>>
>>>>> One important consideration regarding how memory-mapping subsystem
>>>>> functions.
>>>>> Memory-mapping subsystem provides few optional callbacks, among them
>>>>> open()
>>>>> and close().  close() is called for each memory region that is
>>>>> unmapped, so
>>>>> that users can decrease their reference counters and free up
>>>>> resources, if
>>>>> necessary. open() is *almost* symmetrical: it's called for each memory
>>>>> region
>>>>> that is being mapped, **except** the very first one. So bpf_map_mmap
>>>>> does
>>>>> initial refcnt bump, while open() will do any extra ones after that.
>>>>> Thus
>>>>> number of close() calls is equal to number of open() calls plus one
>>>>> more.
>>>>>
>>>>> Cc: Johannes Weiner <hannes@...xchg.org>
>>>>> Cc: Rik van Riel <riel@...riel.com>
>>>>> Acked-by: Song Liu <songliubraving@...com>
>>>>> Acked-by: John Fastabend <john.fastabend@...il.com>
>>>>> Signed-off-by: Andrii Nakryiko <andriin@...com>
>>>>
>>>> [...]
>>>>> +/* called for any extra memory-mapped regions (except initial) */
>>>>> +static void bpf_map_mmap_open(struct vm_area_struct *vma)
>>>>> +{
>>>>> +    struct bpf_map *map = vma->vm_file->private_data;
>>>>> +
>>>>> +    bpf_map_inc(map);
>>>>
>>>> This would also need to inc uref counter since it's technically a
>>>> reference
>>>> of this map into user space as otherwise if map->ops->map_release_uref
>>>> would
>>>> be used for maps supporting mmap, then the callback would trigger even
>>>> if user
>>>> space still has a reference to it.
>>>
>>> I thought we use uref only for array that can hold FDs ?
>>> That's why I suggested Andrii earlier to drop uref++.
>>
>> Yeah, only for fd array currently. Question is, if we ever reuse that
>> map_release_uref
>> callback in future for something else, will we remember that we earlier
>> missed to add
>> it here? :/
> 
> What do you mean 'missed to add' ?

Was saying missed to add the inc/put for the uref counter.

> This is mmap path. Anything that needs releasing (like FDs for
> prog_array or progs for sockmap) cannot be mmap-able.

Right, I meant if in future we ever have another use case outside of it
for some reason (unrelated to those maps you mention above). Can we
guarantee this is never going to happen? Seemed less fragile at least to
maintain proper count here.

Thanks,
Daniel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ