netdev - Re: [PATCH v4 bpf-next 1/4] bpf: switch bpf_map ref counter to 64bit so bpf_map

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAEf4Bzagc7ZdbLGbeCfQ_9qwbXwaKztHqZN3Vfb_EpVf8gN1dg@mail.gmail.com>
Date:   Fri, 15 Nov 2019 15:27:58 -0800
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH v4 bpf-next 1/4] bpf: switch bpf_map ref counter to 64bit
 so bpf_map_inc never fails

On Fri, Nov 15, 2019 at 3:24 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 11/15/19 5:02 AM, Andrii Nakryiko wrote:
> > 92117d8443bc ("bpf: fix refcnt overflow") turned refcounting of bpf_map into
> > potentially failing operation, when refcount reaches BPF_MAX_REFCNT limit
> > (32k). Due to using 32-bit counter, it's possible in practice to overflow
> > refcounter and make it wrap around to 0, causing erroneous map free, while
> > there are still references to it, causing use-after-free problems.
>
> You mention 'it's possible in practice' to overflow in relation to bpf map
> refcount, do we need any fixing for bpf tree here?

I meant without existing 32k limit. With limit we currently have, no,
we'll just fail bpf_map_inc instead. So no need to do anything about
bpf tree.

>
> > But having a failing refcounting operations are problematic in some cases. One
> > example is mmap() interface. After establishing initial memory-mapping, user
> > is allowed to arbitrarily map/remap/unmap parts of mapped memory, arbitrarily
> > splitting it into multiple non-contiguous regions. All this happening without
> > any control from the users of mmap subsystem. Rather mmap subsystem sends
> > notifications to original creator of memory mapping through open/close
> > callbacks, which are optionally specified during initial memory mapping
> > creation. These callbacks are used to maintain accurate refcount for bpf_map
> > (see next patch in this series). The problem is that open() callback is not
> > supposed to fail, because memory-mapped resource is set up and properly
> > referenced. This is posing a problem for using memory-mapping with BPF maps.
> >
> > One solution to this is to maintain separate refcount for just memory-mappings
> > and do single bpf_map_inc/bpf_map_put when it goes from/to zero, respectively.
> > There are similar use cases in current work on tcp-bpf, necessitating extra
> > counter as well. This seems like a rather unfortunate and ugly solution that
> > doesn't scale well to various new use cases.
> >
> > Another approach to solve this is to use non-failing refcount_t type, which
> > uses 32-bit counter internally, but, once reaching overflow state at UINT_MAX,
> > stays there. This utlimately causes memory leak, but prevents use after free.
> >

[...]

> >
> > +/* prog's refcnt limit */
> > +#define BPF_MAX_REFCNT 32768
> > +
>
> Would it make sense in this context to also convert prog refcount over to atomic64
> so we have both in one go in order to align them again? This could likely simplify
> even more wrt error paths.
>

Yes, of course, I was just trying to minimize amount of changes, but I
can go and do the same for prog refcnt.

> >   struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
> >   {
> >       if (atomic_add_return(i, &prog->aux->refcnt) > BPF_MAX_REFCNT) {