netdev - Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAKH8qBsak9E45d-znh9-4NQ4aSS_qta1_v0wEv-to=8Rc1-R0Q@mail.gmail.com>
Date:   Tue, 16 Jun 2020 16:20:01 -0700
From:   Stanislav Fomichev <sdf@...gle.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Laight <David.Laight@...lab.com>
Subject: Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt
 when optlen > PAGE_SIZE

On Tue, Jun 16, 2020 at 4:04 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Mon, Jun 15, 2020 at 09:41:38AM -0700, Stanislav Fomichev wrote:
> > On Fri, Jun 12, 2020 at 8:50 PM Alexei Starovoitov
> > <alexei.starovoitov@...il.com> wrote:
> > [ .. ]
> > > > > It's probably ok, but makes me uneasy about verifier consequences.
> > > > > ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov.
> > > > > Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ?
> > > > > I don't think we have such tests. I guess bpf prog won't be able to read
> > > > > anything and nothing will crash, but having PTR_TO_PACKET that is
> > > > > actually NULL would be an odd special case to keep in mind for everyone
> > > > > who will work on the verifier from now on.
> > > > >
> > > > > Also consider bpf prog that simply reads something small like 4 bytes.
> > > > > IP_FREEBIND sockopt (like your selftest in the patch 2) will have
> > > > > those 4 bytes, so it's natural for the prog to assume that it can read it.
> > > > > It will have
> > > > > p = ctx->optval;
> > > > > if (p + 4 > ctx->optval_end)
> > > > >  /* goto out and don't bother logging, since that never happens */
> > > > > *(u32*)p;
> > > > >
> > > > > but 'clever' user space would pass long optlen and prog suddenly
> > > > > 'not seeing' the sockopt. It didn't crash, but debugging would be
> > > > > surprising.
> > > > >
> > > > > I feel it's better to copy the first 4k and let the program see it.
> > > > Agreed with the IP_FREEBIND example wrt observability, however it's
> > > > not clear what to do with the cropped buffer if the bpf program
> > > > modifies it.
> > > >
> > > > Consider that huge iptables setsockopts where the usespace passes
> > > > PAGE_SIZE*10 optlen with real data and bpf prog sees only part of it.
> > > > Now, if the bpf program modifies the buffer (say, flips some byte), we
> > > > are back to square one. We either have to silently discard that buffer
> > > > or reallocate/merge. My reasoning with data == NULL, is that at least
> > > > there is a clear signal that the program can't access the data (and
> > > > can look at optlen to see if the original buffer is indeed non-zero
> > > > and maybe deny such requests?).
> > > > At this point I'm really starting to think that maybe we should just
> > > > vmalloc everything that is >PAGE_SIZE and add a sysclt to limit an
> > > > upper bound :-/
> > > > I'll try to think about this a bit more over the weekend.
> > >
> > > Yeah. Tough choices.
> > > We can also detect in the verifier whether program accessed ctx->optval
> > > and skip alloc/copy if program didn't touch it, but I suspect in most
> > > case the program would want to read it.
> > > I think vmallocing what optlen said is DoS-able. It's better to
> > > stick with single page.
> > > Let's keep brainstorming.
> > Btw, can we use sleepable bpf for that? As in, do whatever I suggested
> > in these patches (don't expose optval>PAGE_SIZE via context), but add
> > a new helper where you can say 'copy x bytes from y offset of the
> > original optval' (the helper will do sleepable copy_form_user).
> > That way we have a clean signal to the BPF that the value is too big
> > (optval==optval_end==NULL) and the user can fallback to the helper to
> > inspect the value. We can also provide another helper to export new
> > value for this case.
>
> sleepable will be read-only and with toctou.
> I guess this patch is the least evil then ?
> But I'm confused with the test in patch 2.
> Why does it do 'if (optval > optval_end)' ?
> How is that possible when patch 1 makes them equal.
Right, it should really be 'optval < optval_end', good point!
I want to make sure in the BPF program that we can't access the buffer
(essentially return EPERM to the userpace so the test fails).
Will fix in a follow up.

> may be another idea:
> allocate 4k, copy first 4k into it, but keep ctx.optlen as original.
> if bpf prog reads optval and finds it ok in setsockopt,
> it can set ctx.optlen = 0
> which would mean run the rest of setsockopt handling with original
> '__user *optval' and ignore the buffer that was passed to bpf prog.
> In case of ctx.optlen < 4k the behavior won't change from bpf prog
> and from kernel pov.
> When ctx.optlen > 4k and prog didn't adjust it
> __cgroup_bpf_run_filter_setsockopt will do ret = -EFAULT;
> and reject sockopt.
> So bpf prog would be force to do something with large optvals.
> yet it will be able to examine first 4k bytes of it.
> It's a bit of change in behavior, but I don't think bpf prog is
> doing ctx.optlen = 0, since that's more or less the same as
> doing ctx.optlen = -2.
> or may be use some other indicator ?
> And do something similar with getsockopt.
Good suggestion! That sounds doable in theory, let me try and see if I
hit something unexpected.
I suppose trimming provided optval to zero (optlen=0) is not something
that sensible programs would do?

In this case please ignore the v4 I posted recently!