lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200616230355.hzipb7hly3fo5moc@ast-mbp.dhcp.thefacebook.com>
Date:   Tue, 16 Jun 2020 16:03:55 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Stanislav Fomichev <sdf@...gle.com>
Cc:     Netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Laight <David.Laight@...lab.com>
Subject: Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from
 {get,set}sockopt when optlen > PAGE_SIZE

On Mon, Jun 15, 2020 at 09:41:38AM -0700, Stanislav Fomichev wrote:
> On Fri, Jun 12, 2020 at 8:50 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> [ .. ]
> > > > It's probably ok, but makes me uneasy about verifier consequences.
> > > > ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov.
> > > > Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ?
> > > > I don't think we have such tests. I guess bpf prog won't be able to read
> > > > anything and nothing will crash, but having PTR_TO_PACKET that is
> > > > actually NULL would be an odd special case to keep in mind for everyone
> > > > who will work on the verifier from now on.
> > > >
> > > > Also consider bpf prog that simply reads something small like 4 bytes.
> > > > IP_FREEBIND sockopt (like your selftest in the patch 2) will have
> > > > those 4 bytes, so it's natural for the prog to assume that it can read it.
> > > > It will have
> > > > p = ctx->optval;
> > > > if (p + 4 > ctx->optval_end)
> > > >  /* goto out and don't bother logging, since that never happens */
> > > > *(u32*)p;
> > > >
> > > > but 'clever' user space would pass long optlen and prog suddenly
> > > > 'not seeing' the sockopt. It didn't crash, but debugging would be
> > > > surprising.
> > > >
> > > > I feel it's better to copy the first 4k and let the program see it.
> > > Agreed with the IP_FREEBIND example wrt observability, however it's
> > > not clear what to do with the cropped buffer if the bpf program
> > > modifies it.
> > >
> > > Consider that huge iptables setsockopts where the usespace passes
> > > PAGE_SIZE*10 optlen with real data and bpf prog sees only part of it.
> > > Now, if the bpf program modifies the buffer (say, flips some byte), we
> > > are back to square one. We either have to silently discard that buffer
> > > or reallocate/merge. My reasoning with data == NULL, is that at least
> > > there is a clear signal that the program can't access the data (and
> > > can look at optlen to see if the original buffer is indeed non-zero
> > > and maybe deny such requests?).
> > > At this point I'm really starting to think that maybe we should just
> > > vmalloc everything that is >PAGE_SIZE and add a sysclt to limit an
> > > upper bound :-/
> > > I'll try to think about this a bit more over the weekend.
> >
> > Yeah. Tough choices.
> > We can also detect in the verifier whether program accessed ctx->optval
> > and skip alloc/copy if program didn't touch it, but I suspect in most
> > case the program would want to read it.
> > I think vmallocing what optlen said is DoS-able. It's better to
> > stick with single page.
> > Let's keep brainstorming.
> Btw, can we use sleepable bpf for that? As in, do whatever I suggested
> in these patches (don't expose optval>PAGE_SIZE via context), but add
> a new helper where you can say 'copy x bytes from y offset of the
> original optval' (the helper will do sleepable copy_form_user).
> That way we have a clean signal to the BPF that the value is too big
> (optval==optval_end==NULL) and the user can fallback to the helper to
> inspect the value. We can also provide another helper to export new
> value for this case.

sleepable will be read-only and with toctou.
I guess this patch is the least evil then ?
But I'm confused with the test in patch 2.
Why does it do 'if (optval > optval_end)' ?
How is that possible when patch 1 makes them equal.
may be another idea:
allocate 4k, copy first 4k into it, but keep ctx.optlen as original.
if bpf prog reads optval and finds it ok in setsockopt,
it can set ctx.optlen = 0
which would mean run the rest of setsockopt handling with original
'__user *optval' and ignore the buffer that was passed to bpf prog.
In case of ctx.optlen < 4k the behavior won't change from bpf prog
and from kernel pov.
When ctx.optlen > 4k and prog didn't adjust it
__cgroup_bpf_run_filter_setsockopt will do ret = -EFAULT;
and reject sockopt.
So bpf prog would be force to do something with large optvals.
yet it will be able to examine first 4k bytes of it.
It's a bit of change in behavior, but I don't think bpf prog is
doing ctx.optlen = 0, since that's more or less the same as
doing ctx.optlen = -2.
or may be use some other indicator ?
And do something similar with getsockopt.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ