[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200613035038.qmoxtf5mn3g3aiqe@ast-mbp>
Date: Fri, 12 Jun 2020 20:50:38 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
David Laight <David.Laight@...lab.com>
Subject: Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from
{get,set}sockopt when optlen > PAGE_SIZE
On Fri, Jun 12, 2020 at 06:03:03PM -0700, Stanislav Fomichev wrote:
> On Fri, Jun 12, 2020 at 5:34 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Mon, Jun 08, 2020 at 11:27:47AM -0700, Stanislav Fomichev wrote:
> > > Attaching to these hooks can break iptables because its optval is
> > > usually quite big, or at least bigger than the current PAGE_SIZE limit.
> > > David also mentioned some SCTP options can be big (around 256k).
> > >
> > > There are two possible ways to fix it:
> > > 1. Increase the limit to match iptables max optval. There is, however,
> > > no clear upper limit. Technically, iptables can accept up to
> > > 512M of data (not sure how practical it is though).
> > >
> > > 2. Bypass the value (don't expose to BPF) if it's too big and trigger
> > > BPF only with level/optname so BPF can still decide whether
> > > to allow/deny big sockopts.
> > >
> > > The initial attempt was implemented using strategy #1. Due to
> > > listed shortcomings, let's switch to strategy #2. When there is
> > > legitimate a real use-case for iptables/SCTP, we can consider increasing
> > > the PAGE_SIZE limit.
> > >
> > > v3:
> > > * don't increase the limit, bypass the argument
> > >
> > > v2:
> > > * proper comments formatting (Jakub Kicinski)
> > >
> > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > > Cc: David Laight <David.Laight@...LAB.COM>
> > > Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
> > > ---
> > > kernel/bpf/cgroup.c | 18 ++++++++++++++----
> > > 1 file changed, 14 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > > index fdf7836750a3..758082853086 100644
> > > --- a/kernel/bpf/cgroup.c
> > > +++ b/kernel/bpf/cgroup.c
> > > @@ -1276,9 +1276,18 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
> > >
> > > static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
> > > {
> > > - if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
> > > + if (unlikely(max_optlen < 0))
> > > return -EINVAL;
> > >
> > > + if (unlikely(max_optlen > PAGE_SIZE)) {
> > > + /* We don't expose optvals that are greater than PAGE_SIZE
> > > + * to the BPF program.
> > > + */
> > > + ctx->optval = NULL;
> > > + ctx->optval_end = NULL;
> > > + return 0;
> > > + }
> >
> > It's probably ok, but makes me uneasy about verifier consequences.
> > ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov.
> > Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ?
> > I don't think we have such tests. I guess bpf prog won't be able to read
> > anything and nothing will crash, but having PTR_TO_PACKET that is
> > actually NULL would be an odd special case to keep in mind for everyone
> > who will work on the verifier from now on.
> >
> > Also consider bpf prog that simply reads something small like 4 bytes.
> > IP_FREEBIND sockopt (like your selftest in the patch 2) will have
> > those 4 bytes, so it's natural for the prog to assume that it can read it.
> > It will have
> > p = ctx->optval;
> > if (p + 4 > ctx->optval_end)
> > /* goto out and don't bother logging, since that never happens */
> > *(u32*)p;
> >
> > but 'clever' user space would pass long optlen and prog suddenly
> > 'not seeing' the sockopt. It didn't crash, but debugging would be
> > surprising.
> >
> > I feel it's better to copy the first 4k and let the program see it.
> Agreed with the IP_FREEBIND example wrt observability, however it's
> not clear what to do with the cropped buffer if the bpf program
> modifies it.
>
> Consider that huge iptables setsockopts where the usespace passes
> PAGE_SIZE*10 optlen with real data and bpf prog sees only part of it.
> Now, if the bpf program modifies the buffer (say, flips some byte), we
> are back to square one. We either have to silently discard that buffer
> or reallocate/merge. My reasoning with data == NULL, is that at least
> there is a clear signal that the program can't access the data (and
> can look at optlen to see if the original buffer is indeed non-zero
> and maybe deny such requests?).
> At this point I'm really starting to think that maybe we should just
> vmalloc everything that is >PAGE_SIZE and add a sysclt to limit an
> upper bound :-/
> I'll try to think about this a bit more over the weekend.
Yeah. Tough choices.
We can also detect in the verifier whether program accessed ctx->optval
and skip alloc/copy if program didn't touch it, but I suspect in most
case the program would want to read it.
I think vmallocing what optlen said is DoS-able. It's better to
stick with single page.
Let's keep brainstorming.
Powered by blists - more mailing lists