netdev - Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAOFY-A1vTnr9TwZZ8a4WYSQqGLfPkb6v_W5bEMY=MWh27=c0Dw@mail.gmail.com>
Date:   Wed, 10 Feb 2021 11:23:58 -0800
From:   Arjun Roy <arjunroy@...gle.com>
To:     David Ahern <dsahern@...il.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        Leon Romanovsky <leon@...nel.org>,
        Arjun Roy <arjunroy.kdev@...il.com>,
        David Miller <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Soheil Hassas Yeganeh <soheil@...gle.com>
Subject: Re: [net-next v2] tcp: Explicitly mark reserved field in
 tcp_zerocopy_receive args.

On Tue, Feb 9, 2021 at 8:35 PM David Ahern <dsahern@...il.com> wrote:
>
> On 2/9/21 4:46 PM, Arjun Roy wrote:
> > On Tue, Feb 9, 2021 at 8:59 AM Jakub Kicinski <kuba@...nel.org> wrote:
> >>
> >> On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote:
> >>> On 2/8/21 7:53 PM, Jakub Kicinski wrote:
> >>>> On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:
> >>>>> That would be the case for new userspace on old kernel. Extending the
> >>>>> check to the end of the struct would guarantee new userspace can not ask
> >>>>> for something that the running kernel does not understand.
> >>>>
> >>>> Indeed, so we're agreeing that check_zeroed_user() is needed before
> >>>> original optlen from user space gets truncated?
> >>>
> >>> I thought so, but maybe not. To think through this ...
> >>>
> >>> If current kernel understands a struct of size N, it can only copy that
> >>> amount from user to kernel. Anything beyond is ignored in these
> >>> multiplexed uAPIs, and that is where the new userspace on old kernel falls.
> >>>
> >>> Known value checks can only be done up to size N. In this case, the
> >>> reserved field is at the end of the known struct size, so checking just
> >>> the field is fine. Going beyond the reserved field has implications for
> >>> extensions to the API which should be handled when those extensions are
> >>> added.
> >>
> >> Let me try one last time.
> >>
> >> There is no check in the kernels that len <= N. User can pass any
> >> length _already_. check_zeroed_user() forces the values beyond the
> >> structure length to be known (0) rather than anything. It can only
> >> avoid breakages in the future.
> >>
> >>> So, in short I think the "if (zc.reserved)" is correct as Leon noted.
> >>
> >> If it's correct to check some arbitrary part of the buffer is zeroed
> >> it should be correct to check the entire tail is zeroed.
> >
> > So, coming back to the thread, I think the following appears to be the
> > current thoughts:
> >
> > 1. It is requested that, on the kernel as it stands today, fields
> > beyond zc.msg_flags (including zc.reserved, the only such field as of
> > this patch) are zero'd out. So a new userspace asking to do specific
> > things would fail on this old kernel with EINVAL. Old userspace would
> > work on old or new kernels. New of course works on new kernels.
> > 2. If it's correct to check some arbitrary field (zc.reserved) to be
> > 0, then it should be fine to check this for all future fields >=
> > reserved in the struct. So some advanced userspace down the line
> > doesn't get confused.
> >
> > Strictly speaking, I'm not convinced this is necessary - eg. 64 bytes
> > struct right now, suppose userspace of the future gives us 96 bytes of
> > which the last 32 are non-zero for some feature or the other. We, in
> > the here and now kernel, truncate that length to 64 (as in we only
> > copy to kernel those first 64 bytes) and set the returned length to
> > 64. The understanding being, any (future, past or present) userspace
> > consults the output value; and considers anything byte >= the returned
> > len to be untouched by the kernel executing the call (ie. garbage,
> > unacted upon).
> >
> > So, how would this work for old+new userspace on old+new kernel?
> >
> > A) old+old, new+new: sizes match, no issue
> > B) new kernel, old userspace: That's not an issue. We have the
> > switch(len) statement for that.
> > C) old kernel, new userspace: that's the 96 vs. 64 B example above -
> > new userspace would see that the kernel only operated on 64 B and
> > treat the last 32 B as garbage/unacted on.
> >
> > In this case, we would not give EINVAL on case C, as we would if we
> > returned EINVAL on a check_zeroed_user() case for fields past
> > zc.reserved. We'd do a zerocopy operating on just the features we know
> > about, and communicate to the user that we only acted on features up
> > until this byte offset.
> >
> > Now, given this is the case, we still have the padding confusion with
> > zc.reserved and the current struct size, so we have to force it to 0
> > as we are doing. But I think we don't need to go beyond this so far.
> >
> > Thus, my personal preference is to not have the check_zeroed_user()
> > check. But if the consensus demands it, then it's an easy enough fix.
> > What are your thoughts?
> >
>
> bpf uses check_zeroed_user to make sure extensions to its structs are
> compatible, so yes, this is required.

Very well; I shall send out a v3 patch with this.

>
> Also, you need to address legitimate msg_flags as I mentioned in another
> response.

I meant to respond to this earlier but forgot. v3 will address this as well.

-Arjun