linux-kernel - RE: [PATCH 03/11] fs: add new read_uptr and write

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <025de688a10d459489e8110a108fed43@AcuMS.aculab.com>
Date:   Tue, 30 Jun 2020 07:51:29 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Linus Torvalds' <torvalds@...ux-foundation.org>,
        Christoph Hellwig <hch@....de>
CC:     Al Viro <viro@...iv.linux.org.uk>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Iurii Zaikin <yzaikin@...gle.com>,
        "Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: RE: [PATCH 03/11] fs: add new read_uptr and write_uptr file
 operations

From: Linus Torvalds
> Sent: 29 June 2020 18:03
> On Mon, Jun 29, 2020 at 8:29 AM Christoph Hellwig <hch@....de> wrote:
> >
> > So based on that I'd rather get away without our flag and tag the
> > kernel pointer case in setsockopt explicitly.
> 
> Yeah, I'd be ok to pass that kind of flag around for setsockopt, in
> ways I _don't_ want to do for some very core vfs thing like 'read()'.
> 
> That said, is there no practical limit on how big "optlen" can be?
> Sure, I realize that a lot of setsockopt users may not use all of the
> data, but let's say that "optlen" is 128, but the actual low-level
> setsockopt operation only uses the first 16 bytes, maybe we could
> always just copy the 128 bytes from user space into kernel space, and
> just say "setsockopt() always gets a kernel pointer".
> 
> Then the bpf use is even simpler. It would just pass the kernel
> pointer natively.
> 
> Because that seems to be what the BPF code really wants to do: it
> takes the user optval, and munges it into a kernel optval, and then
> (if that has been done) runs the low-level sock_setsockopt() under
> KERNEL_DS.
> 
> Couldn't we switch things around instead, and just *always* copy
> things from user space, and sock_setsockopt (and
> sock->ops->setsockopt) _always_ get a kernel buffer?
> 
> And avoid the set_fs(KERNEL_DS) games entirely that way?

I did a patch for SCTP to do the copies in the protocol wrapper.
Apart from the issue of bad applications providing overlarge
buffers and effecting a local DoS attack there were some odd issues:

1) SCTP completely abuses both setsockopt and getsockopt
   to perform additional socket operations.
   I suspect the original implementation didn't want to
   add new system calls.
2) SCTP treats getsockopt as RMW on the user buffer.
   Mostly it only needs 4 bytes, but it can in include
   a sockaddr_storage.
3) SCTP has one getsockopt that is really a setsockopt
   (ie changes things) but is implemented using getsockopt
   so that it can return a value.
4) One of the SCTP getsockopt calls has to return the
   'wrong' value to userspace (ie not the length of the
   transferred data) for compatibility with the orginal
   broken code.

I'm wondering if the [sg]etsockopt wrapper should actually
pass through a structure containing:
	Kernel buffer address (on stack if short)
	User buffer address (may be NULL)
	Length of buffer
	copy_to_user length (normally zero)
	flag: embedded pointers are user/kernel

Most code will just use the kernel buffer and return length/error.

Code that knows the supplied length is invalid can use the
user pointer - but only support direct user requests.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)