lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wjxQczqZ96esvDrH5QZsLg6azXCGDgo+Bmm6r8t2ssasg@mail.gmail.com>
Date:   Sat, 27 Jun 2020 09:33:03 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     David Laight <David.Laight@...lab.com>
Cc:     Christoph Hellwig <hch@....de>, Al Viro <viro@...iv.linux.org.uk>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Iurii Zaikin <yzaikin@...gle.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations

On Sat, Jun 27, 2020 at 3:49 AM David Laight <David.Laight@...lab.com> wrote:
>
> > Just keep the existing "set_fs()". It's not harmful if it's only used
> > occasionally. We should rename it once it's rare enough, though.
>
> Am I right in thinking that it just sets a flag in 'current' ?

Basically, yes. That's what it has always done.

Well "always" is not true - it used to set the %fs segment register
originally (thus the name), but _conceptually_ it sets a flag for
"should user accesses be kernel accesses instead".

On x86 - and most other architectures where user space and kernel
space are in the same address space and accessed with the same
instructions, that has then been implemented as just a "what is the
limit for an access".

On other architectures - architectures that need different access
methods (or different flags to the load/store instruction) - it's an
actual flag that changes which access method you use.

> Although I don't remember access_ok() doing a suitable check
> (would need to be (address - base) < limit).

So again, on the architectures with a unified address space,
access_ok() is exactly that "address + access_size <= limit", although
often done with some inline asm just to get the overflow case done
efficiently.

On other architectures, there's no limit check, because _all_
addresses are either user space or kernel space addresses, and what
changes isn't the address limit, but the access itself.

So what I was suggesting is literally

 - keep this flag around as a flag

 - but make all _normal_ user accesses ignore it, and always do user
accesses (so on a unified address space architecture like x86 it
always checks the _fixed_ limit, and on something like sparc32 which
has separate kernel and user address spaces, it just always does a
user access with no conditionals at all)

 - then make the really odd and hopefully very rare cases check that
flag explicitly and manually, and do

        if (current->legacy_uptr_is_kernel)
                memcpy(...);
        else
                copy_to/from_user(...);

and my hope is that we'd have only a handful of cases (like the
setsockopt thing: one for each protocol or whatever) that actually
want this.

Note that the legacy behavior would still remain in architectures that
haven't been modified to remove the use of set_fs(), so I would
further suggest that the two approaches live side-by-side for at least
a while. But _generic_ code (and with Christoph's patches at least
x86) would make set_fs() cause a build error.

So we'd have a new

     set_force_kernel_pointers();
     ....
     clear_force_kernel_pointers();

that would set/clear that 'current->legacy_uptr_is_kernel' variable,
and we'd have a handful of places that would check it.

The naming above is all random, and I'm not claiming that any of this
is particularly _clean_. I'm also not claiming that it's really any
better than our current "set_fs()" mess conceptually.

The only thing that makes it better than our current "set_fs()" is

 - there would hopefully be very few cases of this

 - it would *not* affect random incidental user accesses that just
happen to be in the shadow of this thing.

That second point is the important one, I feel. The real problem with
"set_fs()" has been that we've occasionally had bugs where we ended up
running odd paths that we really didn't _intend_ to run with kernel
pointers. The classic example is the SCSI "write as ioctl" example,
where a write to a SCSI generic device would do various odd things and
follow pointers and what-not. Then you get into real trouble when
"splice()" ends up truiong to write a kernel buffer, and because of
"set_fs()" suddenly the sg code started accessing kernel memory
willy-nilly.

So my suggestion was basically a new version of set_fs(), but one that
is just much more targeted, and doesn't affect all random user
accesses, only those very special ones that are then very *explicitly*
aware of the fact that "hey, I might be called in this situation where
I'm going to get a kernel address instead".

> If that is needed (I presume it was added for a purpose) then all
> the socket option code needs to be able to handle kernel buffers.

So that's very much what I'd like to avoid.

The plan would be that all the *normal* stuff would be handled by
either (a) always having the data come from user space, or (b) the
data has a known size (either fixed, or "optlen" or whatever) and then
being copied to a kernel buffer and then always handled as a kernel
field that bpf can then call with kernel data.

I thought there was just one very specific case of "oh, in certain
cases of setsockopt we don't know what size this address is and optlen
is ignored", so we have to just pass the pointer down to the protocol,
which is the point that knows how much of an address it wants..

Was that a misunderstanding on my part?

Because if there are tons and tons of places that want this "either
kernel or user" then we could still have a helper function for it, but
it means that the whole "limit the cases" advantage to some degree
goes away.

It would still fix the 99% of normal "copy/from/to_user()" cases,
though. They'd be fixed and "safe" and coule never ever touch kernel
memory even if there was some confusion about things. So it would be
an improvement, but I was really hoping that the cases where there can
be confusion would be pretty rare.

             Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ