linux-kernel - Re: [PATCH] fs: use KERNEL_DS instead of get

Date:   Fri, 8 Mar 2019 08:20:17 -0800
From:   Christoph Hellwig <hch@...radead.org>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     Christoph Hellwig <hch@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jann Horn <jannh@...gle.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fs: use KERNEL_DS instead of get_ds()

On Fri, Mar 08, 2019 at 02:23:31PM +0000, Al Viro wrote:
> You do realize that nested pairs of that sort are not all there is?
> Even leaving m68k aside (there the same registers that select
> userland or kernel for that kind of access can be used e.g. for
> writeback control, or to switch to accessing sun3 MMU tables, etc.)

Yes.  And the whole point is to keep these uses clear and separate.

> there are
> 	* temporary switches to USER_DS in things like unaligned
> access handlers, etc., where the kernel is doing emulation of possibly
> userland insns; similar for oops code dumping, etc.
> 	* use_mm()/unuse_mm() should probably switch to USER_DS and
> back, rather than doing that in callers.
> 	* switch to USER_DS (and no, it's *not* "USER_DS unless we started
> with KERNEL_DS" - nested counter is no-go here) for perf callbacks.
> 	* regular non-paired switches to USER_DS: do_exit() and
> flush_old_exec().

And that is probably close to the full list of callers that want
to explicitly enable access to the user address space, and thus
mark the thread as a user thread (and occasionally clear that again,
e.g. in unuse_mm).

Unless I'm completely missing something, our general rule of thumb
should be:

 - threads are started with kernel uaccess turned on (count = 1)
 - if we execute in userspace, we switch to user uaccess (count = 0)
   - same for use_mm style threads that want user access
 - every current ad-hoc kernel code override increments the refcount
   and drops its reference when done
 - cases that force user uaccess, like do_exit or the validation
   check on return to userspace, set it back to 0.

Initially, each 1 -> 0 transition (decrement or force) will do
set_fs(USER_DS), and each 0 -> 1 transition will do set_fs(KERNEL_DS).

Later on, architectures can then kill the set_fs API, and potentially
optimize things by getting rid of the addr_limit field in its current
form.