Message-ID: <AANLkTindqGfToD=xjR_iQHMX+P0qz6bFttZ3RY4MvqOw@mail.gmail.com>
Date:	Sat, 25 Sep 2010 05:54:06 -0400
From:	Brian Gerst <brgerst@...il.com>
To:	Al Viro <viro@...iv.linux.org.uk>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>, tglx@...utronix.de,
	mingo@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: what's papered over by set_fs(USER_DS) in amd64 signal delivery?

On Sat, Sep 25, 2010 at 1:20 AM, Al Viro <viro@...iv.linux.org.uk> wrote:
> On Fri, Sep 24, 2010 at 11:51:11PM -0400, Brian Gerst wrote:
>> > Again, I agree that it almost certainly can be dropped.  I really wonder
>> > about the history, though.  It predates git and bk by far (late 1996).
>> > Linus, do you have any recollection regarding that stuff?
>> >
>>
>> In the beginning, the i386 kernel used a non-flat segmented memory
>> layout.  USER_[CD]S were 3GB segments at base 0, and KERNEL_[CD]S were
>> 1GB segments at base 3GB.  This meant that the kernel could not access
>> userspace addresses without using an %fs segment override (%fs was saved
>> in pt_regs, reloaded with USER_DS on kernel entry, and restored on
>> kernel exit).  You had to reload %fs with KERNEL_DS for the *_user
>> functions to address the kernel segment.
>
> I know.
>
>> v2.1.2 introduced the modern flat memory layout with 4GB segments at
>> base 0.  %fs was no longer used for userspace access, so it wasn't
>> saved in pt_regs or touched in any way until a task switch.  Instead
>> of the hardware enforcing the limit, the check was moved to software.
>
> Yes.
>
>> Originally the signal handler had to set regs->xfs = USER_DS so that
>> the signal handler had a known state when it ran.  That had nothing to
>> do with the kernel's userspace access mechanism.  It was converted to
>> do both the immediate reloading of the %fs register (since it was no
>> longer saved in pt_regs and wouldn't get restored on kernel exit), and
>> to a new set_fs(USER_DS) call which meant something completely
>> different.  That is the origin of the code we are trying to remove
>> now.
>
> That still makes no sense.  2.0 mechanism guaranteed that even if you forgot
> to restore %fs to USER_DS, you wouldn't leak that to userland.  But this
> one didn't - each place like that became a roothole, no matter what you
> did on signal delivery.  Simply because there might have been no unblocked
> signals with userland handlers.  IOW, that set_fs() seems to have been
> useless from day 1

That's what I was getting at.  The code it replaced did something
totally different (setting up user-mode register state).  The asm()
part of the replacement was the correct thing to do.  The set_fs()
call was unnecessary.
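
To illustrate why the set_fs() call could not have closed the hole, here
is a rough sketch.  It is a simplified user-space model, not kernel
code: every model_* name, the addr_limit field name, the sample address
0xC0100000 and the two limit values are assumptions made for the
example.  It only models the software limit check of the flat layout
and a reset that happens solely when a signal with a userland handler
is actually delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits: 3GB for user accesses, the full 4GB for kernel. */
#define MODEL_USER_DS   0xC0000000u
#define MODEL_KERNEL_DS 0xFFFFFFFFu

struct model_task {
    uint32_t addr_limit;       /* what set_fs() changes in this model */
    bool has_pending_handler;  /* unblocked signal with a userland handler */
};

static void model_set_fs(struct model_task *t, uint32_t limit)
{
    t->addr_limit = limit;
}

/* Software range check standing in for the old hardware segment limit. */
static bool model_access_ok(const struct model_task *t,
                            uint32_t addr, uint32_t size)
{
    return size <= t->addr_limit && addr <= t->addr_limit - size;
}

/* A buggy kernel path that raises the limit and forgets to restore it. */
static void model_buggy_path(struct model_task *t)
{
    model_set_fs(t, MODEL_KERNEL_DS);
    /* ... *_user functions used on kernel buffers here ... */
    /* missing: model_set_fs(t, MODEL_USER_DS); */
}

/* Return to user mode: the reset runs only if a handler is set up. */
static void model_exit_to_user(struct model_task *t)
{
    if (t->has_pending_handler)
        model_set_fs(t, MODEL_USER_DS);  /* the call under discussion */
}

/* Does the next syscall accept a kernel address passed from userland? */
static bool model_leaks_after_bug(bool has_pending_handler)
{
    struct model_task t = { MODEL_USER_DS, has_pending_handler };

    assert(!model_access_ok(&t, 0xC0100000u, 4));  /* sane before the bug */
    model_buggy_path(&t);
    model_exit_to_user(&t);
    return model_access_ok(&t, 0xC0100000u, 4);
}
```

In this model, model_leaks_after_bug(false) returns true: with no
handler to deliver, the raised limit survives the return to userland.
The reset on signal delivery only masks the bug in the case where a
handler happens to be pending.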

> unless I'm missing something really subtle, like
> e.g. some processes deliberately running (in 2.0) with %fs set to something
> with lower limit, with signal handlers allowed to switch back to normal
> for duration.  And even that would've been broken, since there wouldn't be
> a matching set_fs() in sigreturn()...

If the user process used a different %fs it didn't matter to the
kernel, because %fs was saved to pt_regs and reloaded with USER_DS on
kernel entry.
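
A rough sketch of that 2.0-style entry/exit behaviour follows.  Again
this is a user-space model, not the real entry code: the model_* names
are made up and the selector values are placeholders chosen for the
example.

```c
#include <stdint.h>

/* Placeholder selector values, assumed for illustration only. */
#define MODEL_USER_DS   0x2Bu
#define MODEL_KERNEL_DS 0x18u

struct model_cpu     { uint16_t fs; };  /* the live %fs register */
struct model_pt_regs { uint16_t fs; };  /* saved user register state */

/* Kernel entry: save the user's %fs, then load a known value. */
static void model_kernel_entry(struct model_cpu *cpu,
                               struct model_pt_regs *regs)
{
    regs->fs = cpu->fs;
    cpu->fs = MODEL_USER_DS;
}

/* Kernel exit: whatever the kernel left in %fs is overwritten. */
static void model_kernel_exit(struct model_cpu *cpu,
                              const struct model_pt_regs *regs)
{
    cpu->fs = regs->fs;
}

/* %fs as the kernel sees it right after entry, for any user value. */
static uint16_t model_fs_seen_by_kernel(uint16_t user_fs)
{
    struct model_cpu cpu = { user_fs };
    struct model_pt_regs regs = { 0 };

    model_kernel_entry(&cpu, &regs);
    return cpu.fs;
}

/* %fs back in userland after a syscall that forgot to restore USER_DS. */
static uint16_t model_fs_after_sloppy_syscall(uint16_t user_fs)
{
    struct model_cpu cpu = { user_fs };
    struct model_pt_regs regs = { 0 };

    model_kernel_entry(&cpu, &regs);
    cpu.fs = MODEL_KERNEL_DS;        /* kernel switches and never undoes it */
    model_kernel_exit(&cpu, &regs);
    return cpu.fs;
}
```

Whatever value userland had in %fs, the kernel side of the model always
starts from MODEL_USER_DS, and userland always gets its own value back,
even when the kernel path forgets to switch %fs back itself.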

--
Brian Gerst