Message-ID: <CA+55aFzk6OOHoQ_e3pPQNUGN80tjXT7vYTNaRjXbWiQUnNRiTA@mail.gmail.com>
Date:	Tue, 17 Jan 2012 22:25:32 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Indan Zupancic <indan@....nu>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Jamie Lokier <jamie@...reable.org>,
	Andrew Lutomirski <luto@....edu>,
	Oleg Nesterov <oleg@...hat.com>,
	Will Drewry <wad@...omium.org>, linux-kernel@...r.kernel.org,
	keescook@...omium.org, john.johansen@...onical.com,
	serge.hallyn@...onical.com, coreyb@...ux.vnet.ibm.com,
	pmoore@...hat.com, eparis@...hat.com, djm@...drot.org,
	segoon@...nwall.com, rostedt@...dmis.org, jmorris@...ei.org,
	scarybeasts@...il.com, avi@...hat.com, penberg@...helsinki.fi,
	viro@...iv.linux.org.uk, mingo@...e.hu, akpm@...ux-foundation.org,
	khilman@...com, borislav.petkov@....com, amwang@...hat.com,
	ak@...ux.intel.com, eric.dumazet@...il.com, gregkh@...e.de,
	dhowells@...hat.com, daniel.lezcano@...e.fr,
	linux-fsdevel@...r.kernel.org,
	linux-security-module@...r.kernel.org, olofj@...omium.org,
	mhalcrow@...gle.com, dlaor@...hat.com,
	Roland McGrath <mcgrathr@...omium.org>
Subject: Re: Compat 32-bit syscall entry from 64-bit task!? [was: Re:
 [RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF]

On Tue, Jan 17, 2012 at 9:23 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
>  - in that page, do this:
>
>      lea 1f,%edx
>      movl $SYSCALL,%eax
>      movl $-1,4096(%edx)
>  1:
>      int 0x80
>
> and what happens is that the move that *overwrites* the int 0x80 will
> not be noticed by the I$ coherency because it's at another address,
> but by the time you read at $pc-2, you'll get -1, not "int 0x80"

Btw, that I$ coherency comment is not technically the correct explanation.

The I$ coherency isn't the problem. The problem is that the pipeline
has already fetched the "int 0x80" before the write happens. And the
write - because it's not to the same linear address as the code fetch
- won't trigger the internal "pipeline flush on write to code stream".
So the D$ (and I$) will have the -1 in it, but the instruction fetch
will have walked ahead and seen the "int 0x80" that existed earlier, and
will execute it.

And the above depends very much on uarch details, so depending on
microarchitecture it may or may not work. But I think the "use a
different virtual address, but same physical address" thing will fake
out all modern x86 CPUs, and your 'ptrace' will see the -1, even
though the system call happened.
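
To make the "different virtual address, but same physical address" part
concrete, here is a minimal sketch in C. It only shows the aliasing
itself: one file-backed page mapped twice with MAP_SHARED, a store
through one mapping, and a load through the other. It does not execute
anything from the page and does not reproduce the pipeline behaviour
described above. The helper name write_through_alias and the use of
tmpfile() as the backing object are assumptions made for illustration,
not something from the original discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one file-backed page at two virtual addresses,
 * store `value` through the first mapping and load it through the second.
 * Returns the value seen through the alias, or 0 if the setup fails. */
uint32_t write_through_alias(uint32_t value)
{
    long page = sysconf(_SC_PAGESIZE);
    uint32_t seen = 0;

    FILE *f = tmpfile();                 /* anonymous temporary file */
    if (f == NULL)
        return 0;
    int fd = fileno(f);
    if (page <= 0 || ftruncate(fd, (off_t)page) != 0) {
        fclose(f);
        return 0;
    }

    /* Two MAP_SHARED mappings of offset 0 share the same page. */
    uint8_t *rw = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    uint8_t *ro = mmap(NULL, (size_t)page, PROT_READ,
                       MAP_SHARED, fd, 0);

    if (rw != MAP_FAILED && ro != MAP_FAILED) {
        assert(rw != ro);                /* distinct linear addresses */
        memcpy(rw + 16, &value, sizeof value);  /* store via mapping 1 */
        memcpy(&seen, ro + 16, sizeof seen);    /* load via mapping 2  */
    }

    if (rw != MAP_FAILED)
        munmap(rw, (size_t)page);
    if (ro != MAP_FAILED)
        munmap(ro, (size_t)page);
    fclose(f);
    return seen;
}
```

Calling write_through_alias(0xffffffffu) hands back the -1 through the
second mapping, which is the data-side view a later read of the page
would get.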

Anyway, the *kernel* knows, since the kernel will have seen which
entrypoint it comes through. So we can handle it in the kernel. But
no, you cannot currently securely/reliably use $pc-2 in gdb or ptrace
to determine how the system call was made, afaik.
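
For reference, the "$pc-2" check being discussed amounts to something
like the sketch below: take the two bytes just before the return address
and compare them against the known encodings (CD 80 for "int 0x80",
0F 05 for "syscall"). The enum and the function name guess_entry are
hypothetical, and how the two bytes are obtained (for example by a
debugger reading the tracee's memory) is left out. As explained above,
the answer is only a guess, because the bytes read later need not be the
bytes the CPU executed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of the two bytes found at pc-2. */
enum entry_guess {
    ENTRY_UNKNOWN,   /* bytes match neither encoding */
    ENTRY_INT80,     /* CD 80: int 0x80 */
    ENTRY_SYSCALL    /* 0F 05: syscall  */
};

/* Guess the syscall instruction from the bytes at pc-2.
 * This inspects memory contents only, so it can be fooled by a write
 * that lands after the instruction was fetched. */
enum entry_guess guess_entry(const uint8_t insn[2])
{
    if (insn[0] == 0xCD && insn[1] == 0x80)
        return ENTRY_INT80;
    if (insn[0] == 0x0F && insn[1] == 0x05)
        return ENTRY_SYSCALL;
    return ENTRY_UNKNOWN;
}
```

With the overwrite trick from the quoted snippet, the bytes read back
are FF FF, so this returns ENTRY_UNKNOWN even though an "int 0x80" was
executed.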

Of course, limiting things so that you cannot map the same page
executably *and* writably is one solution - and a good idea regardless
- so secure environments can still exist. But even then you could have
races in a multi-threaded environment (they'd just be *much* harder to
trigger for an attacker).

                 Linus