linux-kernel - Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150914081534.GA9274@gmail.com>
Date:	Mon, 14 Sep 2015 10:15:34 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Denys Vlasenko <dvlasenk@...hat.com>
Cc:	Andy Lutomirski <luto@...capital.net>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Borislav Petkov <bp@...en8.de>,
	"H. Peter Anvin" <hpa@...or.com>, Oleg Nesterov <oleg@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Will Drewry <wad@...omium.org>,
	Kees Cook <keescook@...omium.org>, X86 ML <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add
 'test_syscall_vdso' test


* Denys Vlasenko <dvlasenk@...hat.com> wrote:

> >> +                       /* INT80 syscall entrypoint can be used by
> >> +                        * 64-bit programs too, unlike SYSCALL/SYSENTER.
> >> +                        * Therefore it must preserve R12+
> >> +                        * (they are callee-saved registers in 64-bit C ABI).
> >> +                        *
> >> +                        * This was probably historically not intended,
> >> +                        * but R8..11 are clobbered (cleared to 0).
> >> +                        * IOW: they are the only registers which aren't
> >> +                        * preserved across INT80 syscall.
> >> +                        */
> >> +                       if (*r64 == 0 && num <= 11)
> >> +                               continue;
> > 
> > Ugh.  I'll change my big entry patchset to preserve these and maybe to
> > preserve all of the 64-bit regs.
> 
> If you do that, this won't change the ABI: we don't _promise_
> to save them. If we accidentally do, that means nothing.

Argh, that's dangerous nonsense! You _still_ don't seem to understand what the 
Linux ABI means and how to change code that implements it...

Let me try one more time, and please make the mental effort to understand and 
internalize it, because I'm not going to apply any more ABI affecting patches from 
you if you don't...

When you read 'ABI' you always need to write it out mentally:

    _Application_ Binary Interface

Do you see the word 'kernel developer' in that phrase? I don't, and this really 
matters: what an ABI is does not depend on what the kernel developers think or 
thought it means. Our state of mind, our design preferences, our documentation and 
our intentions are absolutely irrelevant.

What an ABI is purely depends on is whether applications break or change behavior 
as a result of us changing it.

What an ABI depends on is how applications (pretty much any real application or 
library, no matter how rare) learned to use the kernel's previous implementation 
of the ABI, in released kernels - in every tiny detail of that ABI.

If _any_ application/library functionally relied on saving registers in this place 
- even if that behavior was 'accidental' from the kernel developer's POV, and if 
the app would break or misbehave if we changed that detail, then that's an 
'accidental ABI', which is an ABI just as much!!!

This is why we preserve old quirks and this is why we try to be 110% careful 
around ABIs, especially low level ones that expose a big set of hardware registers 
that applications could rely on in unexpected ways.

Not because we couldn't "fix" them, but because an ABI is a long term contract: we 
promise absolutely no breakage to applications, no matter what.

Mistaken ABIs, unintended ABIs, deterministically quirky ABIs that apps/libs 
learned to rely on, even if the ABI usage in the app itself is just a bug or is 
otherwise unintended, has _exactly_ the same ABI strength as our sys_close() ABI.

There are some very, very rare expections, but none of them apply here:

 - If an ABI component behaved non-deterministically (relying on random stack
   content for example) then we cannot always fully preserve it when fixing it - 
   but we'll generally gravitate towards trying to fix it while preserving 
   existing apps.

 - Sometimes, 1 out of 100,000 kernel bugs, preserving an old, rare ABI would mean
   introducing or allowing a serious security hole. There's been less than 3 such 
   cases in all of the 500,000+ commits in the Linux kernel that I'm aware of, so 
   this is not a case you need to even know about.

 - Testcases whose only purpose is to bake in every single silly detail from an
   exposed ABI generally don't count. Only if a real app/lib that people use is
   affected counts. (There are exceptions even to this exception: if the ABI
   change that breaks a test-case is obviously gratuitous then the ABI counts: 
   because reasonably there _might_ be apps out there that rely on the reasonable 
   old behavior, they just haven't been reported yet.) Obviously an ABI test-case 
   failing is a red flag, and by default we should assume we messed up and need to 
   unbreak the ABI - unless we are absolutely convinced it's OK. And even if we 
   think it's absolutely OK it takes just a single regression report of an 
   application breaking to revert an ABI change - no questions asked.

> (I'm not very comfortable about additional six push/pops which are necessary for 
> this to happen. I'm surprised maintainers tentatively agreed to that - I was 
> grilled and asked to prove with measurements that *one* additional push+pop 
> wasn't adding significant overhead).

Yeah, so here's our thinking: the code is still not as maintainable as we wish it 
to be, so we are willing to temporarily accept those pop/pushes to help along the 
C conversion effort, to make the code (much!) more maintainable.

Other changes still have to be justified on an instruction granular manner.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/