Date:	Wed, 4 Mar 2015 10:00:38 +0100
From:	Borislav Petkov <bp@...en8.de>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	X86 ML <x86@...nel.org>, Andy Lutomirski <luto@...capital.net>,
	LKML <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v2 07/15] x86/lib/copy_user_64.S: Convert to ALTERNATIVE_2

On Wed, Mar 04, 2015 at 07:25:52AM +0100, Ingo Molnar wrote:
> Btw., as a future optimization, wouldn't it be useful to patch this 
> function at its first instruction, i.e. to have three fully functional 
> copy_user_generic_ variants and choose to jmp to one of them in the 
> first instruction of the original function?

Yeah, we already have callsites doing that:

old insn VA: 0xffffffff8100e8b3, CPU feat: X86_FEATURE_REP_GOOD, size: 5
xfpregs_get:
 ffffffff8100e8b3:      e8 f8 40 2b 00          callq ffffffff812c29b0
repl insn: 0xffffffff81e1c255, size: 5
 ffffffff81e1c255:      e8 16 68 4a ff          callq ffffffff812c2a70
ffffffff8100e8b3:       e8 16 68 4a ff          callq ffffffff804b50ce

old insn VA: 0xffffffff8100e8b3, CPU feat: X86_FEATURE_ERMS, size: 5
xfpregs_get:
 ffffffff8100e8b3:      e8 f8 40 2b 00          callq ffffffff812c29b0
repl insn: 0xffffffff81e1c25a, size: 5
 ffffffff81e1c25a:      e8 51 68 4a ff          callq ffffffff812c2ab0
ffffffff8100e8b3:       e8 51 68 4a ff          callq ffffffff804b5109

That's copy_user_generic() which does the alternative_call_2().
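
The targets in the dump can be checked by hand: a 5-byte "call rel32"
(opcode 0xe8) jumps to the address of the next instruction plus the
signed 32-bit displacement. The sketch below decodes the bytes shown
above. It also shows why the last line of each dump entry prints a
target such as ffffffff804b50ce: the replacement bytes are displayed as
copied verbatim to the call site, before the displacement is rebased.
The helper names call_target() and rebase_call() are hypothetical, and
the rebasing step is only my reading of what the patching code has to do
for relative calls, not a copy of the kernel implementation.

```c
#include <stdint.h>
#include <string.h>

/* Target of a 5-byte x86-64 "call rel32" (opcode 0xe8) located at insn_va.
 * Assumes a little-endian host, as the displacement is stored little-endian. */
static uint64_t call_target(uint64_t insn_va, const uint8_t insn[5])
{
    int32_t rel;

    memcpy(&rel, insn + 1, sizeof(rel));
    /* rel32 is relative to the end of the 5-byte instruction */
    return insn_va + 5 + (uint64_t)(int64_t)rel;
}

/* Adjust rel32 after the call was copied from repl_va to site_va so that
 * it still reaches the same absolute target (hypothetical helper). */
static void rebase_call(uint8_t insn[5], uint64_t repl_va, uint64_t site_va)
{
    uint32_t rel;

    memcpy(&rel, insn + 1, sizeof(rel));
    rel += (uint32_t)(repl_va - site_va);   /* modulo 2^32 arithmetic */
    memcpy(insn + 1, &rel, sizeof(rel));
}
```

Feeding in the REP_GOOD entry, the original bytes e8 f8 40 2b 00 at
0xffffffff8100e8b3 decode to ffffffff812c29b0, the replacement bytes at
0xffffffff81e1c255 decode to ffffffff812c2a70, and after rebase_call()
the copy at the call site reaches ffffffff812c2a70 as well.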

And yes, it is on the TODO list to optimize those other alternatives
sites.

> The advantage would be two-fold:
> 
>  1) right now: smart microarchitectures that are able to optimize
>     jump-after-jump (and jump-after-call) targets in their branch
>     target cache can do so in this case, reducing the overhead of the
>     patching, possibly close to zero in the cached case.

Yeah, it would still be better to simply do CALL and no unconditional
JMPs in there later. But this is future work, one thing at a time :)
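
As a rough userspace analogy of what the call-site patching achieves:
pick one of three variants once, then call it directly with no extra
jump in the path. Everything here is a stand-in; the FEAT_* bits, the
copy_* functions and select_copy() are hypothetical and only mirror the
priority order of a two-alternative site (the second feature wins over
the first, which wins over the default).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits standing in for REP_GOOD and ERMS. */
enum { FEAT_REP_GOOD = 1u << 0, FEAT_ERMS = 1u << 1 };

/* Returns the number of bytes NOT copied, 0 on success. */
typedef size_t (*copy_fn)(void *dst, const void *src, size_t len);

/* Default variant: plain byte loop. */
static size_t copy_unrolled(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;
    return 0;
}

/* Stand-in for the variant used when the first feature is present. */
static size_t copy_string(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Stand-in for the variant used when the second feature is present. */
static size_t copy_enhanced_fast_string(void *dst, const void *src, size_t len)
{
    memmove(dst, src, len);
    return 0;
}

/* Resolve the call target once; later checks override earlier ones. */
static copy_fn select_copy(unsigned int feats)
{
    copy_fn fn = copy_unrolled;

    if (feats & FEAT_REP_GOOD)
        fn = copy_string;
    if (feats & FEAT_ERMS)
        fn = copy_enhanced_fast_string;
    return fn;
}
```

A caller would resolve the pointer once at startup and reuse it. Unlike
the function pointer here, the patched kernel call site ends up as a
direct CALL, so there is no indirect branch left at run time.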

>  2) in the future: we could actually do a (limited) re-link of the
>     kernel during bootup, and patch up the original copy_to_user call
>     sites directly to one of the three variants. Alternatives patching
>     done at the symbol level. Does current tooling allow something
>     like this already?

Well, I have a patchset which uses relocs to patch vmlinux at build
time. That was the initial approach here, but you cannot know which
features a CPU supports until boot time, so the patching has to happen
at boot.

BUT(!), you can replace stuff like X86_FEATURE_ALWAYS at build time
already (this is the static_cpu_has_safe() stuff). I'll look into that
later and dust off my relocs pile.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.