linux-kernel - Re: [GIT PULL] Additional x86 fixes for 2.6.31-rc5

Message-ID: <4A7499BA.2000405@zytor.com>
Date:	Sat, 01 Aug 2009 12:38:34 -0700
From:	"H. Peter Anvin" <hpa@...or.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Tejun Heo <tj@...nel.org>
Subject: Re: [GIT PULL] Additional x86 fixes for 2.6.31-rc5

On 08/01/2009 12:28 PM, Linus Torvalds wrote:
> 
> Hmm.
> 
> I just noticed another issue on x86 code generation, since I was looking 
> at assembly language generation due to the do_sigaltstack() kernel stack 
> info leak thing.
> 
> Our "get_current()" seriously sucks now that it's a per-cpu variable.
> 
> Look at the code generated for something like
> 
> 	current->sas_ss_sp = (unsigned long) ss_sp;
> 	current->sas_ss_size = ss_size;
> 
> and notice how the code really really sucks:
> 
> 	movq	%gs:per_cpu__current_task, %rcx
> 	movq	%rdx, 1152(%rcx)
> 	movq	%gs:per_cpu__current_task, %rdx
> 	movq	%rax, 1160(%rdx)
> 
> because it reloads that silly per-cpu variable every time, because the 
> assembler has a constraint of
> 
> 	"m" (per_cpu__current_task)
> 
> and so gcc is worried that the stores will invalidate the result of the 
> load from the per-cpu variable.
> 
> I don't know how to fix that _well_, but here's a not-so-very-pretty patch 
> that seems to shave off 4.5kB from my kernel, and gives gcc much better 
> scheduling for 'current' and 'thread_info' because now it can load them 
> early - and cache them - even in the presence of stores.
> 

This is clearly better... now the semi-obvious question becomes if there
is any way we can get compiler support to do better and migrate to that
as the compiler allows.  In particular, if I remember right the problem
with using __thread for percpu was exactly that the current cpuness can
change almost anywhere, unless preemption is disabled.

I'm wondering if we could use __thread or something like it for the
stable perthreads, perhaps with additional compiler hints.
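
As a rough userspace illustration of the idea (not the kernel patch under
discussion), the sketch below keeps a "stable" per-thread pointer in a
__thread variable and contrasts reading it at every use with loading it
once into a local.  The names struct task, current_task, set_current,
set_altstack_reload and set_altstack_cached are all hypothetical, and
whether the compiler really reloads in the first variant depends on its
aliasing assumptions and options.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's task_struct; only the two
 * fields from the quoted example are modelled. */
struct task {
	unsigned long sas_ss_sp;
	size_t sas_ss_size;
};

/* One pointer per thread.  It is set once and then left alone, which is
 * what "stable" means in this sketch. */
static __thread struct task *current_task;

static void set_current(struct task *t)
{
	current_task = t;
}

/* Reads the thread-local slot at each use.  Unless the compiler can
 * prove the first store does not modify current_task, it is allowed to
 * load the slot a second time before the second store. */
static void set_altstack_reload(unsigned long sp, size_t size)
{
	current_task->sas_ss_sp = sp;
	current_task->sas_ss_size = size;
}

/* Loads the slot once into a local.  The local cannot be changed by the
 * stores, so a single load serves both of them. */
static void set_altstack_cached(unsigned long sp, size_t size)
{
	struct task *cur = current_task;

	cur->sas_ss_sp = sp;
	cur->sas_ss_size = size;
}
```

Both functions store the same values; the difference is only in how much
freedom the compiler has to keep the pointer in a register.  This covers
the per-thread case only.  A per-cpu value cannot be cached this way
unless preemption is disabled, because the task may move to another CPU
between the load and the use.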

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
