Message-ID: <20091108113546.GN11372@elte.hu>
Date:	Sun, 8 Nov 2009 12:35:46 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Stefani Seibold <stefani@...bold.net>
Cc:	linux-kernel <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Americo Wang <xiyou.wangcong@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH] RFC x86_64 more accurate KSTK_ESP implementation


* Stefani Seibold <stefani@...bold.net> wrote:

> Hi,
> 
> this is an RFC for a more accurate KSTK_ESP implementation for the x86_64
> architecture.
> 
> Because usersp is only updated on a context switch, its value is
> outdated most of the time. This patch also updates the per-CPU variable
> old_rsp in the device and timer interrupt paths.
> 
> In my opinion this can be done safely if the current stack pointer is
> outside the kernel stack of the current task and the instruction pointer
> is not inside the kernel.
> 
> The old_rsp value will be stored in usersp in case of a context switch.
> 
> KSTK_ESP will get the value from old_rsp if the task is the
> current task; otherwise it will read usersp.
> 
> I know about the performance cost, which is why I am asking for comments.
> 
> Stefani
> 
> Signed-off-by: Stefani Seibold <stefani@...bold.net>
> 
>  include/asm/processor.h |    4 +++-
>  kernel/apic/apic.c      |    3 +++
>  kernel/irq_64.c         |    1 +
>  kernel/process_64.c     |   20 ++++++++++++++++++++
>  4 files changed, 27 insertions(+), 1 deletion(-)
> 
> --- linux-2.6.32-rc5.old/arch/x86/include/asm/processor.h	2009-10-16 02:41:50.000000000 +0200
> +++ linux-2.6.32-rc5.new/arch/x86/include/asm/processor.h	2009-11-05 08:28:23.765300812 +0100
> @@ -1000,7 +1000,7 @@
>  #define thread_saved_pc(t)	(*(unsigned long *)((t)->thread.sp - 8))
>  
>  #define task_pt_regs(tsk)	((struct pt_regs *)(tsk)->thread.sp0 - 1)
> -#define KSTK_ESP(tsk)		-1 /* sorry. doesn't work for syscall. */
> +extern unsigned long KSTK_ESP(struct task_struct *task);
>  #endif /* CONFIG_X86_64 */
>  
>  extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
> @@ -1052,4 +1052,6 @@
>  	return ratio;
>  }
>  
> +extern void update_usersp(struct pt_regs *regs);
> +
>  #endif /* _ASM_X86_PROCESSOR_H */
> --- linux-2.6.32-rc5.old/arch/x86/kernel/process_64.c	2009-10-16 02:41:50.000000000 +0200
> +++ linux-2.6.32-rc5.new/arch/x86/kernel/process_64.c	2009-11-05 08:52:39.965227285 +0100
> @@ -664,3 +664,23 @@
>  	return do_arch_prctl(current, code, addr);
>  }
>  
> +void update_usersp(struct pt_regs *regs)
> +{
> +	unsigned long stk = (unsigned long)task_stack_page(current);
> +	unsigned long stkp = (regs)->sp; 

Cleanliness: no need for those parentheses.

> +
> +	if (((stkp < stk) || (stkp >= stk + THREAD_SIZE))
> +	    && regs->ip < PAGE_OFFSET)
> +		percpu_write(old_rsp, stkp);
> +}

That check for regs->ip looks imprecise - why don't you use 
user_mode_vm()?

It's true that the value itself is statistical, but we still don't want 
to leak a kernel-space regs->sp value - that would be an information leak.
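
To illustrate the kind of check meant here, the following is a minimal userspace sketch, not kernel code. It assumes that the user-mode test on x86_64 comes down to looking at the privilege bits in the saved CS selector rather than comparing the instruction pointer against PAGE_OFFSET. The names regs_model, model_user_mode(), model_update_usersp() and model_old_rsp are hypothetical stand-ins for pt_regs, user_mode_vm(), update_usersp() and the per-CPU old_rsp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few pt_regs fields used here. */
struct regs_model {
	uint64_t ip;
	uint64_t sp;
	uint64_t cs;
};

/*
 * The low two bits of the saved CS selector hold the privilege level.
 * Ring 0 is the kernel, so any non-zero value is treated as user mode.
 */
static bool model_user_mode(const struct regs_model *regs)
{
	return (regs->cs & 3) != 0;
}

/* Hypothetical single slot standing in for the per-CPU old_rsp. */
static uint64_t model_old_rsp;

/* Record the stack pointer only if the interrupt came from user mode. */
static void model_update_usersp(const struct regs_model *regs)
{
	if (model_user_mode(regs))
		model_old_rsp = regs->sp;
}
```

With a check of this shape, a kernel-mode regs->sp never reaches the recorded value, whatever regs->ip happens to be.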

> +
> +unsigned long KSTK_ESP(struct task_struct *task)
> +{
> +	if (test_tsk_thread_flag(task, TIF_IA32)) 
> +		return task_pt_regs(task)->sp;
> +
> +	if (task != current)
> +		return task->thread.usersp;
> +
> +	return percpu_read(old_rsp);
> +}
> --- linux-2.6.32-rc5.old/arch/x86/kernel/irq_64.c	2009-10-16 02:41:50.000000000 +0200
> +++ linux-2.6.32-rc5.new/arch/x86/kernel/irq_64.c	2009-11-04 22:29:55.762951577 +0100
> @@ -53,6 +53,7 @@
>  	struct irq_desc *desc;
>  
>  	stack_overflow_check(regs);
> +	update_usersp(regs);
>
>  
>  	desc = irq_to_desc(irq);
>  	if (unlikely(!desc))
> --- linux-2.6.32-rc5.old/arch/x86/kernel/apic/apic.c	2009-10-16 02:41:50.000000000 +0200
> +++ linux-2.6.32-rc5.new/arch/x86/kernel/apic/apic.c	2009-11-04 23:12:32.805086991 +0100
> @@ -831,6 +831,9 @@
>  {
>  	struct pt_regs *old_regs = set_irq_regs(regs);
>  
> +#ifndef CONFIG_X86_32
> +	update_usersp(regs);
> +#endif

Cleanliness: please eliminate this #ifdef by defining update_usersp() on 
32-bit as well, as an empty inline function.
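
A sketch of what such a header arrangement could look like, assuming the declaration stays in processor.h and that CONFIG_X86_64 is the symbol used to select the real implementation (the forward declaration of struct pt_regs is only there to keep the fragment self-contained):

```c
struct pt_regs;	/* forward declaration; the real definition lives in the kernel */

#ifdef CONFIG_X86_64
/* 64-bit: real implementation, as added to process_64.c by the patch. */
extern void update_usersp(struct pt_regs *regs);
#else
/* 32-bit: nothing to update, so the call compiles away. */
static inline void update_usersp(struct pt_regs *regs) { }
#endif
```

The call site in apic.c can then invoke update_usersp(regs) unconditionally.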

But I don't like this patch because it adds overhead to the IRQ 
fastpath.

I'd suggest a completely different method: why don't you use an IPI to 
sample the SP whenever someone wants to read it from /proc and we see 
that the task is running on a CPU right now?
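
The control flow of that idea can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: task_model, model_cpu_user_sp, model_call_on_cpu() and model_kstk_esp() are hypothetical names, and the "IPI" is simulated by a direct function call. In the kernel, a synchronous cross-CPU call such as smp_call_function_single() would be one candidate for that step, but that choice is an assumption and not something settled in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for the task_struct fields of interest. */
struct task_model {
	int cpu;		/* CPU the task last ran on */
	bool on_cpu;		/* true while the task is executing */
	uint64_t saved_usersp;	/* value stored at the last context switch */
};

/* Hypothetical per-CPU copy of the live user stack pointer. */
static uint64_t model_cpu_user_sp[MODEL_NR_CPUS];

struct sample_req {
	uint64_t sp;
};

/* Handler that would run on the target CPU and sample its user SP. */
static void model_sample_sp(int cpu, void *info)
{
	struct sample_req *req = info;

	req->sp = model_cpu_user_sp[cpu];
}

/* Stand-in for a synchronous cross-CPU call; here it runs func directly. */
static void model_call_on_cpu(int cpu, void (*func)(int, void *), void *info)
{
	func(cpu, info);
}

/* Slow path only: sample on demand instead of updating in every IRQ. */
static uint64_t model_kstk_esp(const struct task_model *t)
{
	struct sample_req req = { 0 };

	if (!t->on_cpu)
		return t->saved_usersp;

	model_call_on_cpu(t->cpu, model_sample_sp, &req);
	return req.sp;
}
```

The cost is paid only by the /proc reader, and the IRQ fastpath stays untouched.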

	Ingo