Date:	Mon, 15 Jun 2009 15:39:34 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Ingo Molnar <mingo@...e.hu>, mingo@...hat.com, hpa@...or.com,
	paulus@...ba.org, acme@...hat.com, linux-kernel@...r.kernel.org,
	a.p.zijlstra@...llo.nl, penberg@...helsinki.fi,
	vegard.nossum@...il.com, efault@....de, jeremy@...p.org,
	npiggin@...e.de, tglx@...utronix.de,
	linux-tip-commits@...r.kernel.org
Subject: Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain
	support to use NMI-safe methods

* Linus Torvalds (torvalds@...ux-foundation.org) wrote:
> 
> 
> On Mon, 15 Jun 2009, Ingo Molnar wrote:
> > 
> > The gist of it is the replacement of iret with this open-coded 
> > sequence:
> > 
> > +#define NATIVE_INTERRUPT_RETURN_NMI_SAFE	pushq %rax;		\
> > +						movq %rsp, %rax;	\
> > +						movq 24+8(%rax), %rsp;	\
> > +						pushq 0+8(%rax);	\
> > +						pushq 16+8(%rax);	\
> > +						movq (%rax), %rax;	\
> > +						popfq;			\
> > +						ret
> 
> That's an odd way of writing it.
> 

There were a few reasons (maybe not all good) for writing it like this:

- Saving I$ (the sequence sits close to hot entry.S code paths).
- Staying local to the top of the stack, which saves D$ accesses.

But maybe benchmarks will prove my approach overkill, dunno. Also, we
have to be aware that the CPU might run more slowly in the presence
of unbalanced int/iret and call/ret pairs. I think we should benchmark
your approach to make sure the jmp does not produce such a slowdown. But
it might well be faster, and it's definitely clearer.
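
For anyone who wants to follow the stack manipulation in the quoted
NATIVE_INTERRUPT_RETURN_NMI_SAFE sequence step by step, here is a rough
user-space trace of it in C. This is only a toy model under stated
assumptions, not kernel code: struct cpu, load(), push(), pop(),
build_frame() and nmi_safe_return() are hypothetical names made up for the
illustration, the stack is a small array, and CS/SS are placeholder values
that the sequence never restores (it only makes sense for a return at the
same privilege level).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (hypothetical, not kernel code): a tiny stack plus the
 * registers that the quoted macro touches. */
#define STACK_WORDS 64

struct cpu {
	uint64_t mem[STACK_WORDS];	/* simulated stack, one 8-byte slot per entry */
	uint64_t rsp;			/* byte offset into mem */
	uint64_t rax, rflags, rip;
};

static uint64_t load(struct cpu *c, uint64_t addr)
{
	return c->mem[addr / 8];
}

static void push(struct cpu *c, uint64_t v)
{
	c->rsp -= 8;
	c->mem[c->rsp / 8] = v;
}

static uint64_t pop(struct cpu *c)
{
	uint64_t v = c->mem[c->rsp / 8];

	c->rsp += 8;
	return v;
}

/* Lay out a 64-bit interrupt frame the way the CPU pushes it:
 * SS, RSP, RFLAGS, CS, RIP (RIP ends up at the lowest address). */
static void build_frame(struct cpu *c, uint64_t rip, uint64_t rflags,
			uint64_t old_rsp)
{
	c->rsp = old_rsp;
	push(c, 0x18);		/* SS (placeholder value) */
	push(c, old_rsp);	/* RSP of the interrupted context */
	push(c, rflags);	/* RFLAGS */
	push(c, 0x10);		/* CS (placeholder value) */
	push(c, rip);		/* RIP */
}

/* One C statement per instruction of the quoted macro. */
static void nmi_safe_return(struct cpu *c)
{
	push(c, c->rax);			/* pushq %rax */
	c->rax = c->rsp;			/* movq %rsp, %rax */
	c->rsp = load(c, c->rax + 24 + 8);	/* movq 24+8(%rax), %rsp */
	push(c, load(c, c->rax + 0 + 8));	/* pushq 0+8(%rax)   (RIP) */
	push(c, load(c, c->rax + 16 + 8));	/* pushq 16+8(%rax)  (RFLAGS) */
	c->rax = load(c, c->rax);		/* movq (%rax), %rax */
	c->rflags = pop(c);			/* popfq */
	c->rip = pop(c);			/* ret */
}
```

When the interrupted context was running on the same stack, the two pushes
land on the SS and RSP slots of the frame. In this model the RSP slot has
already been read by the time it is overwritten, and RFLAGS and the saved
rax sit below the overwritten slots, so every value is still intact when it
is fetched.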

Thanks,

Mathieu


> Don't we have a per-cpu segment here? I'd much rather just see it do 
> something like this (_before_ restoring the regular registers)
> 
> 	movq EIP(%esp),%rax
> 	movq ESP(%esp),%rdx
> 	movq %rax,gs:saved_esp
> 	movq %rdx,gs:saved_eip
> 
> 	# restore regular regs
> 	RESTORE_ALL
> 
> 	# skip eip/esp to get at eflags
> 	addl $16,%esp
> 	popfq
> 
> 	# restore rsp/rip
> 	movq gs:saved_esp,%rsp
> 	jmpq *(gs:saved_eip)
> 
> but I haven't thought deeply about it. Maybe there's something wrong with 
> the above.
> 
> > If it's faster, this becomes a legit (albeit complex) 
> > micro-optimization in a _very_ hot codepath.
> 
> I don't think it's all that hot. It's not like it's the return to user 
> mode.
> 
> 			Linus

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68