Date:	Fri, 8 Aug 2008 12:04:00 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Steven Rostedt <rostedt@...dmis.org>
cc:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Miller <davem@...emloft.net>,
	Roland McGrath <roland@...hat.com>,
	Ulrich Drepper <drepper@...hat.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	Gregory Haskins <ghaskins@...ell.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	"Luis Claudio R. Goncalves" <lclaudio@...g.org>,
	Clark Williams <williams@...hat.com>
Subject: Re: [PATCH 0/5] ftrace: to kill a daemon



On Fri, 8 Aug 2008, Steven Rostedt wrote:
> 
> Can a processor be preempted in a middle of nops?

Sure. If you have two nops in a row (and the kernel definition of the NOP 
array does _not_ guarantee that it's a single-instruction one), you may 
get a profile hit (ie any interrupt) on the second one. It's less 
_likely_, but it certainly is not architecturally in any way guaranteed 
that the kernel "nop[]" tables would be atomic.
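To make that concrete, here is a small C sketch of three 5-byte NOP encodings in the style of the kernel's x86 nop tables. The byte values are recalled from the x86 nops.h header of that era and should be checked against the actual tree; the struct and helper names (nop5, nop5_insn_count, nop5_interior_boundaries) are made up for illustration and are not kernel APIs.

```c
#include <stddef.h>

/*
 * Hypothetical description of one 5-byte NOP sequence: the raw bytes plus
 * the length of each instruction it is built from (zero-terminated).
 */
struct nop5 {
	const char *name;
	unsigned char bytes[5];
	unsigned char insn_len[5];
};

static const struct nop5 nop5_table[] = {
	/* nop; lea 0x0(%esi,1),%esi : two instructions (1 + 4 bytes) */
	{ "generic", { 0x90, 0x8d, 0x74, 0x26, 0x00 }, { 1, 4 } },
	/* osp osp nop; osp nop      : two instructions (3 + 2 bytes) */
	{ "k8",      { 0x66, 0x66, 0x90, 0x66, 0x90 }, { 3, 2 } },
	/* nopl 0x0(%eax,%eax,1)     : one instruction  (5 bytes)     */
	{ "p6",      { 0x0f, 0x1f, 0x44, 0x00, 0x00 }, { 5 } },
};

/* Number of separate instructions making up the sequence. */
static int nop5_insn_count(const struct nop5 *n)
{
	int count = 0;
	size_t i;

	for (i = 0; i < sizeof(n->insn_len) && n->insn_len[i]; i++)
		count++;
	return count;
}

/*
 * Number of instruction boundaries strictly inside the sequence, i.e.
 * places where an interrupt could architecturally land mid-sequence.
 */
static int nop5_interior_boundaries(const struct nop5 *n)
{
	return nop5_insn_count(n) - 1;
}
```

Under these assumptions only the single-instruction form has no interior boundary; the two-instruction forms each leave one point where a profile hit or any other interrupt can arrive with the first nop retired and the second one not yet executed.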

> What do nops do for a  processor?

Depends on the microarchitecture. But many will squash it in the decode 
stage, and generate no uops for them at all, so it's purely a decode 
throughput issue and has absolutely _zero_ impact for any later CPU 
stages.

> Can it skip them nicely in one shot?

See above. It needs to decode them, and the decoder itself may well have 
some limitations - for example, the longer nops may not even decode in all 
decoders, which is why some uarchs might prefer two short nops to one long 
one, but _generally_ a nop will not make it any further than the decoder. 
But separate nops do count as separate instructions, ie they will hit all 
the normal decode limits (mostly three or four instructions decoded per 
cycle).
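As a rough back-of-the-envelope illustration of that decode limit, the following C sketch counts how many decode cycles a run of nops would occupy. It is a deliberately naive model: it assumes every decoder can handle every nop and ignores fetch alignment and everything past decode. The function name decode_cycles and the widths used are assumptions for illustration only, not measurements of any real CPU.

```c
/*
 * Toy model: cycles the decoder needs for n_insns back-to-back
 * instructions when it can decode at most `width` of them per cycle.
 * Returns 0 for a nonsensical width of 0.
 */
static unsigned int decode_cycles(unsigned int n_insns, unsigned int width)
{
	if (width == 0)
		return 0;
	return (n_insns + width - 1) / width;	/* round up */
}
```

In this model, padding five bytes with five one-byte nops on a 4-wide decoder costs decode_cycles(5, 4) = 2 cycles of decode bandwidth, while a single 5-byte nop costs decode_cycles(1, 4) = 1, which is the sense in which separate nops "count as separate instructions".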

> I'm assuming that jmp is more expensive than the nops because otherwise
> a jmp 0 would have been used as a 5 byte nop.

Yes. A CPU core _could_ certainly do special decoding for 'jmp 0' too, but 
I don't know any that do. The 'jmp' is much more likely to be special in 
the front-end and the decoder, and can easily cause things like the 
prefetcher to hickup (ie it tries to start prefetching at the "new" target 
address).
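For reference, a 'jmp 0' in this sense is the 5-byte near jump (opcode 0xe9) with a zero 32-bit displacement. The C sketch below decodes such an instruction and computes its target, to show that the "new" target is simply the next instruction. The helper name jmp_rel32_target and the example addresses are made up for illustration.

```c
#include <stdint.h>

/* 'jmp rel32' with a zero displacement: the 5-byte "jmp 0". */
static const unsigned char jmp0[5] = { 0xe9, 0x00, 0x00, 0x00, 0x00 };

/*
 * Target of a 5-byte 'jmp rel32' located at address ip. The displacement
 * is little-endian, signed, and relative to the end of the instruction.
 */
static uint64_t jmp_rel32_target(uint64_t ip, const unsigned char insn[5])
{
	uint32_t raw = (uint32_t)insn[1] |
		       ((uint32_t)insn[2] << 8) |
		       ((uint32_t)insn[3] << 16) |
		       ((uint32_t)insn[4] << 24);
	int32_t rel = (int32_t)raw;

	return ip + 5 + (uint64_t)(int64_t)rel;
}
```

With the instruction at a hypothetical address 0x1000, jmp_rel32_target(0x1000, jmp0) yields 0x1005, the fall-through address. The result is the same as for a 5-byte nop, but the front-end still sees a taken branch unless it special-cases this encoding.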

So I would absolutely _expect_ a 'jmp' to be noticeably more expensive 
than one of the special nops that can be squashed by the decoder. 

A nop that is squashed by the decoder will literally take absolutely no 
other resources. It doesn't even need to be tracked from an instruction 
completion standpoint (which _may_ end up meaning that a profiler would 
actually never see a hit on the second nop, but it's quite likely to 
depend on which decoder it hits etc).

			Linus