linux-kernel - Re: [PATCH 0/5] ftrace: to kill a daemon

Date:	Fri, 8 Aug 2008 12:04:00 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Steven Rostedt <rostedt@...dmis.org>
cc:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Miller <davem@...emloft.net>,
	Roland McGrath <roland@...hat.com>,
	Ulrich Drepper <drepper@...hat.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	Gregory Haskins <ghaskins@...ell.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	"Luis Claudio R. Goncalves" <lclaudio@...g.org>,
	Clark Williams <williams@...hat.com>
Subject: Re: [PATCH 0/5] ftrace: to kill a daemon

On Fri, 8 Aug 2008, Steven Rostedt wrote:
> 
> Can a processor be preempted in a middle of nops?

Sure. If you have two nops in a row (and the kernel definition of the NOP 
array does _not_ guarantee that it's a single-instruction one), you may 
get a profile hit (ie any interrupt) on the second one. It's less 
_likely_, but it certainly is not architecturally in any way guaranteed 
that the kernel "nop[]" tables would be atomic.
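
One way to picture the point above is to describe a padding sequence purely by the length of each instruction in it. Every boundary that falls strictly inside the padding is a place where an interrupt could, in principle, be taken. The sketch below is only a toy model: the `nop_seq` type and the two example sequences are hypothetical illustrations, not the kernel's actual nop tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a padding sequence described only by the
 * byte length of each instruction in it. */
struct nop_seq {
    const unsigned char *insn_len;  /* length of each instruction */
    size_t n_insns;                 /* number of instructions */
};

/* Total number of bytes covered by the sequence. */
static size_t nop_seq_bytes(const struct nop_seq *s)
{
    size_t total = 0;
    for (size_t i = 0; i < s->n_insns; i++)
        total += s->insn_len[i];
    return total;
}

/* Number of instruction boundaries strictly inside the padding.
 * Zero means the padding is a single instruction, so an interrupt
 * can only be taken before or after it, never in the middle. */
static size_t nop_seq_interior_boundaries(const struct nop_seq *s)
{
    assert(s->n_insns > 0);
    return s->n_insns - 1;
}

/* Illustrative only: 5 bytes as 1+4 versus 5 bytes as one instruction. */
static const unsigned char split_lens[]  = { 1, 4 };
static const unsigned char single_lens[] = { 5 };
static const struct nop_seq split_pad  = { split_lens, 2 };
static const struct nop_seq single_pad = { single_lens, 1 };
```

Both paddings cover the same five bytes, but only the split one has an interior boundary where a profile hit could land.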

> What do nops do for a  processor?

Depends on the microarchitecture. But many will squash it in the decode 
stage, and generate no uops for them at all, so it's purely a decode 
throughput issue and has absolutely _zero_ impact for any later CPU 
stages.

> Can it skip them nicely in one shot?

See above. It needs to decode them, and the decoder itself may well have 
some limitations - for example, the longer nops may not even decode in all 
decoders, which is why some uarchs might prefer two short nops to one long 
one, but _generally_ a nop will not make it any further than the decoder. 
But separate nops do count as separate instructions, ie they will hit all 
the normal decode limits (mostly three or four instructions decoded per 
cycle).
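
As a rough illustration of the decode-limit point, here is a toy model of a front end that decodes at most a fixed number of instructions per cycle. The function name `decode_cycles` and the model itself are assumptions made for the example; it ignores real effects such as decoders that cannot handle the longer nops, and it is not a description of any particular microarchitecture.

```c
#include <assert.h>

/* Toy model: a front end that decodes at most `width` instructions
 * per cycle, regardless of their length.
 * Returns ceil(n_insns / width). */
static unsigned decode_cycles(unsigned n_insns, unsigned width)
{
    assert(width > 0);
    return (n_insns + width - 1) / width;
}
```

In this model, with a width of four, eight padding slots filled with one nop each take two decode cycles, while the same eight slots filled with two nops each (sixteen instructions) take four.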

> I'm assuming that jmp is more expensive than the nops because otherwise
> a jmp 0 would have been used as a 5 byte nop.

Yes. A CPU core _could_ certainly do special decoding for 'jmp 0' too, but 
I don't know any that do. The 'jmp' is much more likely to be special in 
the front-end and the decoder, and can easily cause things like the 
prefetcher to hiccup (ie it tries to start prefetching at the "new" target 
address).

So I would absolutely _expect_ a 'jmp' to be noticeably more expensive 
than one of the special nops that can be squashed by the decoder.
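
For reference, the 'jmp 0' being discussed is the five-byte x86 `jmp rel32` (opcode 0xE9 plus a 32-bit displacement) with a displacement of zero, which simply lands on the next instruction. The sketch below shows the encoding and the target calculation; the helper names are made up for the example and nothing here is taken from the ftrace patches.

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL32_LEN 5

/* Encode x86 `jmp rel32`: opcode 0xE9 followed by a 32-bit
 * little-endian displacement. */
static void encode_jmp_rel32(uint8_t out[JMP_REL32_LEN], int32_t disp)
{
    uint32_t u = (uint32_t)disp;
    out[0] = 0xE9;
    out[1] = (uint8_t)(u & 0xff);
    out[2] = (uint8_t)((u >> 8) & 0xff);
    out[3] = (uint8_t)((u >> 16) & 0xff);
    out[4] = (uint8_t)((u >> 24) & 0xff);
}

/* The displacement is relative to the end of the instruction. */
static uint64_t jmp_rel32_target(uint64_t insn_addr, int32_t disp)
{
    return insn_addr + JMP_REL32_LEN + (uint64_t)(int64_t)disp;
}

/* Usage example: a zero displacement jumps to the next instruction. */
static void jmp0_selfcheck(void)
{
    uint8_t buf[JMP_REL32_LEN];
    encode_jmp_rel32(buf, 0);
    assert(buf[0] == 0xE9 && buf[1] == 0 && buf[4] == 0);
    assert(jmp_rel32_target(0x1000, 0) == 0x1005);
}
```

So architecturally it behaves like a five-byte nop; the cost difference is entirely in how the front end treats a taken branch.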

A nop that is squashed by the decoder will literally take absolutely no 
other resources. It doesn't even need to be tracked from an instruction 
completion standpoint (which _may_ end up meaning that a profiler would 
actually never see a hit on the second nop, but it's quite likely to 
depend on which decoder it hits etc).

			Linus