Message-ID: <20080428221122.GC16153@elte.hu>
Date:	Tue, 29 Apr 2008 00:11:22 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	"H. Peter Anvin" <hpa@...or.com>
Cc:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 0/2] Immediate Values - jump patching update


* H. Peter Anvin <hpa@...or.com> wrote:

>>> I still think this is the completely wrong approach.
>>
>> hm, can it result in a broken kernel? If yes, how? Or are your 
>> objections more higher level?
>
> My objections are higher level, I believe the current code is (a) 
> painfully complex, and I'd rather not see it in the kernel, and (b) 
> the wrong thing anyway.
>
> Put a 5-byte nop in as the marker, and patch it with a call 
> instruction, out of line, to a collector function.

the counter-argument was that, by specific sched.o analysis, this results 
in slower code. The reason is that the "function call parameter 
preparation" halo around that 5-byte patch site is larger than a 
single conditional branch to an out-of-line part of the current 
function.

i.e. the current optimized marker approach does roughly this:

  [ .... fastpath head ....       ]
  [ immediate value instruction   ]  --->
  [ branch instruction            ]  ---> these two get NOP-ed out
  [ .... fastpath tail ....       ]
  [ ............................. ]
  [ ... offline area ............ ]
  [ ... parameter preparation ... ]
  [ ... marker call ............. ]

your proposed 5-byte call NOP approach (which btw. was what i proposed 
multiple times in the past 2 years) would do this:

  [ .... fastpath head ......     ]
  [ ... parameter preparation ... ]
  [ ....   5-byte CALL .......... ]  ---> NOP-ed out
  [ .... fastpath tail .......... ]
  [ ............................. ]

in the first case we have little "marker parameter/value preparation" 
cost: it all happens in the 'offline area' _by GCC_. I.e. the fastpath 
is relatively undisturbed.

in the latter case, the whole 'parameter preparation' phase has to happen 
around the 5-byte CALL site, in the fastpath. This was shown by Mathieu, 
in a specific assembly-level analysis of sched.o, to be a 
pessimisation. We are better off inserting that conditional and 
letting gcc generate the call than forcing it into the middle of the 
fastpath - even if we end up NOP-ing out the call.
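
to make the two layouts concrete, here is a rough user-space sketch in C. 
It is not the marker or immediate-values code: every name in it 
(marker_enabled, call_site, probe, nop_probe, struct task, 
switch_branch, switch_call) is made up for illustration, a plain variable 
stands in for the patched immediate value, and a function pointer stands 
in for the 5-byte NOP/CALL site. It only shows where the argument loads 
end up in each variant; it measures nothing.

```c
/* Hypothetical sketch; none of these names are the kernel's marker API. */
#define unlikely(x) __builtin_expect(!!(x), 0)   /* GCC/Clang builtin */

struct task { long pid; long prio; };

static int marker_enabled;          /* stand-in for the patched immediate value */
static unsigned long probe_hits;    /* counts how often the probe really ran */

static void probe(long prev_pid, long next_pid)
{
	(void)prev_pid; (void)next_pid;
	probe_hits++;
}

static void nop_probe(long prev_pid, long next_pid)
{
	(void)prev_pid; (void)next_pid;   /* models the NOP-ed out call */
}

/* stand-in for the 5-byte patch site: nop_probe == NOP, probe == CALL */
static void (*call_site)(long, long) = nop_probe;

/* variant 1: immediate value + branch, call emitted by the compiler */
long switch_branch(const struct task *prev, const struct task *next)
{
	long work = prev->prio + next->prio;      /* fastpath head */

	if (unlikely(marker_enabled))
		probe(prev->pid, next->pid);      /* pid loads happen only here */

	return work * 2;                          /* fastpath tail */
}

/* variant 2: unconditional call site in the middle of the fastpath */
long switch_call(const struct task *prev, const struct task *next)
{
	long work = prev->prio + next->prio;      /* fastpath head */

	call_site(prev->pid, next->pid);          /* pid loads happen every time */

	return work * 2;                          /* fastpath tail */
}
```

in switch_branch() the compiler is free to move the pid loads and the 
call into a cold block, so the disabled case costs a test and a 
not-taken branch. In switch_call() the pid loads sit in the fastpath 
whether or not the call site is live, which is the "halo" referred to 
above.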

wrt. complexity i agree with you - if the current optimization cannot be 
implemented correctly we have to fall back to a simpler variant, even if 
it's slower.

	Ingo