Date:   Wed, 16 Jan 2019 15:56:24 +0000
From:   Julien Thierry <julien.thierry@....com>
To:     Mark Rutland <mark.rutland@....com>,
        Balbir Singh <bsingharora@...il.com>
Cc:     Torsten Duwe <duwe@....de>, Will Deacon <will.deacon@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Arnd Bergmann <arnd@...db.de>,
        AKASHI Takahiro <takahiro.akashi@...aro.org>,
        Amit Daniel Kachhap <amit.kachhap@....com>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        live-patching@...r.kernel.org
Subject: Re: [PATCH v6] arm64: implement ftrace with regs

Hi,

On 14/01/2019 12:26, Mark Rutland wrote:
> On Mon, Jan 14, 2019 at 11:13:59PM +1100, Balbir Singh wrote:
>> On Fri, Jan 04, 2019 at 05:50:18PM +0000, Mark Rutland wrote:
>>> Hi Torsten,
>>>
>>> On Fri, Jan 04, 2019 at 03:10:53PM +0100, Torsten Duwe wrote:
>>>> Use -fpatchable-function-entry (gcc8) to add 2 NOPs at the beginning
>>>> of each function. Replace the first NOP thus generated with a quick LR
>>>> saver (move it to scratch reg x9), so the 2nd replacement insn, the call
>>>> to ftrace, does not clobber the value. Ftrace will then generate the
>>>> standard stack frames.
>>
>> Do we know what the overhead would be, if this was a link time change
>> for the first instruction?
> 
> No, but it should be possible to benchmark that for a given workload,
> which is what I'd like to see.
> 

So, I hacked up something that adds -fpatchable-function-entry=2 to the
build and then has ftrace_init() patch "mov x9, lr" into the first nop
of each function prelude.
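
For illustration only, the patching step described above can be modelled
in user space. This is a minimal sketch, not the hack that was actually
tested: patch_first_nop() and encode_mov_reg() are hypothetical helpers,
and the "function" is just a byte buffer holding two NOPs and a ret. It
relies on "mov x9, lr" being an alias of "orr x9, xzr, x30" in A64.

```python
import struct

NOP = 0xD503201F        # A64 "nop"
RET = 0xD65F03C0        # A64 "ret"
MOV_X9_LR = 0xAA1E03E9  # A64 "mov x9, x30", i.e. "orr x9, xzr, x30"


def encode_mov_reg(rd, rm):
    """Encode 64-bit 'mov Xd, Xm' as 'orr Xd, xzr, Xm' with no shift."""
    return 0xAA000000 | (rm << 16) | (31 << 5) | rd


def patch_first_nop(text, offset):
    """Replace the first of the two NOPs at `offset` with 'mov x9, lr'.

    Hypothetical helper: a real kernel would go through its own
    instruction-patching routines rather than write the bytes directly.
    """
    first, second = struct.unpack_from("<2I", text, offset)
    if first != NOP or second != NOP:
        raise ValueError("expected 'nop; nop' at the patch site")
    struct.pack_into("<I", text, offset, encode_mov_reg(9, 30))


# Made-up function prelude: the two compiler-emitted NOPs, then "ret".
prelude = bytearray(struct.pack("<3I", NOP, NOP, RET))
patch_first_nop(prelude, 0)
patched = struct.unpack("<3I", prelude)
print([hex(insn) for insn in patched])
```

The second NOP is left alone on purpose: it is the slot that ftrace
later replaces with the call once tracing is enabled.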

I tested it on an 8 x Cortex-A57 machine and compared it with a version
that just has the two nops in the function prelude.

On workloads like hackbench, the average difference is within the noise
(<1%). Time results below are in seconds.

	+------------+--------------------+
	| "nop; nop" | "mov x9, lr; nop"  |
	+------------+--------------------+
	|     43.497 |             42.694 |
	|     43.464 |             43.148 |
	|     43.599 |             43.131 |
	|     43.785 |              43.63 |
	|     43.458 |             43.281 |
	|       44.3 |             43.328 |
	|     43.541 |             43.059 |
	|     43.529 |             43.298 |
	|      43.58 |             43.937 |
	|     43.385 |             43.122 |
	|     43.514 |             43.825 |
	|     45.508 |             43.268 |
	|     43.757 |             43.316 |
	|     43.392 |             43.146 |
	|     44.029 |             43.236 |
	|     43.515 |             43.139 |
	|      43.22 |             43.108 |
	|     43.496 |             43.836 |
	|     43.669 |             43.083 |
	|     43.388 |              43.38 |
	+------------+--------------------+
average	|    43.6813 |           43.29825 |
	+------------+--------------------+
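
As a rough cross-check of the table above, the averages and the relative
difference can be recomputed from the listed runs. This is only a
sketch: using the sample standard deviation as a measure of run-to-run
noise is an assumption, since the mail does not say how the noise was
judged.

```python
from statistics import mean, stdev

# hackbench times in seconds, copied from the table above
nop_nop = [43.497, 43.464, 43.599, 43.785, 43.458, 44.3, 43.541,
           43.529, 43.58, 43.385, 43.514, 45.508, 43.757, 43.392,
           44.029, 43.515, 43.22, 43.496, 43.669, 43.388]
mov_nop = [42.694, 43.148, 43.131, 43.63, 43.281, 43.328, 43.059,
           43.298, 43.937, 43.122, 43.825, 43.268, 43.316, 43.146,
           43.236, 43.139, 43.108, 43.836, 43.083, 43.38]

avg_nop = mean(nop_nop)
avg_mov = mean(mov_nop)

# Relative difference of the averages, as a fraction of the baseline
rel_diff = (avg_nop - avg_mov) / avg_nop

print(f"nop; nop         avg {avg_nop:.4f}  stdev {stdev(nop_nop):.3f}")
print(f"mov x9, lr; nop  avg {avg_mov:.5f}  stdev {stdev(mov_nop):.3f}")
print(f"relative difference: {rel_diff:.2%}")
```

The recomputed averages agree with the last row of the table, and the
relative difference comes out just under 1%.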


On a kernel build from defconfig, there seems to be a difference of
around 5%, but funnily enough it's the version with "mov x9, lr" that
seems faster (though that might be caused by disk delays or other
I/O-related effects).

I'll try a few more runs of the kernel build to make sure, but having
"mov x9, lr; nop" does not appear to degrade performance compared to
"nop; nop" as the function prelude.

Cheers,

-- 
Julien Thierry
