linux-kernel - Re: [PATCH v6] arm64: implement ftrace with regs


Message-ID: <82f231a8-c757-da97-bbce-33ac6199a4d9@arm.com>
Date:   Wed, 16 Jan 2019 18:01:01 +0000
From:   Julien Thierry <julien.thierry@....com>
To:     Mark Rutland <mark.rutland@....com>,
        Balbir Singh <bsingharora@...il.com>
Cc:     Torsten Duwe <duwe@....de>, Will Deacon <will.deacon@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Arnd Bergmann <arnd@...db.de>,
        AKASHI Takahiro <takahiro.akashi@...aro.org>,
        Amit Daniel Kachhap <amit.kachhap@....com>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        live-patching@...r.kernel.org
Subject: Re: [PATCH v6] arm64: implement ftrace with regs



On 16/01/2019 15:56, Julien Thierry wrote:
> On 14/01/2019 12:26, Mark Rutland wrote:
>> On Mon, Jan 14, 2019 at 11:13:59PM +1100, Balbir Singh wrote:
>>> On Fri, Jan 04, 2019 at 05:50:18PM +0000, Mark Rutland wrote:
>>>> Hi Torsten,
>>>>
>>>> On Fri, Jan 04, 2019 at 03:10:53PM +0100, Torsten Duwe wrote:
>>>>> Use -fpatchable-function-entry (gcc8) to add 2 NOPs at the beginning
>>>>> of each function. Replace the first NOP thus generated with a quick LR
>>>>> saver (move it to scratch reg x9), so the 2nd replacement insn, the call
>>>>> to ftrace, does not clobber the value. Ftrace will then generate the
>>>>> standard stack frames.
>>>
>>> Do we know what the overhead would be, if this was a link time change
>>> for the first instruction?
>>
>> No, but it should be possible to benchmark that for a given workload,
>> which is what I'd like to see.
>>
> 
> So, I hacked up something to have the -fpatchable-function-entry=2 in the
> build and then have ftrace_init() patch in the "mov x9, lr" in the first
> nop of the function preludes.
> 
> I tested it on an 8 x Cortex-A57 machine and compared with a version
> that just has the two nops in the function prelude.
> 
> On workloads like hackbench, the average difference is within the noise
> (<1%). Time results below are in seconds.
> 
> 	+------------+--------------------+
> 	| "nop; nop" | "mov x9, lr; nop"  |
> 	+------------+--------------------+
> 	|     43.497 |             42.694 |
> 	|     43.464 |             43.148 |
> 	|     43.599 |             43.131 |
> 	|     43.785 |              43.63 |
> 	|     43.458 |             43.281 |
> 	|       44.3 |             43.328 |
> 	|     43.541 |             43.059 |
> 	|     43.529 |             43.298 |
> 	|      43.58 |             43.937 |
> 	|     43.385 |             43.122 |
> 	|     43.514 |             43.825 |
> 	|     45.508 |             43.268 |
> 	|     43.757 |             43.316 |
> 	|     43.392 |             43.146 |
> 	|     44.029 |             43.236 |
> 	|     43.515 |             43.139 |
> 	|      43.22 |             43.108 |
> 	|     43.496 |             43.836 |
> 	|     43.669 |             43.083 |
> 	|     43.388 |              43.38 |
> 	+------------+--------------------+
> average	|    43.6813 |           43.29825 |
> 	+------------+--------------------+
> 
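For anyone wanting to play with the idea, the run-time patching described
above can be modelled roughly as below. This is only a toy sketch in Python,
not the kernel code: patch_first_nop(), encode_mov_reg() and the byte buffer
standing in for kernel text are hypothetical names of mine. The instruction
words are the usual A64 encodings ("mov x9, lr" is an alias of
"orr x9, xzr, x30").

```python
import struct

NOP = 0xD503201F        # A64 NOP
RET = 0xD65F03C0        # A64 RET (returns via x30), used here as filler
MOV_X9_LR = 0xAA1E03E9  # mov x9, x30, i.e. orr x9, xzr, x30


def encode_mov_reg(rd: int, rm: int) -> int:
    """Encode the 64-bit register move `mov x<rd>, x<rm>` (ORR with XZR)."""
    return 0xAA0003E0 | (rm << 16) | rd


def patch_first_nop(text: bytearray, entry_offsets) -> None:
    """Replace the first of the two entry NOPs with `mov x9, lr`.

    `text` is a hypothetical stand-in for the kernel text section and
    `entry_offsets` for the list of patchable function entries.
    """
    for off in entry_offsets:
        first, second = struct.unpack_from("<II", text, off)
        if (first, second) != (NOP, NOP):
            raise ValueError(f"unexpected instructions at offset {off:#x}")
        struct.pack_into("<I", text, off, MOV_X9_LR)


# Two dummy functions, each "nop; nop; ret" (3 x 4 bytes).
text = bytearray(struct.pack("<6I", NOP, NOP, RET, NOP, NOP, RET))
patch_first_nop(text, [0, 12])
print([hex(w) for w in struct.unpack("<6I", text)])
```

The second NOP is left alone; it is the slot that would later be replaced by
the call to ftrace when tracing is enabled.
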
Here are also some results running hackbench on 4 x Cortex-A53 (pay no
attention to the fact that the timescales are similar; I changed the
number of iterations run by hackbench so it wouldn't take too long).

	+------------+-------------------+
	| "nop; nop" | "mov x9, lr; nop" |
	+------------+-------------------+
	|     43.815 |            44.455 |
	|     43.758 |            45.173 |
	|     44.075 |             43.95 |
	|     44.021 |            44.185 |
	|     43.959 |            44.826 |
	|     44.039 |            44.478 |
	|     43.836 |            44.626 |
	|     44.071 |            45.177 |
	|     43.619 |            45.033 |
	|     44.052 |            45.095 |
	|     43.903 |            44.802 |
	|     43.773 |            44.955 |
	|     43.908 |             45.02 |
	|     43.441 |            44.986 |
	|     44.167 |            45.182 |
	|     44.106 |            45.229 |
	|     43.974 |             45.07 |
	|     43.859 |            45.283 |
	|     43.706 |            44.892 |
	|     43.897 |            44.194 |
	+------------+-------------------+
average |     43.899 |            44.835 |
        +------------+-------------------+


So, in this case performance takes a ~2% hit from keeping the mov
always present in the function prelude instead of a nop.

That makes it a bit less obvious whether always having that mov there
(whether patched at build time or run time) is good enough.
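
In case someone wants to redo the arithmetic on the Cortex-A53 numbers, here
is a minimal Python sketch. The sample values are copied from the table
above; relative_overhead() is a helper name I made up for illustration. It
prints the mean and sample standard deviation of each column and the
relative difference between the means, which comes out at roughly 2%.

```python
from statistics import mean, stdev

# hackbench times in seconds on 4 x Cortex-A53, copied from the table above
nop_nop = [
    43.815, 43.758, 44.075, 44.021, 43.959, 44.039, 43.836, 44.071,
    43.619, 44.052, 43.903, 43.773, 43.908, 43.441, 44.167, 44.106,
    43.974, 43.859, 43.706, 43.897,
]
mov_nop = [
    44.455, 45.173, 43.95, 44.185, 44.826, 44.478, 44.626, 45.177,
    45.033, 45.095, 44.802, 44.955, 45.02, 44.986, 45.182, 45.229,
    45.07, 45.283, 44.892, 44.194,
]


def relative_overhead(baseline, candidate) -> float:
    """Percentage by which the candidate mean exceeds the baseline mean."""
    base = mean(baseline)
    return (mean(candidate) - base) / base * 100.0


overhead = relative_overhead(nop_nop, mov_nop)
print(f'"nop; nop":        mean {mean(nop_nop):.3f} s, stdev {stdev(nop_nop):.3f} s')
print(f'"mov x9, lr; nop": mean {mean(mov_nop):.3f} s, stdev {stdev(mov_nop):.3f} s')
print(f"relative overhead: {overhead:.2f}%")
```

Comparing the difference in means against the per-column standard deviation
is a crude way to judge whether it stands out from run-to-run noise; a proper
significance test would be better for anything borderline.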

Cheers,

-- 
Julien Thierry