linux-kernel - Re: [PATCH V2 9/9] jump_label: Batch up if arch supports it

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d8a417e9-eb0f-d087-f0c5-7405b030e51c@redhat.com>
Date:   Tue, 18 Dec 2018 20:33:48 +0100
From:   Daniel Bristot de Oliveira <bristot@...hat.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Zhou Chengming <zhouchengming1@...wei.com>,
        Jiri Kosina <jkosina@...e.cz>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Chris von Recklinghausen <crecklin@...hat.com>,
        Jason Baron <jbaron@...mai.com>, Scott Wood <swood@...hat.com>,
        Marcelo Tosatti <mtosatti@...hat.com>,
        Clark Williams <williams@...hat.com>, x86@...nel.org
Subject: Re: [PATCH V2 9/9] jump_label: Batch up if arch supports it

On 12/18/18 6:35 PM, Steven Rostedt wrote:
> On Tue, 18 Dec 2018 17:46:38 +0100
> Daniel Bristot de Oliveira <bristot@...hat.com> wrote:
> 
> 
> I'd say add this file first, before x86 supports it. That way it's easy
> for you to test if this file is correct for other archs that do not
> support it.
> 
> When x86 supports it, the "on switch" for that should be the added
> config, just like what other architectures will do.

right!

>> If the architecture supports the batching of jump label updates, use it!
>>
>> An easy way to see the benefits of this patch is switching the
>> schedstats on and off. For instance:
>>
>> -------------------------- %< ---------------------------- 
>> #!/bin/bash
>>
>> while [ true ]; do 
>>     sysctl -w kernel.sched_schedstats=1
>>     sleep 2
>>     sysctl -w kernel.sched_schedstats=0
>>     sleep 2
>> done
>> -------------------------- >% ----------------------------  
>>
>> while watching the IPI count:
>>
>> -------------------------- %< ---------------------------- 
>> # watch -n1 "cat /proc/interrupts | grep Function"
>> -------------------------- >% ----------------------------  
>>
>> With the current mode, it is possible to see +- 168 IPIs each 2 seconds,
>> while with this patch the number of IPIs goes to 3 each 2 seconds.
>>
>> Regarding the performance impact of this patch set, I made two measurements:
>>
>>     The time to update a key (the task that is causing the change)
>>     The time to run the int3 handler (the side effect on a thread that
>>                                       hits the code being changed)
>>
>> The schedstats static key was chosen as the key to being switched on and off.
>> The reason being is that it is used in more than 56 places, in a hot path. The
>> change in the schedstats static key will be done with the following command:
>>
>> while [ true ]; do
>>     sysctl -w kernel.sched_schedstats=1
>>     usleep 500000
>>     sysctl -w kernel.sched_schedstats=0
>>     usleep 500000
>> done
>>
>> In this way, they key will be updated twice per second. To force the hit of the
>> int3 handler, the system will also run a kernel compilation with two jobs per
>> CPU. The test machine is a two nodes/24 CPUs box with an Intel Xeon processor
>> @2.27GHz.
>>
>> Regarding the update part, on average, the regular kernel takes 57 ms to update
>> the schedstats key, while the kernel with the batch updates takes just 1.4 ms
>> on average. Although it seems to be too good to be true, it makes sense: the
>> schedstats key is used in 56 places, so it was expected that it would take
>> around 56 times to update the keys with the current implementation, as the
>> IPIs are the most expensive part of the update.
>>
>> Regarding the int3 handler, the non-batch handler takes 45 ns on average, while
>> the batch version takes around 180 ns. At first glance, it seems to be a high
>> value. But it is not, considering that it is doing 56 updates, rather than one!
>> It is taking four times more, only. This gain is possible because the patch
>> uses a binary search in the vector: log2(56)=5.8. So, it was expected to have
>> an overhead within four times.
>>
>> (voice of tv propaganda) But, that is not all! As the int3 handler keeps on for
>> a shorter period (because the update part is on for a shorter time), the number
>> of hits in the int3 handler decreased by 10%.
>>
>> The question then is: Is it worth paying the price of "135 ns" more in the int3
>> handler?
>>
>> Considering that, in this test case, we are saving the handling of 53 IPIs,
>> that takes more than these 135 ns, it seems to be a meager price to be paid.
>> Moreover, the test case was forcing the hit of the int3, in practice, it
>> does not take that often. While the IPI takes place on all CPUs, hitting
>> the int3 handler or not!
>>
>> For instance, in an isolated CPU with a process running in user-space
>> (nohz_full use-case), the chances of hitting the int3 handler is barely zero,
>> while there is no way to avoid the IPIs. By bounding the IPIs, we are improving
>> a lot this scenario.
>>
>> Signed-off-by: Daniel Bristot de Oliveira <bristot@...hat.com>
>> Cc: Thomas Gleixner <tglx@...utronix.de>
>> Cc: Ingo Molnar <mingo@...hat.com>
>> Cc: Borislav Petkov <bp@...en8.de>
>> Cc: "H. Peter Anvin" <hpa@...or.com>
>> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
>> Cc: Pavel Tatashin <pasha.tatashin@...cle.com>
>> Cc: Masami Hiramatsu <mhiramat@...nel.org>
>> Cc: "Steven Rostedt (VMware)" <rostedt@...dmis.org>
>> Cc: Zhou Chengming <zhouchengming1@...wei.com>
>> Cc: Jiri Kosina <jkosina@...e.cz>
>> Cc: Josh Poimboeuf <jpoimboe@...hat.com>
>> Cc: "Peter Zijlstra (Intel)" <peterz@...radead.org>
>> Cc: Chris von Recklinghausen <crecklin@...hat.com>
>> Cc: Jason Baron <jbaron@...mai.com>
>> Cc: Scott Wood <swood@...hat.com>
>> Cc: Marcelo Tosatti <mtosatti@...hat.com>
>> Cc: Clark Williams <williams@...hat.com>
>> Cc: x86@...nel.org
>> Cc: linux-kernel@...r.kernel.org
>> ---
>>  include/linux/jump_label.h |  3 +++
>>  kernel/jump_label.c        | 29 +++++++++++++++++++++++++++++
>>  2 files changed, 32 insertions(+)
>>
>> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
>> index c88d903befc0..ed51ef3e1abd 100644
>> --- a/include/linux/jump_label.h
>> +++ b/include/linux/jump_label.h
>> @@ -219,6 +219,9 @@ extern void arch_jump_label_transform(struct jump_entry *entry,
>>  				      enum jump_label_type type);
>>  extern void arch_jump_label_transform_static(struct jump_entry *entry,
>>  					     enum jump_label_type type);
>> +extern int arch_jump_label_transform_queue(struct jump_entry *entry,
>> +					    enum jump_label_type type);
>> +extern void arch_jump_label_transform_apply(void);
>>  extern int jump_label_text_reserved(void *start, void *end);
>>  extern void static_key_slow_inc(struct static_key *key);
>>  extern void static_key_slow_dec(struct static_key *key);
>> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
>> index 22093b5f57c9..08ace142af0a 100644
>> --- a/kernel/jump_label.c
>> +++ b/kernel/jump_label.c
>> @@ -409,6 +409,7 @@ bool jump_label_can_update_check(struct jump_entry *entry, bool init)
>>  	return 0;
>>  }
>>  
>> +#ifndef HAVE_JUMP_LABEL_BATCH
>>  static void __jump_label_update(struct static_key *key,
>>  				struct jump_entry *entry,
>>  				struct jump_entry *stop,
>> @@ -421,6 +422,34 @@ static void __jump_label_update(struct static_key *key,
>>  		}
>>  	}
>>  }
>> +#else
>> +static void __jump_label_update(struct static_key *key,
>> +				struct jump_entry *entry,
>> +				struct jump_entry *stop,
>> +				bool init)
>> +{
>> +	for_each_label_entry(key, entry, stop) {
>> +
>> +		if (!jump_label_can_update_check(entry, init))
>> +			continue;
>> +
>> +		if (arch_jump_label_transform_queue(entry,
>> +						    jump_label_type(entry)))
>> +			continue;
>> +
>> +		/*
>> +		 * Queue's overflow: Apply the current queue, and then
>> +		 * queue again. If it stills not possible to queue, BUG!
>> +		 */
>> +		arch_jump_label_transform_apply();
>> +		if (!arch_jump_label_transform_queue(entry,
>> +						     jump_label_type(entry))) {
>> +			BUG();
> 
> Why BUG()? Do you really want to crash Linus's machine?

I am using BUG() because that is what I see in other part of jump_label code:
	If something goes wrong:
		BUG().

What I could do here is:

Add a "fallback" boll that is disabled by default.
If I hit this case:
	WARN()
	turn "fallback" on, returning to the old mode (without batch)

Sound better?

Thanks!
-- Daniel

> -- Steve
> 
>> +		}
>> +	}
>> +	arch_jump_label_transform_apply();
>> +}
>> +#endif
>>  
>>  void __init jump_label_init(void)
>>  {
>