Message-ID: <20160111135005.49f38a45@icelake>
Date:	Mon, 11 Jan 2016 13:50:05 -0800
From:	Jacob Pan <jacob.jun.pan@...ux.intel.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Morten Rasmussen <morten.rasmussen@....com>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Ingo Molnar <mingo@...hat.com>,
	John Stultz <john.stultz@...aro.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
	Len Brown <len.brown@...el.com>,
	Rafael Wysocki <rafael.j.wysocki@...el.com>,
	Eduardo Valentin <edubezval@...il.com>,
	Paul Turner <pjt@...gle.com>, jacob.jun.pan@...ux.intel.com
Subject: Re: [PATCH 3/4] sched: introduce synchronized idle injection

On Fri, 20 Nov 2015 19:53:45 +0100 (CET)
Thomas Gleixner <tglx@...utronix.de> wrote:

> > This all makes the idle-injection muck hard depend on hres_active,
> > but I think that's a sane constraint anyway.  
> 
> It makes it actually depend on NO_HZ || HRES. It comes with a few
> other restrictions as well: I'm not going to support that for TSCs
> which stop in C3. It's ugly enough already and I really don't want to
> muck with the broadcast device.
> 
> One other thing is that the caller has to ensure that the unblock()
> happens in a timely manner. I really would recommend to just disable
> preemption around that machine on fire wait loop and screw the RT
> tasks. We screw them anyway when they depend on a timer.
> 
> Untested patch below.
> 
Sorry it took a while, but I tested the patch below with some small
tweaks. Here are the testing results and issues.

1. Blocking the tick works well for improving idle efficiency, e.g. on
an Ivy Bridge 4-core/8-thread desktop system, performance per watt is
better than without blocking the tick, and it is also more consistent
and predictable. For a CPU-bound floating-point spin workload, I
compared forced idle percentages from 0 to 50% (a sketch of the
workload follows the tables below). The test system is limited to
package C7; the improvement is likely larger on newer CPUs or systems
tuned for deeper package C-states.

W/O patch
=========
IdlePct    Perf      RAPL     WallPower  PPW_RAPL
0          467.16    29.58    0.0        15.79
5          441.68    27.94    0.0        15.81
10         406.71    26.50    0.0        15.35
15         372.16    24.94    0.0        14.92
20         340.89    23.41    0.0        14.56
25         307.87    21.97    0.0        14.01
30         310.69    21.39    0.0        14.53
35         293.29    21.00    0.0        13.97
40         268.78    19.90    0.0        13.51
45         240.66    18.73    0.0        12.85
50         238.56    18.74    0.0        12.73

W/ patch
========
IdlePct    Perf      RAPL     WallPower  PPW_RAPL
0          467.19    29.71    0.0        15.73
5          445.07    28.53    0.0        15.60
10         420.34    27.43    0.0        15.32
15         398.47    26.25    0.0        15.18
20         376.47    25.05    0.0        15.03
25         354.13    23.81    0.0        14.87
30         329.50    22.57    0.0        14.60
35         304.38    21.27    0.0        14.31
40         290.42    20.21    0.0        14.37
45         254.82    16.70    0.0        15.26
50         236.10    17.75    0.0        13.30

Note: PPW_RAPL = performance per watt as measured by the RAPL package
power meter, i.e. Perf / RAPL (e.g. 467.16 / 29.58 = 15.79 in the
first row).

If we just look at 20% forced idle, I can see significantly more deep
package C-state (PC7) residency:

		Pkg%pc2		Pkg%pc6		Pkg%pc7
w/o patch	3.17		4.39		13.21
w/ patch	1.15		1.43		17.24
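
For reference, the spin workload is along these lines -- a minimal
hypothetical sketch, not the exact harness I used (the real runs
presumably pin one instance per CPU and sweep the forced idle
percentage):

#include <stdio.h>
#include <time.h>

int main(void)
{
	volatile double x = 1.0;	/* volatile keeps the FP work alive */
	unsigned long iters = 0;
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		int i;

		/* a fixed chunk of floating-point work per iteration */
		for (i = 0; i < 100000; i++)
			x = x * 1.000001 + 0.000001;
		iters++;
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while (now.tv_sec - start.tv_sec < 10);

	/* throughput over a fixed 10s window, i.e. "Perf" above */
	printf("iterations/sec: %lu\n", iters / 10);
	return 0;
}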

2. I ran into a warning:

[  109.035699] WARNING: CPU: 0 PID: 2748 at kernel/time/clockevents.c:325 clockevents_program_event+0x102/0x110()
[  109.035700] Current state: 0

I think it is caused by tick_throttler sitting in
CLOCK_EVT_STATE_DETACHED state. Perhaps fix it with the below?
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1141,6 +1141,7 @@ static struct clock_event_device tick_throttler = {
        .features       = CLOCK_EVT_FEAT_ONESHOT,
        .event_handler  = tick_throttling_handler,
        .set_next_event = tick_throttling_noop,
+       .state_use_accessors = CLOCK_EVT_STATE_ONESHOT,
        .mult           = 1,
        .shift          = 0,
        .irq            = -1,
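
For reference, I believe the check that fires is the oneshot-state
assertion in clockevents_program_event() -- quoted from memory, so
treat the exact form as approximate:

	/* kernel/time/clockevents.c, around line 325 in this tree:
	 * programming an event requires the device to be in oneshot
	 * state; a statically defined device starts out in
	 * CLOCK_EVT_STATE_DETACHED (0), matching "Current state: 0".
	 */
	WARN_ONCE(!clockevent_state_oneshot(dev), "Current state: %d\n",
		  clockevent_get_state(dev));

Statically initializing the state to CLOCK_EVT_STATE_ONESHOT, as in
the hunk above, keeps the throttler from tripping it.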

3. Typo? Since we use ns as the delay unit:
@@ -1175,7 +1176,7 @@ int tick_nohz_block_tick(u64 delay)
        if (WARN_ON_ONCE(td->evtdev == &tick_throttler))
                goto out;
 
-       if (delay > td->evtdev->max_delta_ticks) {
+       if (delay > td->evtdev->max_delta_ns) {


4. I see random messages about "sched: RT throttling activated"; I am
not sure whether RT runtime accounting is being affected, since I
never got anywhere close to the 95% RT limit.

5. Random soft lockups/stalls, detected by the NMI watchdog or RCU
preempt. Still debugging, even though each idle duration is only 5
ticks. I also disabled preemption around the wait loop as suggested.
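
For clarity, the shape of my injection period with preemption disabled
is roughly the following sketch -- do_idle_wait() is a made-up
stand-in for the actual wait loop, not a real kernel function:

	static void inject_idle(u64 duration_ns)
	{
		preempt_disable();
		/* returns 0 on success, -EBUSY/-EINVAL otherwise */
		if (!tick_nohz_block_tick(duration_ns)) {
			do_idle_wait(duration_ns);	/* stand-in */
			tick_nohz_unblock_tick();
		}
		preempt_enable();
	}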

Thanks,

Jacob

> Thanks,
> 
> 	tglx
> 
> 8<----------------
> Subject: tick/nohz: Allow blocking the tick to prevent fire
> From: Thomas Gleixner <tglx@...utronix.de>
> Date: Fri, 20 Nov 2015 14:28:17 +0100
> 
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> ---
>  include/linux/tick.h     |    5 ++
>  kernel/time/Kconfig      |    5 ++
>  kernel/time/tick-sched.c |   99 +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/time/tick-sched.h |    7 ++-
>  4 files changed, 114 insertions(+), 2 deletions(-)
> 
> Index: tip/include/linux/tick.h
> ===================================================================
> --- tip.orig/include/linux/tick.h
> +++ tip/include/linux/tick.h
> @@ -203,4 +203,9 @@ static inline void tick_nohz_task_switch
>  		__tick_nohz_task_switch();
>  }
>  
> +#ifdef CONFIG_TICK_THROTTLING
> +int tick_nohz_block_tick(u64 delay);
> +void tick_nohz_unblock_tick(void);
> +#endif
> +
>  #endif
> Index: tip/kernel/time/Kconfig
> ===================================================================
> --- tip.orig/kernel/time/Kconfig
> +++ tip/kernel/time/Kconfig
> @@ -60,6 +60,11 @@ menu "Timers subsystem"
>  config TICK_ONESHOT
>  	bool
>  
> +# Special functions for tick throttling to avoid fire extinguishers
> +config TICK_THROTTLING
> +       bool
> +       depends on TICK_ONESHOT
> +
>  config NO_HZ_COMMON
>  	bool
>  	depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
> Index: tip/kernel/time/tick-sched.c
> ===================================================================
> --- tip.orig/kernel/time/tick-sched.c
> +++ tip/kernel/time/tick-sched.c
> @@ -1119,6 +1119,105 @@ void tick_setup_sched_timer(void)
>  }
>  #endif /* HIGH_RES_TIMERS */
>  
> +#ifdef CONFIG_TICK_THROTTLING
> +/*
> + * An interrupt might have been pending already, when we programmed
> + * the throttler time. Nothing to do here. The device is armed and the
> + * interrupt will fire again. If this is the real wakeup event then
> + * the unblock call will retrigger it.
> + */
> +static void tick_throttling_handler(struct clock_event_device *dev)
> +{
> +}
> +
> +static int tick_throttling_noop(unsigned long evt,
> +				struct clock_event_device *d)
> +{
> +	return 0;
> +}
> +
> +static struct clock_event_device tick_throttler = {
> +	.name		= "throttler",
> +	.features	= CLOCK_EVT_FEAT_ONESHOT,
> +	.event_handler	= tick_throttling_handler,
> +	.set_next_event	= tick_throttling_noop,
> +	.mult		= 1,
> +	.shift		= 0,
> +	.irq		= -1,
> +};
> +
> +/**
> + * tick_nohz_block_tick - Force block the tick to prevent fire
> + * @delay:	Defer the tick for X nanoseconds
> + *
> + * This is a special interface for thermal emergencies. Do not use
> + * otherwise!  Make sure to call tick_nohz_unblock_tick() right after
> + * the delay ends to undo the damage.
> + */
> +int tick_nohz_block_tick(u64 delay)
> +{
> +	struct tick_device *td;
> +	unsigned long flags;
> +	int ret = -EBUSY;
> +	ktime_t until;
> +
> +	if (!tick_nohz_active)
> +		return -EBUSY;
> +
> +	local_irq_save(flags);
> +	td = this_cpu_ptr(&tick_cpu_device);
> +
> +	/* No way to do that with broadcast */
> +	if (td->evtdev->features & CLOCK_EVT_FEAT_C3STOP)
> +		goto out;
> +
> +	/* Yes, I do not trust people! */
> +	if (WARN_ON_ONCE(td->evtdev == &tick_throttler))
> +		goto out;
> +
> +	if (delay > td->evtdev->max_delta_ticks) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	until = ktime_add_ns(ktime_get(), delay);
> +	if (!tick_program_event(until, 0)) {
> +		td->real_evtdev = td->evtdev;
> +		td->evtdev = &tick_throttler;
> +		ret = 0;
> +	}
> +out:
> +	local_irq_restore(flags);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tick_nohz_block_tick);
> +
> +/**
> + * tick_nohz_unblock_tick - Undo the force blocking of the tick
> + *
> + * Pairs with tick_nohz_block_tick(). Can be called unconditionally
> + * even if the tick was not blocked by tick_nohz_block_tick().
> + */
> +void tick_nohz_unblock_tick(void)
> +{
> +	struct tick_device *td;
> +	unsigned long flags;
> +
> +	if (!tick_nohz_active)
> +		return;
> +
> +	local_irq_save(flags);
> +	td = this_cpu_ptr(&tick_cpu_device);
> +	if (td->evtdev == &tick_throttler) {
> +		td->evtdev = td->real_evtdev;
> +		/* Force a timer interrupt now */
> +		tick_program_event(ktime_get(), 1);
> +	}
> +	local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(tick_nohz_unblock_tick);
> +#endif /* TICK_THROTTLING */
> +
>  #if defined CONFIG_NO_HZ_COMMON || defined CONFIG_HIGH_RES_TIMERS
>  void tick_cancel_sched_timer(int cpu)
>  {
> Index: tip/kernel/time/tick-sched.h
> ===================================================================
> --- tip.orig/kernel/time/tick-sched.h
> +++ tip/kernel/time/tick-sched.h
> @@ -9,8 +9,11 @@ enum tick_device_mode {
>  };
>  
>  struct tick_device {
> -	struct clock_event_device *evtdev;
> -	enum tick_device_mode mode;
> +	struct clock_event_device	*evtdev;
> +	enum tick_device_mode		mode;
> +#ifdef CONFIG_TICK_THROTTLING
> +	struct clock_event_device	*real_evtdev;
> +#endif
>  };
>  
>  enum tick_nohz_mode {

[Jacob Pan]
