linux-kernel - Re: [PATCH v2 1/3] sched: Stop nohz stats when decayed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <780a5b3a-4829-4195-c8fd-95da27248a82@arm.com>
Date:   Thu, 8 Feb 2018 12:46:53 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>, peterz@...radead.org,
        mingo@...nel.org, linux-kernel@...r.kernel.org
Cc:     morten.rasmussen@...s.arm.com, brendan.jackman@....com,
        dietmar.eggemann@....com
Subject: Re: [PATCH v2 1/3] sched: Stop nohz stats when decayed

On 02/06/2018 07:23 PM, Vincent Guittot wrote:
> [...]
> @@ -7826,8 +7842,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
>  		struct rq *rq = cpu_rq(i);
>  
> -		if (env->flags & LBF_NOHZ_STATS)
> -			update_nohz_stats(rq);
> +		if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq))
> +			env->flags |= LBF_NOHZ_AGAIN;
>  
>  		/* Bias balancing toward cpus of our domain */
>  		if (local_group)
> @@ -7979,18 +7995,15 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>  	struct sg_lb_stats *local = &sds->local_stat;
>  	struct sg_lb_stats tmp_sgs;
>  	int load_idx, prefer_sibling = 0;
> +	int has_blocked = READ_ONCE(nohz.has_blocked);
>  	bool overload = false;
>  
>  	if (child && child->flags & SD_PREFER_SIBLING)
>  		prefer_sibling = 1;
>  
>  #ifdef CONFIG_NO_HZ_COMMON
> -	if (env->idle == CPU_NEWLY_IDLE) {
> +	if (env->idle == CPU_NEWLY_IDLE && has_blocked)
>  		env->flags |= LBF_NOHZ_STATS;
> -
> -		if (cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd)))
> -			nohz.next_stats = jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD);
> -	}
>  #endif
>  
>  	load_idx = get_sd_load_idx(env->sd, env->idle);
> @@ -8046,6 +8059,15 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>  		sg = sg->next;
>  	} while (sg != env->sd->groups);
>  
> +#ifdef CONFIG_NO_HZ_COMMON
> +	if ((env->flags & LBF_NOHZ_AGAIN) &&
> +	    cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) {
> +
> +		WRITE_ONCE(nohz.next_blocked,
> +				jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD));

Here we push the stats update forward if we visited all the nohz CPUs but they
still have blocked load. IMO we should also clear the nohz.has_blocked flag
if we visited all the nohz CPUs and none had blocked load left.

If we don't do that, we could very well have cleared all of the nohz blocked
load in idle_balance and successfully pulled a task, but the flag isn't
cleared so we'll end up doing a _nohz_idle_balance() later on for nothing.

As I said in a previous comment, we also have this problem with periodic
load balance: if a CPU goes nohz (nohz.has_blocked is raised) but wakes up,
e.g. before the next nohz.next_blocked, we should stop kicking ILBs

Now I'd need to test this, but I think it can actually get worse: if that 
CPU keeps generating blocked load after this short idle period, no matter 
how many _nohz_idle_balance() we go through we will never reach a point 
where nohz.has_blocked gets cleared, and we'll keep kicking those ILBs to
update blocked load that already gets updated in the periodic balance.

I think that's where a nohz blocked load cpumask can also help: on top of
skipping nohz CPUs that don't need an update, we can stop the whole remote
update machinery when the last nohz cpu with blocked load wakes up, or say
when it goes through its first periodic balance.

> +	}
> +#endif
> +