Message-ID: <20180426144137.GC3315@bigcity.dyn.berto.se>
Date:   Thu, 26 Apr 2018 16:41:38 +0200
From:   Niklas Söderlund 
        <niklas.soderlund@...natech.se>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Heiner Kallweit <hkallweit1@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Ingo Molnar <mingo@...hat.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-renesas-soc@...r.kernel.org
Subject: Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update
 blocked load when newly idle")

Hi Vincent,

Thanks for all your help.

On 2018-04-26 12:31:33 +0200, Vincent Guittot wrote:
> Hi Niklas,
> 
> On Thursday 26 Apr 2018 at 00:56:03 (+0200), Niklas Söderlund wrote:
> > Hi Vincent,
> > 
> > Here are the results, sorry for the delay.
> > 
> > On 2018-04-23 11:54:20 +0200, Vincent Guittot wrote:
> > 
> > [snip]
> > 
> > > 
> > > Thanks for the report. Can you re-run with the following trace-cmd sequence? My previous sequence disables ftrace events.
> > > 
> > > trace-cmd reset > /dev/null
> > > trace-cmd start -b 40000 -p function -l dump_backtrace:traceoff -e sched -e cpu_idle -e cpu_frequency -e timer -e ipi -e irq -e printk
> > > trace-cmd start -b 40000 -p function -l dump_backtrace -e sched -e cpu_idle -e cpu_frequency -e timer -e ipi -e irq -e printk
> > > 
> > > I have updated the patch and added traces to check that the scheduler returns from the idle_balance() function and doesn't stay stuck.
> > 
> > Once more I applied the change below on top of c18bb396d3d261eb ("Merge 
> > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net").
> > 
> > This time the result of 'trace-cmd report' is so large that I do not 
> > include it here, but I attach the trace.dat file. I am not sure why, but 
> > the timing between sending the NMI and the backtrace print is different 
> > (the content is the same AFAIK), so on the off chance it can help figure 
> > this out:
> > 
> 
> Thanks for the trace, I have been able to catch a problem with it.
> Could you test the patch below to confirm that the problem is solved?
> The patch applies on top of
> c18bb396d3d261eb ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

I can confirm that with the patch below I can no longer reproduce the 
problem. Thanks!

> 
> From: Vincent Guittot <vincent.guittot@...aro.org>
> Date: Thu, 26 Apr 2018 12:19:32 +0200
> Subject: [PATCH] sched/fair: fix the update of blocked load when newly idle
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> With commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle"),
> we release the rq->lock when updating blocked load of idle CPUs. This opens
> a time window during which another CPU can add a task to this CPU's cfs_rq.
> The check for a newly added task in idle_balance() is not in the common path.
> Move the out label to include this check.
> 
> Fixes: 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
> Reported-by: Heiner Kallweit <hkallweit1@...il.com>
> Reported-by: Niklas Söderlund <niklas.soderlund@...natech.se>
> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0951d1c..15a9f5e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9847,6 +9847,7 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
>  	if (curr_cost > this_rq->max_idle_balance_cost)
>  		this_rq->max_idle_balance_cost = curr_cost;
>  
> +out:
>  	/*
>  	 * While browsing the domains, we released the rq lock, a task could
>  	 * have been enqueued in the meantime. Since we're not going idle,
> @@ -9855,7 +9856,6 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
>  	if (this_rq->cfs.h_nr_running && !pulled_task)
>  		pulled_task = 1;
>  
> -out:
>  	/* Move the next balance forward */
>  	if (time_after(this_rq->next_balance, next_balance))
>  		this_rq->next_balance = next_balance;
> -- 
> 2.7.4
> 
> 
> 
> [snip]
> 
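
For anyone reading the archive later, the control-flow issue the patch
addresses can be sketched in user space. This is only a toy model and not
the kernel code: struct toy_rq, toy_remote_enqueue() and both
toy_idle_balance_*() functions are hypothetical names, and the remote
enqueue is simulated with a plain function call where the real code has the
window in which rq->lock is released.

```c
#include <stdbool.h>

/* Toy stand-in for struct rq; only the counter the check reads (assumption). */
struct toy_rq {
	unsigned int h_nr_running;	/* runnable CFS tasks on this CPU */
};

/* Models another CPU enqueueing a task while the rq lock is dropped. */
static void toy_remote_enqueue(struct toy_rq *rq)
{
	rq->h_nr_running++;
}

/* Label placement before the patch: the early exit skips the check. */
static int toy_idle_balance_before(struct toy_rq *rq, bool early_exit)
{
	int pulled_task = 0;

	if (early_exit) {
		/* Blocked load update: lock released, a task arrives. */
		toy_remote_enqueue(rq);
		goto out;
	}

	/* Full balance pass omitted; assume nothing was pulled. */
	toy_remote_enqueue(rq);

	if (rq->h_nr_running && !pulled_task)
		pulled_task = 1;
out:
	return pulled_task;
}

/* Label placement after the patch: every path runs the check. */
static int toy_idle_balance_after(struct toy_rq *rq, bool early_exit)
{
	int pulled_task = 0;

	if (early_exit) {
		toy_remote_enqueue(rq);
		goto out;
	}

	toy_remote_enqueue(rq);
out:
	if (rq->h_nr_running && !pulled_task)
		pulled_task = 1;

	return pulled_task;
}
```

Starting from an empty toy_rq with early_exit set, toy_idle_balance_before()
returns 0 although h_nr_running is already 1, so the caller would go idle
with a runnable task queued. toy_idle_balance_after() returns 1 for the same
input, which matches the behaviour the patch is meant to restore.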

-- 
Regards,
Niklas Söderlund
