Message-ID: <a61baa42fbb5a5d7bcc167589c856e34b68b431c.camel@redhat.com>
Date:   Thu, 12 Nov 2020 14:41:34 -0500
From:   Qian Cai <cai@...hat.com>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>, tglx@...utronix.de,
        mingo@...nel.org, linux-kernel@...r.kernel.org,
        bigeasy@...utronix.de, qais.yousef@....com, swood@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vincent.donnefort@....com,
        tj@...nel.org, ouwen210@...mail.com
Subject: Re: [PATCH v4 10/19] sched: Fix migrate_disable() vs
 set_cpus_allowed_ptr()

On Thu, 2020-11-12 at 19:31 +0000, Valentin Schneider wrote:
> One thing I don't get: that trace shows refcount_dec_and_test()
> (kernel/sched/core.c:2263) happening before the wait_for_completion(). It's
> not the case in the below trace.

Yes, that is normal. Sometimes the decoding is a bit off. I am not sure, but it
may be because some debugging options like KASAN obscure it.

> a) Do you also get this on CONFIG_PREEMPT=y?

I don't know. None of the systems here has that, but I could probably try.

> b) Could you try the below?

Let me run it and report.

> 
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 02076e6d3792..fad0a8e62aca 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1923,7 +1923,7 @@ static int migration_cpu_stop(void *data)
>  		else
>  			p->wake_cpu = dest_cpu;
>  
> -	} else if (dest_cpu < 0) {
> +	} else if (dest_cpu < 0 || pending) {
>  		/*
>  		 * This happens when we get migrated between migrate_enable()'s
>  		 * preempt_enable() and scheduling the stopper task. At that
> @@ -1933,6 +1933,17 @@ static int migration_cpu_stop(void *data)
>  		 * more likely.
>  		 */
>  
> +		/*
> +		 * The task moved before the stopper got to run. We're holding
> +		 * ->pi_lock, so the allowed mask is stable - if it got
> +		 * somewhere allowed, we're done.
> +		 */
> +		if (pending && cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
> +			p->migration_pending = NULL;
> +			complete = true;
> +			goto out;
> +		}
> +
>  		/*
>  		 * When this was migrate_enable() but we no longer have an
>  		 * @pending, a concurrent SCA 'fixed' things and we should be
> 
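
For readers following the hunk above, here is a minimal userspace sketch of the
early-completion check it adds. It is not kernel code: struct task_model,
struct pending_model, mask_test_cpu() and pending_already_satisfied() are
hypothetical stand-ins, the allowed mask is modelled as a 64-bit word in place
of p->cpus_ptr, and locking (->pi_lock) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the pending affinity request (not the kernel type). */
struct pending_model {
	bool stop_pending;
};

/* Hypothetical stand-in for the task fields the check looks at. */
struct task_model {
	unsigned int cpu;                        /* models task_cpu(p) */
	uint64_t cpus_allowed;                   /* models *p->cpus_ptr, one bit per CPU */
	struct pending_model *migration_pending; /* models p->migration_pending */
};

/* Models cpumask_test_cpu() for at most 64 CPUs. */
static bool mask_test_cpu(unsigned int cpu, uint64_t mask)
{
	assert(cpu < 64);
	return (mask >> cpu) & 1u;
}

/*
 * Models the added branch: if a request is pending and the task already
 * sits on an allowed CPU, clear the pending pointer and report that the
 * request can be completed without migrating the task.
 */
static bool pending_already_satisfied(struct task_model *p,
				      struct pending_model *pending)
{
	if (pending && mask_test_cpu(p->cpu, p->cpus_allowed)) {
		p->migration_pending = NULL;
		return true;
	}
	return false;
}
```

In this model a task on CPU 2 with allowed mask 0x4 takes the early exit, while
the same mask with the task on CPU 1 does not and leaves the pending pointer
untouched.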
