lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 11 Sep 2010 09:42:16 -0700
From:	Venkatesh Pallipadi <venki@...gle.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Ingo Molnar <mingo@...e.hu>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	linux-kernel@...r.kernel.org, Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH] generic-ipi: fix deadlock in __smp_call_function_single

On Sat, Sep 11, 2010 at 2:20 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Fri, 2010-09-10 at 17:28 -0700, Andrew Morton wrote:
>> Where is this scheduler bug?  Did it occur because someone didn't
>> understand __smp_call_function_single()?  Or did it occur because the
>> scheduler code is doing something which its implementors did not expect
>> or intend?
>
>
> It comes from 83cd4fe2 (sched: Change nohz idle load balancing logic to
> push model), where nohz_balance_kick() simply needs to kick the
> designated driver into action.
>
> I take it Venki assumed __smp_call_function_single() works like
> smp_call_function_single() where you can use it for the local cpu as
> well.

Yes. This was an oversight while moving from using send_remote_softirq
to using __smp_call_function_single.
Also, as we don't have rq lock around this point, it seems possible
that the CPU that was busy and wants to kick idle load balance on
remote CPU, could have become idle and nominated itself as idle load
balancer.

Below patch looks good to me.

Acked-by: Venkatesh Pallipadi <venki@...gle.com>

I guess, we also need a WARN_ON_ONCE for (cpu == smp_processor_id())
in __smp_call_function_single(), as the eventual result of this bug
that Heiko saw was a deadlock

Thanks,
Venki

>
> I guess we could do something like the below as well, which would be
> slightly faster since we don't actually need to call raise_softirq()
> since we already set it for self a bit earlier in order to have it do
> the regular load-balance actions.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> ---
>  kernel/sched_fair.c |   10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9b5b4f8..c8ca1cb 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3404,11 +3404,13 @@ static void nohz_balancer_kick(int cpu)
>        }
>
>        if (!cpu_rq(ilb_cpu)->nohz_balance_kick) {
> -               struct call_single_data *cp;
> -
>                cpu_rq(ilb_cpu)->nohz_balance_kick = 1;
> -               cp = &per_cpu(remote_sched_softirq_cb, cpu);
> -               __smp_call_function_single(ilb_cpu, cp, 0);
> +
> +               if (ilb_cpu != cpu) {
> +                       struct call_single_data *cp;
> +                       cp = &per_cpu(remote_sched_softirq_cb, cpu);
> +                       __smp_call_function_single(ilb_cpu, cp, 0);
> +               }
>        }
>        return;
>  }
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ