linux-kernel - Re: sched: softlockups in multi_cpu

Message-ID: <CAMiJ5CW8MUPnRK2y3Trh-ZQDQRPsxaqw=bq9tVgpVVAhFqBzfw@mail.gmail.com>
Date:	Fri, 6 Mar 2015 11:34:38 -0300
From:	Rafael David Tinoco <inaddy@...ntu.com>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Sasha Levin <sasha.levin@...cle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Dave Jones <davej@...emonkey.org.uk>,
	Davidlohr Bueso <dave@...olabs.net>, jason.low2@...com,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: sched: softlockups in multi_cpu_stop

Are you sure about this? I have a core dump that is locked up in the same
place (the state machine that stops the CPUs for the task swap), taken from a
3.13 kernel (+ upstream patches) to which this commit has not been backported.

-> multi_cpu_stop -> do { } while (curstate != MULTI_STOP_EXIT);

In my case, curstate is WAY outside the enum that contains MULTI_STOP_EXIT (4).

The register is totally messed up (probably after cpu_relax(), right where
you were trapped -> after the pause instruction).

my case:

PID: 118    TASK: ffff883fd28ec7d0  CPU: 9   COMMAND: "migration/9"
...
    [exception RIP: multi_cpu_stop+0x64]
    RIP: ffffffff810f5944  RSP: ffff883fd2907d98  RFLAGS: 00000246
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000246
    RDX: ffff883fd2907d98  RSI: 0000000000000000  RDI: 0000000000000001
    RBP: ffffffff810f5944   R8: ffffffff810f5944   R9: 0000000000000000
    R10: ffff883fd2907d98  R11: 0000000000000246  R12: ffffffffffffffff
    R13: ffff883f55d01b48  R14: 0000000000000000  R15: 0000000000000001
    ORIG_RAX: 0000000000000001  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #4 [ffff883fd2907d98] multi_cpu_stop+0x64 at ffffffff810f5944

208              } while (curstate != MULTI_STOP_EXIT);
       ---> RIP
RIP 0xffffffff810f5944 <+100>:   cmp    $0x4,%edx
       ---> CHECKING FOR MULTI_STOP_EXIT

RDX: ffff883fd2907d98 -> does not make any sense
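
For reference, a small sketch of the arithmetic behind that remark: `cmp $0x4,%edx` only looks at the low 32 bits of RDX. The enum below is assumed to mirror enum multi_stop_state in kernel/stop_machine.c (values 0..4, with MULTI_STOP_EXIT == 4); low32(), is_valid_state() and rdx_holds_valid_state() are hypothetical helper names for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror enum multi_stop_state in kernel/stop_machine.c. */
enum multi_stop_state {
    MULTI_STOP_NONE,        /* 0 */
    MULTI_STOP_PREPARE,     /* 1 */
    MULTI_STOP_DISABLE_IRQ, /* 2 */
    MULTI_STOP_RUN,         /* 3 */
    MULTI_STOP_EXIT,        /* 4 */
};

/* EDX is the low 32 bits of RDX; that is all "cmp $0x4,%edx" sees. */
static uint32_t low32(uint64_t reg)
{
    return (uint32_t)(reg & 0xffffffffu);
}

/* A sane curstate must be one of the five enum values above. */
static int is_valid_state(uint32_t value)
{
    return value <= (uint32_t)MULTI_STOP_EXIT;
}

/* Hypothetical helper: does this RDX value hold a plausible curstate? */
static int rdx_holds_valid_state(uint64_t rdx)
{
    uint32_t edx = low32(rdx);

    assert(edx == (uint32_t)rdx); /* masking and truncation agree */
    return is_valid_state(edx);
}
```

With RDX = 0xffff883fd2907d98 from the dump, low32() gives 0xd2907d98, which is none of the five states.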

###

If I'm reading this right,

"""
CPU 05 - PID 14990

do_numa_page
task_numa_fault
numa_migrate_preferred
task_numa_migrate
migrate_swap (curr: 14990, task: 14996)
stop_two_cpus (cpu1=05(14996), cpu2=00(14990))
wait_for_completion

14990 - CPU05
14996 - CPU00

stop_two_cpus:
    multi_stop_data (msdata->state = MULTI_STOP_PREPARE)
    smp_call_function_single (min=cpu2=00, irq_cpu_stop_queue_work, wait=1)
        smp_call_function_single (ran on lowest CPU, 00 for this case)
        irq_cpu_stop_queue_work
            cpu_stop_queue_work(cpu1=05(14996)) # add work (multi_cpu_stop) to cpu 05 cpu_stopper queue
            cpu_stop_queue_work(cpu2=00(14990)) # add work (multi_cpu_stop) to cpu 00 cpu_stopper queue
    wait_for_completion() --> HERE
"""

In my case, checking the task structs of the tasks that were scheduled while
sitting in wait_for_completion():

PID 14990 CPU 05 -> PID 14996 CPU 00
PID 14991 CPU 30 -> PID 14998 CPU 01
PID 14992 CPU 30 -> PID 14998 CPU 01
PID 14996 CPU 00 -> PID 14992 CPU 30
PID 14998 CPU 01 -> PID 14990 CPU 05
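
A minimal sketch for walking the arrows in the table above. It only follows "PID -> PID" edges and reports whether a walk comes back to its starting PID; reading an arrow as "is swapping with / waiting on" is an assumption, and struct swap_edge, next_pid() and cycle_length() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the table: the task on the left points at the task on the right. */
struct swap_edge {
    int src_pid;
    int dst_pid;
};

static const struct swap_edge edges[] = {
    { 14990, 14996 },
    { 14991, 14998 },
    { 14992, 14998 },
    { 14996, 14992 },
    { 14998, 14990 },
};

#define N_EDGES (sizeof(edges) / sizeof(edges[0]))

/* Return the PID that 'pid' points at, or -1 if it has no row in the table. */
static int next_pid(int pid)
{
    for (size_t i = 0; i < N_EDGES; i++) {
        if (edges[i].src_pid == pid)
            return edges[i].dst_pid;
    }
    return -1;
}

/*
 * Follow the arrows from 'start'. Return the number of hops needed to get
 * back to 'start', or 0 if the walk never returns within N_EDGES hops.
 */
static int cycle_length(int start)
{
    int pid = start;

    assert(start > 0);
    for (size_t hops = 1; hops <= N_EDGES; hops++) {
        pid = next_pid(pid);
        if (pid < 0)
            return 0;
        if (pid == start)
            return (int)hops;
    }
    return 0;
}
```

Starting from 14990 the walk is 14990 -> 14996 -> 14992 -> 14998 -> 14990, so cycle_length(14990) is 4; 14991 points into that loop but is not part of it.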

AND

>   102      2   6  ffff881fd2ea97f0  RU   0.0       0      0  [migration/6]
>   118      2   9  ffff883fd28ec7d0  RU   0.0       0      0  [migration/9]
>   143      2  14  ffff883fd29d47d0  RU   0.0       0      0  [migration/14]
>   148      2  15  ffff883fd29fc7d0  RU   0.0       0      0  [migration/15]
>   153      2  16  ffff881fd2f517f0  RU   0.0       0      0  [migration/16]

THEN

I am still waiting on 5 cpu_stopper_thread -> multi_cpu_stop work items that
were just scheduled (probably sitting in the per-cpu queues of cpus 0, 1, 5
and 30) and are not running yet.

AND

I don't see any wait_for_completion() for those "OLDER" migration
threads (6, 9, 14, 15 and 16).
Probably done.completion was already signaled before the race.

Looks like something messed up curstate in the multi_cpu_stop
state machine.

/* Simple state machine */
do {
/* Chill out and ensure we re-read multi_stop_state. */
cpu_relax();

cpu_relax(), maybe?
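
For context, a user-space sketch of the shape of that loop, using C11 atomics instead of the kernel primitives. It is modeled on set_state(), ack_state() and the do/while body of multi_cpu_stop() in kernel/stop_machine.c as far as I recall them; multi_stop_init() and multi_stop_step() are hypothetical helpers, and the IRQ-disable and "run the function" steps are left out.

```c
#include <assert.h>
#include <stdatomic.h>

enum multi_stop_state {
    MULTI_STOP_NONE,
    MULTI_STOP_PREPARE,
    MULTI_STOP_DISABLE_IRQ,
    MULTI_STOP_RUN,
    MULTI_STOP_EXIT,
};

struct multi_stop_data {
    int num_threads;       /* how many stoppers take part */
    atomic_int state;      /* shared enum multi_stop_state */
    atomic_int thread_ack; /* stoppers that still have to ack 'state' */
};

/* Reset the ack counter, then publish the new state. */
static void set_state(struct multi_stop_data *msdata, int newstate)
{
    atomic_store(&msdata->thread_ack, msdata->num_threads);
    atomic_store(&msdata->state, newstate);
}

/* The last stopper to ack moves everybody on to the next state. */
static void ack_state(struct multi_stop_data *msdata)
{
    if (atomic_fetch_sub(&msdata->thread_ack, 1) == 1)
        set_state(msdata, atomic_load(&msdata->state) + 1);
}

/* Hypothetical helper: start the machine in MULTI_STOP_PREPARE. */
static void multi_stop_init(struct multi_stop_data *msdata, int num_threads)
{
    assert(num_threads > 0);
    msdata->num_threads = num_threads;
    atomic_init(&msdata->state, MULTI_STOP_NONE);
    atomic_init(&msdata->thread_ack, 0);
    set_state(msdata, MULTI_STOP_PREPARE);
}

/*
 * Hypothetical helper: one pass through the do/while body for one stopper.
 * The stopper re-reads the shared state and acks it only when it changed.
 */
static enum multi_stop_state multi_stop_step(struct multi_stop_data *msdata,
                                             enum multi_stop_state *curstate)
{
    int seen = atomic_load(&msdata->state);

    if (seen != (int)*curstate) {
        *curstate = (enum multi_stop_state)seen;
        /* The kernel disables IRQs / runs the stop function here. */
        ack_state(msdata);
    }
    return *curstate;
}
```

In this sketch each stopper keeps calling multi_stop_step() until its own curstate is MULTI_STOP_EXIT. With num_threads = 2, if only one stopper ever runs, it acks MULTI_STOP_PREPARE once and then spins there forever, because the state only advances after every participant has acked.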

--
Rafael Tinoco

On Fri, Mar 6, 2015 at 9:32 AM, Ingo Molnar <mingo@...nel.org> wrote:
>
> * Sasha Levin <sasha.levin@...cle.com> wrote:
>
>> I've bisected this to "locking/rwsem: Check for active lock before bailing on spinning". Relevant parties Cc'ed.
>
> That would be:
>
>   1a99367023f6 ("locking/rwsem: Check for active lock before bailing on spinning")
>
> attached below.
>
> Thanks,
>
>         Ingo
>
> ===========================>
> From 1a99367023f6ac664365a37fa508b059e31d0e88 Mon Sep 17 00:00:00 2001
> From: Davidlohr Bueso <dave@...olabs.net>
> Date: Fri, 30 Jan 2015 01:14:27 -0800
> Subject: [PATCH] locking/rwsem: Check for active lock before bailing on spinning
>
> 37e9562453b ("locking/rwsem: Allow conservative optimistic
> spinning when readers have lock") forced the default for
> optimistic spinning to be disabled if the lock owner was
> nil, which makes much sense for readers. However, while
> it is not our priority, we can make some optimizations
> for write-mostly workloads. We can bail the spinning step
> and still be conservative if there are any active tasks,
> otherwise there's really no reason not to spin, as the
> semaphore is most likely unlocked.
>
> This patch recovers most of a Unixbench 'execl' benchmark
> throughput by sleeping less and making better average system
> usage:
>
>   before:
>   CPU     %user     %nice   %system   %iowait    %steal     %idle
>   all      0.60      0.00      8.02      0.00      0.00     91.38
>
>   after:
>   CPU     %user     %nice   %system   %iowait    %steal     %idle
>   all      1.22      0.00     70.18      0.00      0.00     28.60
>
> Signed-off-by: Davidlohr Bueso <dbueso@...e.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> Acked-by: Jason Low <jason.low2@...com>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Michel Lespinasse <walken@...gle.com>
> Cc: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> Cc: Tim Chen <tim.c.chen@...ux.intel.com>
> Link: http://lkml.kernel.org/r/1422609267-15102-6-git-send-email-dave@stgolabs.net
> Signed-off-by: Ingo Molnar <mingo@...nel.org>
> ---
>  kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++----------
>  1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 1c0d11e8ce34..e4ad019e23f5 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -298,23 +298,30 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
>  static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
>  {
>         struct task_struct *owner;
> -       bool on_cpu = false;
> +       bool ret = true;
>
>         if (need_resched())
>                 return false;
>
>         rcu_read_lock();
>         owner = ACCESS_ONCE(sem->owner);
> -       if (owner)
> -               on_cpu = owner->on_cpu;
> -       rcu_read_unlock();
> +       if (!owner) {
> +               long count = ACCESS_ONCE(sem->count);
> +               /*
> +                * If sem->owner is not set, yet we have just recently entered the
> +                * slowpath with the lock being active, then there is a possibility
> +                * reader(s) may have the lock. To be safe, bail spinning in these
> +                * situations.
> +                */
> +               if (count & RWSEM_ACTIVE_MASK)
> +                       ret = false;
> +               goto done;
> +       }
>
> -       /*
> -        * If sem->owner is not set, yet we have just recently entered the
> -        * slowpath, then there is a possibility reader(s) may have the lock.
> -        * To be safe, avoid spinning in these situations.
> -        */
> -       return on_cpu;
> +       ret = owner->on_cpu;
> +done:
> +       rcu_read_unlock();
> +       return ret;
>  }
>
>  static inline bool owner_running(struct rw_semaphore *sem,
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/