lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160921152946.GA24210@dhcp22.suse.cz>
Date:   Wed, 21 Sep 2016 17:29:46 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     LKML <linux-kernel@...r.kernel.org>
Cc:     strace-devel@...ts.sourceforge.net,
        Oleg Nesterov <oleg@...hat.com>, Aleksa Sarai <asarai@...e.com>
Subject: strace lockup when tracing exec in go

[I am CCing strace mailing list because even if this turns out to be a
 kernel bug strace seems to be doing something unexpected - more on that
 below]

Hi,
Aleksa has reported the following lockup when stracing the following go
program

% cat exec.go
package main

import (
    "os"
    "syscall"
)

func main() {
    syscall.Exec("/bin/echo", []string{"/bin/echo", "Hello"}, os.Environ())
}
$ go version
go version go1.6.3 linux/amd64
$ go build -o exec exec.go

$ strace -f ./exec
[snip]
[pid 10349] select(0, NULL, NULL, NULL, {0, 100} <unfinished ...>
[pid 10346] <... select resumed> )      = 0 (Timeout)
[pid 10346] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 10345] execve("/bin/echo", ["/bin/echo", "Hello"], [/* 95 vars */] <unfinished ...>
[pid 10346] <... select resumed> )      = 0 (Timeout)
[pid 10349] <... select resumed> )      = 0 (Timeout)

execve will never finish unless the strace process get killed with
SIGKILL. The following was debugged on 3.12 kernel and the current
kernel seems to not trigger the issue as easily but I believe the same
problem is there as well.

The further investigation shown that the tracer (strace) is stuck
waiting for cred_guard_mutex
[<0000000000000000>] mm_access+0x22/0xa0
[<0000000000000000>] process_vm_rw_core.isra.1+0x112/0x6c0
[<0000000000000000>] process_vm_rw+0xab/0x110
[<0000000000000000>] SyS_process_vm_readv+0x15/0x20
[<0000000000000000>] system_call_fastpath+0x16/0x1b
[<0000000000000000>] 0x7f186f031c3a
[<0000000000000000>] 0xffffffffffffffff

while the traced process (3 threads) are trying to perform the exec.
That means that 2 threads are dead (in fact zombies) waiting in their
final schedule.
Call Trace:
 [<ffffffff81057c17>] do_exit+0x6f7/0xa70
 [<ffffffff81058009>] do_group_exit+0x39/0xa0
 [<ffffffff810670d8>] get_signal_to_deliver+0x1e8/0x5c0
 [<ffffffff81002342>] do_signal+0x42/0x670
 [<ffffffff810029e8>] do_notify_resume+0x78/0xc0
 [<ffffffff8151f4c2>] int_signal+0x12/0x17
 [<00007f3a33f3ffb9>] 0x7f3a33f3ffb8

and one is
Call Trace:
 [<ffffffff811a80bf>] flush_old_exec+0xdf/0x890
 [<ffffffff811f4197>] load_elf_binary+0x307/0xda0
 [<ffffffff811a7b8e>] search_binary_handler+0xae/0x1f0
 [<ffffffff811a917e>] do_execve_common.isra.26+0x64e/0x810
 [<ffffffff811a95c1>] SyS_execve+0x31/0x50
 [<ffffffff8151f7b9>] stub_execve+0x69/0xa0
 [<00007f3a33f16527>] 0x7f3a33f16526

waiting for notify_count to drop down to 0
        while (sig->notify_count) {
                __set_current_state(TASK_KILLABLE);
                spin_unlock_irq(lock);
                schedule(); <<<<<<<<<
                if (unlikely(__fatal_signal_pending(tsk)))
                        goto killed;
                spin_lock_irq(lock);
        }

this however doesn't happen because both threads which are dead
are zombies waiting to be reaped by the parent and to call their
release_task->__exit_signal. The tracer is blocked on the lock held by
exec (in prepare_bprm_creds). This is the case in the current kernel as
well AFAICS so the same should be possible as well. So is this a bug or
something somewhere else makes sure that this will not happen in newer
kernels?

Now, I've tried to fix this by
diff --git a/kernel/fork.c b/kernel/fork.c
index 5a57b9bab85c..d5b7c3aea187 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -837,7 +837,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
 	struct mm_struct *mm;
 	int err;
 
-	err =  mutex_lock_killable(&task->signal->cred_guard_mutex);
+	err =  mutex_lock_interruptible(&task->signal->cred_guard_mutex);
 	if (err)
 		return ERR_PTR(err);
 
but it didn't help and I really couldn't see why until I sprinkled
printks all over the exit path and to learn that the parent (strace)
doesn't get notified about SIGCHLD when those threads are dying. Why is that?
exit_notify
  do_notify_parent -> return false
    __group_send_sig_info
      send_signal
        __send_signal
          prepare_signal -> fails because the signal is ignored

this is more than unexpected (at least to me) because I would expect
that the strace wouldn't ignore SIGCHLD! Is this a bug in strace?

Btw. in a successful (not hung) case the tracer is in do_wait
when do_notify_parent->__wake_up_parent does the right thing.

Thanks!
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ