linux-kernel - Re: [PATCH] kernel/hung_task.c: Monitor killed tasks.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ee7501c6-d996-1684-1652-f0f838ba69c3@i-love.sakura.ne.jp>
Date:   Thu, 16 May 2019 17:19:12 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...ux.ibm.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Liu Chuansheng <chuansheng.liu@...el.com>,
        Valdis Kletnieks <valdis.kletnieks@...edu>,
        linux-kernel@...r.kernel.org, Dmitry Vyukov <dvyukov@...gle.com>
Subject: Re: [PATCH] kernel/hung_task.c: Monitor killed tasks.

On 2019/05/15 19:55, Petr Mladek wrote:
>> +	if (!stamp) {
>> +		stamp = jiffies;
>> +		if (!stamp)
>> +			stamp++;
>> +		t->killed_time = stamp;
>> +		return;
>> +	}
> 
> I might be too dumb but the above code looks pretty tricky to me.
> It would deserve a comment. Or better, I would remove
> trick to handle overflow. If it happens, we would just
> lose one check period.

We can use

  static inline unsigned long jiffies_nonzero(void)
  {
      const unsigned long stamp = jiffies;

      return stamp ? stamp : -1;
  }

or even shortcut "jiffies | 1" because difference by one jiffie
is an measurement error for multiple HZ of timeout.

> 
> Alternative solution would be to set the timestamp in
> complete_signal(). Then we would know that the timestamp
> is always valid when a fatal signal is pending.

Yes. But I guess that since signal might be delivered just before
setting PF_FROZEN and the thread might be kept frozen for longer
than timeout, we will need to reset the timestamp just before
clearing PF_FROZEN.

>> +	if (time_is_after_jiffies(stamp + timeout * HZ))
>> +		return;
>> +	trace_sched_process_hang(t);
>> +	if (sysctl_hung_task_panic) {
>> +		console_verbose();
>> +		hung_task_call_panic = true;
> 
> IMHO, the delayed task exit is much less fatal than sleeping
> in an uninterruptible state.
> 
> Anyway, the check is much less reliable. In case of hung_task,
> it is enough when the task gets scheduled. In the new check,
> the task has to do some amount of work until the signal
> gets handled and do_exit() is called.
> 
> The panic should either get enabled separately or we should
> never panic in this case.

OK, we should not share existing sysctl settings.

But in the context of syzbot's testing where there are only 2 CPUs
in the target VM (which means that only small number of threads and
not so much memory) and threads get SIGKILL after 5 seconds from fork(),
being unable to reach do_exit() within 10 seconds is likely a sign of
something went wrong. For example, 6 out of 7 trials of a reproducer for
https://syzkaller.appspot.com/bug?id=835a0b9e75b14b55112661cbc61ca8b8f0edf767
resulted in "no output from test machine" rather than "task hung".
This patch is revealing that such killed threads are failing to reach
do_exit() because they are trapped at unkillable retry loop due to a
race bug.

Therefore, I would like to try this patch in linux-next.git for feasibility
testing whether this patch helps finding more bugs and reproducers for such
bugs, by bringing "unable to terminate threads" reports out of "no output from
test machine" reports. We can add sysctl settings before sending to linux.git.

Any questions?