lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <877d5ajesi.fsf@email.froward.int.ebiederm.org>
Date:   Tue, 21 Jun 2022 09:02:05 -0500
From:   "Eric W. Biederman" <ebiederm@...ssion.com>
To:     Alexander Gordeev <agordeev@...ux.ibm.com>
Cc:     linux-kernel@...r.kernel.org, rjw@...ysocki.net,
        Oleg Nesterov <oleg@...hat.com>, mingo@...nel.org,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, mgorman@...e.de, bigeasy@...utronix.de,
        Will Deacon <will@...nel.org>, tj@...nel.org,
        linux-pm@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
        Richard Weinberger <richard@....at>,
        Anton Ivanov <anton.ivanov@...bridgegreys.com>,
        Johannes Berg <johannes@...solutions.net>,
        linux-um@...ts.infradead.org, Chris Zankel <chris@...kel.net>,
        Max Filippov <jcmvbkbc@...il.com>,
        linux-xtensa@...ux-xtensa.org, Kees Cook <keescook@...omium.org>,
        Jann Horn <jannh@...gle.com>, linux-ia64@...r.kernel.org
Subject: Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED,
 TASK_STOPPED state

Alexander Gordeev <agordeev@...ux.ibm.com> writes:

> On Thu, May 05, 2022 at 01:26:45PM -0500, Eric W. Biederman wrote:
>> From: Peter Zijlstra <peterz@...radead.org>
>> 
>> Currently ptrace_stop() / do_signal_stop() rely on the special states
>> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
>> state exists only in task->__state and nowhere else.
>> 
>> There's two spots of bother with this:
>> 
>>  - PREEMPT_RT has task->saved_state which complicates matters,
>>    meaning task_is_{traced,stopped}() needs to check an additional
>>    variable.
>> 
>>  - An alternative freezer implementation that itself relies on a
>>    special TASK state would loose TASK_TRACED/TASK_STOPPED and will
>>    result in misbehaviour.
>> 
>> As such, add additional state to task->jobctl to track this state
>> outside of task->__state.
>> 
>> NOTE: this doesn't actually fix anything yet, just adds extra state.
>> 
>> --EWB
>>   * didn't add a unnecessary newline in signal.h
>>   * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
>>     instead of in signal_wake_up_state.  This prevents the clearing
>>     of TASK_STOPPED and TASK_TRACED from getting lost.
>>   * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
>
> Hi Eric, Peter,
>
> On s390 this patch triggers warning at kernel/ptrace.c:272 when
> kill_child testcase from strace tool is repeatedly used (the source
> is attached for reference):
>
> while :; do
> 	strace -f -qq -e signal=none -e trace=sched_yield,/kill ./kill_child
> done
>
> It normally takes few minutes to cause the warning in -rc3, but FWIW
> it hits almost immediately for ptrace_stop-cleanup-for-v5.19 tag of
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.
>
> Commit 7b0fe1367ef2 ("ptrace: Document that wait_task_inactive can't
> fail") suggests this WARN_ON_ONCE() is not really expected, yet we
> observe a child in __TASK_TRACED state. Could you please comment here?
>

For clarity the warning is that the child is not in __TASK_TRACED state.

The code is waiting for the code to stop in the scheduler in the
__TASK_TRACED state so that it can safely read and change the
processes state.  Some of that state is not even saved until the
process is scheduled out so we have to wait until the process
is stopped in the scheduler.

At least on s390 it looks like there is a race between SIGKILL and
ptrace_check_attach.  That isn't good.

Reading the code below there is something missing because I don't see
anything making ptrace calls, and ptrace_check_attach (which contains
the warning) only happens in the ptrace syscall.

Eric



> Thanks!
>
> /*
>  * Check for the corner case that previously lead to segfault
>  * due to an attempt to access unitialised tcp->s_ent.
>  *
>  * 13994 ????( <unfinished ...>
>  * ...
>  * 13994 <... ???? resumed>) = ?
>  *
>  * Copyright (c) 2019 The strace developers.
>  * All rights reserved.
>  *
>  * SPDX-License-Identifier: GPL-2.0-or-later
>  */
>
> #include "tests.h"
>
> #include <sched.h>
> #include <signal.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/wait.h>
>
> #define ITERS    10000
> #define SC_ITERS 10000
>
> int
> main(void)
> {
> 	volatile sig_atomic_t *const mem =
> 		mmap(NULL, get_page_size(), PROT_READ | PROT_WRITE,
> 		     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> 	if (mem == MAP_FAILED)
> 		perror_msg_and_fail("mmap");
>
> 	for (unsigned int i = 0; i < ITERS; ++i) {
> 		mem[0] = mem[1] = 0;
>
> 		const pid_t pid = fork();
> 		if (pid < 0)
> 			perror_msg_and_fail("fork");
>
> 		if (!pid) {
> 			/* wait for the parent */
> 			while (!mem[0])
> 				;
> 			/* let the parent know we are running */
> 			mem[1] = 1;
>
> 			for (unsigned int j = 0; j < SC_ITERS; j++)
> 				sched_yield();
>
> 			pause();
> 			return 0;
> 		}
>
> 		/* let the child know we are running */
> 		mem[0] = 1;
> 		/* wait for the child */
> 		while (!mem[1])
> 			;
>
> 		if (kill(pid, SIGKILL))
> 			perror_msg_and_fail("kill");
> 		if (wait(NULL) != pid)
> 			perror_msg_and_fail("wait");
> 	}
>
> 	return 0;
> }

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ