Message-ID: <874k332wjp.fsf@email.froward.int.ebiederm.org>
Date: Fri, 08 Apr 2022 14:40:42 -0500
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Oleg Nesterov <oleg@...hat.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-kernel@...r.kernel.org, Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Mel Gorman <mgorman@...e.de>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH v2] ptrace: fix ptrace vs tasklist_lock race on PREEMPT_RT.
Peter Zijlstra <peterz@...radead.org> writes:
> On Thu, Apr 07, 2022 at 05:50:39PM -0500, Eric W. Biederman wrote:
>> Given that fundamentally TASK_WAKEKILL must be added in ptrace_stop and
>> removed in ptrace_attach I don't see your proposed usage of jobctl helps
>> anything fundamental.
>>
>> I suspect somewhere there is a deep trade-off between complicating
>> the scheduler to have a very special case for what is now
>> TASK_RTLOCK_WAIT, and complicating the rest of the code with having
>> TASK_RTLOCK_WAIT in __state and the values that should be in state
>> stored somewhere else.
>
> The thing is; ptrace is a special case. I feel very strongly we should
> not complicate the scheduler/wakeup path for something that 'never'
> happens.
I was going to comment that I could not understand how the saved_state
mechanism under PREEMPT_RT works. Then I realized that wake_up_process
and wake_up_state call try_to_wake_up which calls ttwu_state_match which
modifies saved_state.
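
To make that concrete, here is a small user-space model of the idea, not the kernel code itself. The struct, the MODEL_* bit values and the model_* function names are all made up for illustration; only the general shape (a wakeup that matches saved_state clears saved_state instead of waking the task) is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state bits for this model; NOT the kernel's values. */
#define MODEL_RUNNING     0x0u
#define MODEL_WAKEKILL    0x1u
#define MODEL_TRACED      0x2u
#define MODEL_RTLOCK_WAIT 0x4u

struct model_task {
	unsigned int state;       /* stands in for __state */
	unsigned int saved_state; /* stands in for saved_state on PREEMPT_RT */
};

/* Blocking on an RT lock parks the original state in saved_state. */
static void model_rtlock_block(struct model_task *t)
{
	t->saved_state = t->state;
	t->state = MODEL_RTLOCK_WAIT;
}

/* Acquiring the lock restores whatever is left in saved_state. */
static void model_rtlock_acquired(struct model_task *t)
{
	t->state = t->saved_state;
	t->saved_state = MODEL_RUNNING;
}

/*
 * Returns true if the task should be made runnable right now.
 * *success reports whether the waker considers the wakeup delivered.
 */
static bool model_state_match(struct model_task *t, unsigned int mask,
			      bool *success)
{
	*success = false;
	if (t->state & mask) {
		*success = true;
		return true;
	}
	/* Task is waiting for a lock: record the wakeup, do not wake it. */
	if (t->saved_state & mask) {
		t->saved_state = MODEL_RUNNING;
		*success = true;
	}
	return false;
}

/* Walk through the case of a traced task that is blocked on an RT lock. */
static void model_demo(void)
{
	struct model_task t = { MODEL_TRACED | MODEL_WAKEKILL, MODEL_RUNNING };
	bool success;
	bool wake;

	model_rtlock_block(&t);
	wake = model_state_match(&t, MODEL_WAKEKILL, &success);
	assert(!wake && success);              /* recorded, but not woken */
	assert(t.state == MODEL_RTLOCK_WAIT);  /* visible state untouched */
	assert(t.saved_state == MODEL_RUNNING);
	model_rtlock_acquired(&t);
	assert(t.state == MODEL_RUNNING);      /* wakeup takes effect here */
}
```

In this model the wakeup never touches the visible state while the task waits for the lock, which is why code that only inspects or edits __state can miss it.
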
The options appear to be either that ptrace_freeze_traced modifies
__state/state to remove TASK_KILLABLE, or that something clever happens
in ptrace_freeze_traced that guarantees the task does not wake up:
something living in kernel/sched/*, like wait_task_inactive.
I can imagine adding a loop around freezable_schedule in
ptrace_stop that does something like:
	do {
		freezable_schedule();
	} while (current->jobctl & JOBCTL_PTRACE_FREEZE);
Unfortunately after a SIGKILL is delivered the process will never sleep
unless there is a higher priority process to preempt it. So I don't
think that is a viable solution.
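
A toy user-space model of why that loop degenerates, under the assumption stated above that a killable sleep is not entered once a fatal signal is pending. Everything here (the struct, the field names, model_killable_schedule, and the "tracer unfreezes after N iterations" parameter) is hypothetical and only illustrates the spinning; it is not how the kernel implements any of this.

```c
#include <assert.h>
#include <stdbool.h>

struct model_tracee {
	bool fatal_signal_pending; /* a SIGKILL has been delivered */
	bool ptrace_freeze;        /* stands in for the proposed JOBCTL_PTRACE_FREEZE */
};

/*
 * Returns true if the task actually slept. Assumption of this model:
 * with a fatal signal pending the killable sleep returns immediately.
 */
static bool model_killable_schedule(const struct model_tracee *t)
{
	return !t->fatal_signal_pending;
}

/*
 * Runs the do/while loop from above. The tracer is modelled as
 * clearing the freeze flag after unfreeze_after loop iterations.
 * Returns how many iterations went by without sleeping.
 */
static unsigned long model_ptrace_stop_loop(struct model_tracee *t,
					    unsigned long unfreeze_after)
{
	unsigned long iterations = 0;
	unsigned long busy = 0;

	assert(unfreeze_after > 0);
	do {
		if (!model_killable_schedule(t))
			busy++; /* came straight back: a busy spin */
		if (++iterations >= unfreeze_after)
			t->ptrace_freeze = false; /* tracer is done */
	} while (t->ptrace_freeze);

	return busy;
}
```

With fatal_signal_pending set, every iteration counts as busy; without it, none do. That is the whole problem: the killed tracee burns CPU until the tracer is finished.
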
What ptrace_freeze_traced and ptrace_unfreeze_traced fundamentally need
is for the process not to do anything interesting, so that the tracer
process can modify the process and its task_struct.
That need is the entire reason ptrace does questionable things
with __state.
So if we can do something better, perhaps with a rewritten freezer, it
would be a general code improvement.
The ptrace code really does want TASK_KILLABLE semantics the entire time
a task is being manipulated by the ptrace system call. The code in
ptrace_unfreeze_traced goes through some gymnastics to detect if a
process was killed while traced (AKA to detect a missed SIGKILL)
and to use wake_up_state to make the task runnable instead of putting
it back in TASK_TRACED.
So really all that is required is a way to ask the scheduler to just
not schedule the process until the ptrace syscall completes and calls
ptrace_unfreeze_traced.
Eric