[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87bkw3y943.ffs@tglx>
Date: Thu, 12 May 2022 00:53:16 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Zhang Qiao <zhangqiao22@...wei.com>,
lkml <linux-kernel@...r.kernel.org>, keescook@...omium.org,
Peter Zijlstra <peterz@...radead.org>, elver@...gle.com,
legion@...nel.org, oleg@...hat.com, brauner@...nel.org
Subject: Re: Question about kill a process group
On Wed, May 11 2022 at 13:33, Eric W. Biederman wrote:
> Thomas Gleixner <tglx@...utronix.de> writes:
>> So unless the number of PIDs for a user is limited this _is_ an
>> unpriviledged DoS vector.
>
> After having slept on this a bit it finally occurred to me the
> semi-obvious solution to this issue is to convert tasklist_lock
> from a rw-spinlock to rw-semaphore. The challenge is finding
> the users (tty layer?) that generate signals from interrupt
> context and redirect that signal generation.
>From my outdated notes where I looked at this before:
[soft]interrupt context which acquires tasklist_lock:
sysrq-e send_sig_all()
sysrq-i send_sig_all()
sysrq-n normalize_rt_tasks()
tasklist_lock nesting into other locks:
fs/fcntl.c: send_sigio(), send_sigurg()
send_sigurg() is called from the network stack ...
Some very obscure stuff in arch/ia64/kernel/mca.c which is called
from a DIE notifier.
Plus quite a bunch of read_lock() instances which nest inside
rcu_read_lock() held sections.
This is probably incomplete, but the scope of the problem has been
greatly reduced vs. the point where I looked at it last time a couple of
years ago. But that's still a herculean task.
> Once signals holding tasklist_lock are no longer generated from
> interrupt context irqs no longer need to be disabled and
> after verifying tasklist_lock isn't held under any other spinlocks
> it can be converted to a semaphore.
Going to take a while. :)
> It won't help the signal delivery times, but it should reduce
> the effect on the rest of the system, and prevent watchdogs from
> firing.
The signal delivery time itself is the least of the worries, but this
still prevents any other operations which require tasklist_lock from
making progress for quite some time, i.e. fork/exec for unrelated
processes/users will have to wait too. So you converted the 'visible'
DoS to an 'invisible' one.
The real problem is that the scope of tasklist_lock is too broad for
most use cases. That does not change when you actually can convert it to
a rwsem. The underlying problem still persists.
Let's take a step back and look what most sane use cases (sysrq-* is not
in that category) require:
Preventing that tasks are added or removed
Do they require that any task is added or removed? No.
They require to prevent add/remove for the intended scope.
That's the thing we need to focus on: reducing the protection scope.
If we can segment the protection for the required scope of e.g. kill(2)
then we still can let unrelated processes/tasks make progress and just
inflict the damage on the affected portion of processes/tasks.
For example:
read_lock(&tasklist_lock);
for_each_process(p) {
if (task_pid_vnr(p) > 1 &&
!same_thread_group(p, current)) {
group_send_sig_info(...., p);
}
}
read_unlock(&tasklist_lock);
same_thread_group() does:
return p->signal == current->signal;
Ideally we can do:
read_lock(&tasklist_lock);
prevent_add_remove(current->signal);
read_unlock(&tasklist_lock);
rcu_read_lock();
for_each_process(p) {
if (task_pid_vnr(p) > 1 &&
!same_thread_group(p, current)) {
group_send_sig_info(...., p);
}
}
rcu_read_unlock();
allow_add_remove(current->signal);
Where prevent_add_remove() sets a state which has to be waited for to be
cleared by anything which wants to add/remove a task in that scope or
change $relatedtask->signal until allow_add_remove() removes that
blocker. I'm sure it's way more complicated, but you get the idea.
If we find a solution to this scope reduction problem, then it will not
only squash the issue which started this discussion. This will have a
benefit in general.
Thanks,
tglx
Powered by blists - more mailing lists