linux-kernel - Re: Question about kill a process group

Message-ID: <874k2mtny7.fsf@email.froward.int.ebiederm.org>
Date:   Thu, 21 Apr 2022 11:12:48 -0500
From:   "Eric W. Biederman" <ebiederm@...ssion.com>
To:     Zhang Qiao <zhangqiao22@...wei.com>
Cc:     lkml <linux-kernel@...r.kernel.org>, <keescook@...omium.org>,
        <tglx@...utronix.de>, Peter Zijlstra <peterz@...radead.org>,
        <elver@...gle.com>, <legion@...nel.org>, <oleg@...hat.com>,
        <brauner@...nel.org>
Subject: Re: Question about kill a process group

Zhang Qiao <zhangqiao22@...wei.com> writes:

> On 2022/4/13 23:47, Eric W. Biederman wrote:
>> To do something about this is going to take a deep and fundamental
>> redesign of how we maintain process lists to handle a parent
>> with millions of children well.
>> 
>> Is there any real world reason to care about this case?  Without
>> real world motivation I am inclined to just note that this is
>
> I just found it while I ran an LTP test.

So I looked, and fork12 has been around since 2002 largely in its
current form.  So I am puzzled why you have run into problems
and other people have not.

Did you perhaps have lock debugging enabled?

Did you run on a very large machine where a ridiculous number of
processes could be created?

Did you happen to run fork12 on a machine where locks are much more
expensive than on most machines?

>> Is there a real world use case that connects to this?
>> 
>> How many children are being created in this test?  Several million?
>
>   There are about 300,000+ processes.

Not as many as I was guessing, but still enough to cause a huge
wait on locks.

>> I would like to blame this on the old issue of tasklist_lock being
>> a global lock.  Given the number of child processes (as many as can be
>> created) I don't think we are hurt much by using a global lock.  The
>> problem for solubility is that we have a lock.
>> 
>> Fundamentally there must be a lock taken to maintain the parent's
>> list of children.
>> 
>> I only see SIGQUIT being called once in the parent process so that
>> should not be an issue.
>
>
>   In fork12, every child will call kill(0, SIGQUIT) at cleanup().
> There are a lot of kill(0, SIGQUIT) calls.

I had missed that.  I can see that stressing out a lot.

At the same time, as I read fork12.c, that is very much a bug.  The
children in fork12.c should call _exit() instead of exit(), which
would suppress calling the atexit() handlers and let fork12.c
test what it is trying to test.

That doesn't mean there isn't a mystery here, but more that if
we really want to test lots of processes calling the same
signal at the same time it should be a test that means to do that.

>> There is a minor issue in fork12 that it calls exit(0) instead of
>> _exit(0) in the children.  Not the problem you are dealing with
>> but it does look like it can be a distraction.
>> 
>> I suspect the issue really is the thundering herd of a million+
>> processes synchronizing on a single lock.
>> 
>> I don't think this is a hard lockup, just a global slow down.
>> I expect everything will eventually exit.
>> 
>
>  But according to the vmcore, this is a hard lockup issue, and I think
> there may be the following scenarios:

Let me rewind a second.  I just realized that I don't have a clue what
a hard lockup is (outside of the linux hard lockup detector).

The two kinds of lockups that I understand with a technical meaning are
deadlock (such as taking two locks in opposite orders, which can never
be escaped), and livelock (where things are so busy no progress is made
for an extended period of time).

I meant to say this is not a deadlock situation.  This looks like a
livelock, but I think given enough time the code would make progress and
get out of it.

I do agree that holding a spin lock for over 1 second is ridiculous and
a denial of service attack.

What I unfortunately do not see is a real world scenario where this will
happen.  Without a real world scenario it is hard to find motivation to
spend the year or so it would take to rework all of the data structures.
The closest I can imagine to a real world scenario is that this
situation can be used as a denial of service attack.

The hardest part of the problem is that signals sent to a group need to
be sent to the group atomically.  That is, the signal needs to be sent
to every member of the group.

Anyway, I am very curious why you are the only one seeing a problem with
fork12.  That we can definitely investigate, as tracking down what is
different about your setup versus other people who have run ltp seems
much easier than redesigning all of the signal processing data
structures from scratch.

Eric