linux-kernel - Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

Message-ID: <AM6PR03MB5170B4DE6096F78BAB79BF72E4C10@AM6PR03MB5170.eurprd03.prod.outlook.com>
Date:   Thu, 9 Apr 2020 17:15:07 +0200
From:   Bernd Edlinger <bernd.edlinger@...mail.de>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Waiman Long <longman@...hat.com>, Ingo Molnar <mingo@...nel.org>,
        Will Deacon <will@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Alexey Gladkov <gladkov.alexey@...il.com>
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On 4/9/20 4:58 PM, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@...ux-foundation.org> writes:
> 
>> On Wed, Apr 8, 2020 at 8:17 AM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>>>
>>> Yes.  I missed the fact that we could take the lock killable.
>>> We still unfortunately have the deadlock with ptrace.
>>
>> That, I feel, is similarly trivial.
>>
>> Again, anybody who takes the lock for writing should just do so
>> killably. So you have three cases:
>>
>>  - ptrace wins the race and gets the lock.
>>
>>    Fine, the execve will wait until afterwards.
>>
>>  - ptrace loses the race and is not a thread with execve.
>>
>>    Fine, the execve() won, and the ptrace will wait until after execve.
>>
>>  - ptrace loses the race and is a thread with execve.
>>
>>    Fine, the execve() will kill the thing in dethread() and the ptrace
>>    thread will release the lock and die.
> 
> That would be nice.
> 
> That is unfortunately not how ptrace_event(PTRACE_EVENT_EXIT, ...) works.
> 
> When a thread going about its ordinary business receives the SIGKILL
> from de_thread the thread changes course and finds its way to do_exit.
> In do_exit the thread calls ptrace_event(PTRACE_EVENT_EXIT, ...) and
> blocks waiting for the tracer to let it continue.
> 
> Further, from previous attempts to fix this we know that there
> are pieces of userspace that expect that stop to happen.  So if the
> PTRACE_EVENT_EXIT stop does not happen, userspace which is already
> attached breaks.
> 
> Further, this case with ptrace is something we know userspace
> does, and it is just a matter of bad timing of attaching to the
> threads when one thread is exec'ing.  So we don't even need to wonder if
> userspace would do such a silly thing.
> 
> 
> 
> There are a lot of similar cases that can happen if userspace inserts
> itself into the path of page faults, directly or indirectly,
> as long as some wait somewhere ultimately waits for a ptrace attach.
> 
> 
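
For anyone following along, the cycle Eric describes above can be
written down as a toy wait-for graph.  This is only a sketch of my
reading of his description, not kernel code: the task names and the
helpers find_cycle() and build() are made up for illustration, and each
task is assumed to wait on at most one other task.

```python
# Toy wait-for graph: waits_for[a] == b means "task a cannot make
# progress until task b does".  All names here are hypothetical.

def find_cycle(waits_for):
    """Return the list of tasks forming a wait cycle, or None."""
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to where we began
            if node in seen:
                break                # a cycle that does not include start
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


def build(exit_stop_blocks):
    """Build the graph with or without the PTRACE_EVENT_EXIT stop."""
    waits_for = {
        # de_thread() waits for the sibling thread to exit
        "exec_thread": "sibling_thread",
        # the attaching tracer waits for the lock the exec'ing thread holds
        "tracer": "exec_thread",
    }
    if exit_stop_blocks:
        # the killed sibling stops in do_exit() until its tracer lets it go
        waits_for["sibling_thread"] = "tracer"
    return waits_for


print(find_cycle(build(False)))  # None
print(find_cycle(build(True)))   # ['exec_thread', 'sibling_thread', 'tracer']
```

Without the exit stop the graph has no cycle, which is the situation the
three cases quoted above assume.  With the stop, the cycle closes, and
taking the lock killable does not break it unless something actually
kills the tracer.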

Remember, as a last resort there is my "insane" 15/16 patch, which
Linus admittedly hates, but it works.  If we find a cleaner solution
it can always be reverted, that is just fine for me.

Thanks
Bernd.

>> So all three cases are fine, and none of them have any behavioral
>> differences (as mentioned, the killing is "invisible" to users since
>> it's fundamentally a race, and you can consider the kill to have
>> happened before the ptrace started).
> 
> See above.
> 
> 
>>> It might be simpler to make whichever lock we are dealing with per
>>> task_struct instead of per signal_struct.  Then we don't even have to
>>> think about what de_thread does or if the lock is taken killable.
>>
>> Well, yes, but I think the dethread behavior of killing threads is
>> required anyway, so..
> 
> It is, but it is actually part of the problem.
> 
> I think making some of this thread local might solve another easy case
> and let us focus more on the really hard problem.
> 
>>> I keep wondering if we could do something similar to vfork.  That is
>>> allocate a new task_struct and fully set it up for the post exec
>>> process, and then make it visible under tasklist_lock.  Finally we could
>>> free the old process.
>>>
>>> That would appear as if everything happened atomically from
>>> the point of view of the rest of the kernel.
>>
>> I do think that would have been a lovely design originally, and would
>> avoid a lot of things. So "execve()" would basically look like an exit
>> and a thread creation with the same pid (without the SIGCHLD to the
>> parent, obviously)
>>
>> That would also naturally handle the "flush pending signals" etc issues.
>>
>> The fact that we created a whole new mm-struct ended up fixing a lot
>> of problems (even if it was painful to do). This might be similar.
>>
>> But it's not what we've ever done, and I do suspect you'd run into a
>> lot of odd small special cases if we were to try to do it now.
> 
> I completely agree, which is why I haven't been rushing to do that.
> But this remains the only idea that I have thought of that would solve all
> of the issues.
> 
>> So I think it's simpler to just start making the "cred lock waiters
>> have to be killable" rule. It's not like that's a very complex rule.
> 
> I just looked at the remaining users of cred_guard_mutex and they are
> all killable or interruptible.  Further all of the places that have been
> converted to use the exec_update_mutex are also all killable or
> interruptible.
> 
> So where we came in is that we had the killable rule and that is what
> has allowed this to remain on the backburner for so long.  At least you
> could kill the affected process from userspace.   Unfortunately the
> deadlocks still happen.
> 
> Eric
>