linux-kernel - Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <AM6PR03MB5170B606F9AC663225EC9609E4C60@AM6PR03MB5170.eurprd03.prod.outlook.com>
Date:   Thu, 2 Apr 2020 21:31:20 +0200
From:   Bernd Edlinger <bernd.edlinger@...mail.de>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Alexey Gladkov <gladkov.alexey@...il.com>
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On 4/2/20 9:04 PM, Linus Torvalds wrote:
> On Wed, Apr 1, 2020 at 9:16 AM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>>
>> The work on exec starts solving a long standing issue with exec that
>> it takes mutexes of blocking userspace applications, which makes exec
>> extremely deadlock prone.  For the moment this adds a second mutex
>> with a narrower scope that handles all of the easy cases.  Which
>> makes the tricky cases easy to spot.  With a little luck the code to
>> solve those deadlocks will be ready by next merge window.
> 
> So this worries me.
> 
> I've pulled it, but I'm not entirely happy about some of it.
> 
> For example, the "rationale" for some of the changes is
> 
>     This should be safe, as the credentials are only used for reading.
> 

What I meant, but did probably not find a good way to say it.

There are places where credentials of other threads are written,
e.g. set no new privs on a thread, that already started to execve
a setuid process.

You always have the right to change the credentials of the own thread,
you dont need a mutex for it.

This is at least what is my impression how the existing mutexes are used,
a mutex called "cred_guard_mutex" is a not very good self explaining name,
in my opinion, it is totally unclear what it does "guard", and why.


Bernd.

> which is just nonsensical. "Only used for reading" is immaterial, and
> there's no explanation for why that would matter at all. Most of the
> credentials are ever used for reading, and the worry about execve() is
> that the credentials can change, and people race with them and use the
> new 'suid' credentials and allow things that shouldn't be allowed. So
> the rationale makes no sense at all.
> 
> Btw, if "this only takes it for reading" is such a big deal, why is
> that mutex not an rw-semaphore?
> 
> The pidfd change at least has a rationale that makes sense:
> 
>     This should be safe, as the credentials do not change
>     before exec_update_mutex is locked.  Therefore whatever
>     file access is possible with holding the cred_guard_mutex
>     here is also possbile with the exec_update_mutex.
> 
> so now you at least have an explanation of why that particular lock
> makes sense and works and is equivalent.
> 
> It's still not a *great* explanation for why it would be equivalent,
> because cred_guard_mutex ends up not just guarding the write of the
> credentials, but makes it atomic wrt *other* data. That's the same
> problem as "I'm only reading".
> 
> Locking is not about *one* value being consistent - if that was the
> case, then you could just do a "get refcount on the credentials, now I
> have a stable set of creds I can read forever". No lock needed.
> 
> So locking is about *multiple* values being consistent, which is why
> that "I'm only reading" is not an explanation for why you can change
> the lock.
> 
> It's also why that second one is questionable: it's a _better_ attempt
> at explaining things, but the point is really that cred_guard_mutex
> protects *other* things too.
> 
> A real explanation would have absolutely *nothing* to do with
> "reading" or "same value of credentials". Both of those are entirely
> immaterial, since - as mentioned - you could just get a snapshot of
> the creds instead.
> 
> A real explanation would be about how there is no other state that
> cred_guard_mutex protects that matters.
> 
> See what I'm saying?
> 
> This code is subtle as h*ll, and we've had bugs in it, and it has a
> series of tens of patches to fix them. But that also means that the
> explanations for the patches should take the subtleties into account,
> and not gloss over them with things like this.
> 
> Ok, enough about the explanations. The actual _code_ is kind of odd
> too. For example, you have that "bprm->called_exec_mmap" flag to say
> "I've taken the exec_update_mutex, and need to drop it".
> 
> But that flag is not set anywhere _near_ actually taking the lock.
> Sure, it is taken after exec_mmap() returns successfully, and that
> makes sense from a naming standpoint, but wouldn't it have been a
> _lot_ more obvious if you just set the flag when you took that lock,
> and instead of naming it by some magical code sequence, you named it
> for what it does?
> 
> Again, this looks all technically correct, but it's written in a way
> that doesn't seem to make a lot of sense. Why is the code literally
> written with a magical assumption of "calling exec_mmap takes this
> lock, so if the flag named called_exec_mmap is set, I have to free
> that lock that is not named that at all".
> 
> I hate conditional locking in the first place, but if it has to exist,
> then the conditional should be named after the lock, and the lock
> getting should be very very explicitly tied to it.
> 
> Wouldn't it have been much clearer if you called that flag
> "exec_update_mutex_taken", and set it WHEN YOU TAKE IT?
> 
> In fact, then you could drop the
> 
>                         mutex_unlock(&tsk->signal->exec_update_mutex);
> 
> in the error case of exec_mmap(), because now the error handling in
> free_bprm() would do the cleanup automatically.
> 
> See what I'm saying? You've made the locking more complex and subtle
> than it needed to be. And since the whole point of the *new* lock is
> that it should replace an old lock that was really complex and subtle,
> that's a problem.
> 
>                    Linus
>