lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 3 Apr 2020 17:09:07 +0200
From:   Bernd Edlinger <bernd.edlinger@...mail.de>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Alexey Gladkov <gladkov.alexey@...il.com>
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

On 4/2/20 9:04 PM, Linus Torvalds wrote:
> On Wed, Apr 1, 2020 at 9:16 AM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>>
>> The work on exec starts solving a long standing issue with exec that
>> it takes mutexes of blocking userspace applications, which makes exec
>> extremely deadlock prone.  For the moment this adds a second mutex
>> with a narrower scope that handles all of the easy cases.  Which
>> makes the tricky cases easy to spot.  With a little luck the code to
>> solve those deadlocks will be ready by next merge window.
> 
> So this worries me.
> 
> I've pulled it, but I'm not entirely happy about some of it.
> 
> For example, the "rationale" for some of the changes is
> 
>     This should be safe, as the credentials are only used for reading.
> 
> which is just nonsensical. "Only used for reading" is immaterial, and
> there's no explanation for why that would matter at all. Most of the
> credentials are ever used for reading, and the worry about execve() is
> that the credentials can change, and people race with them and use the
> new 'suid' credentials and allow things that shouldn't be allowed. So
> the rationale makes no sense at all.
> 
> Btw, if "this only takes it for reading" is such a big deal, why is
> that mutex not an rw-semaphore?
> 
> The pidfd change at least has a rationale that makes sense:
> 
>     This should be safe, as the credentials do not change
>     before exec_update_mutex is locked.  Therefore whatever
>     file access is possible with holding the cred_guard_mutex
>     here is also possbile with the exec_update_mutex.
> 
> so now you at least have an explanation of why that particular lock
> makes sense and works and is equivalent.
> 
> It's still not a *great* explanation for why it would be equivalent,
> because cred_guard_mutex ends up not just guarding the write of the
> credentials, but makes it atomic wrt *other* data. That's the same
> problem as "I'm only reading".
> 
> Locking is not about *one* value being consistent - if that was the
> case, then you could just do a "get refcount on the credentials, now I
> have a stable set of creds I can read forever". No lock needed.
> 
> So locking is about *multiple* values being consistent, which is why
> that "I'm only reading" is not an explanation for why you can change
> the lock.
> 
> It's also why that second one is questionable: it's a _better_ attempt
> at explaining things, but the point is really that cred_guard_mutex
> protects *other* things too.
> 

Can we still edit the change logs, maybe that is a clear indication
that they are not sufficiently clear, when one don't understand the
patch without following the whole email thread.


> A real explanation would have absolutely *nothing* to do with
> "reading" or "same value of credentials". Both of those are entirely
> immaterial, since - as mentioned - you could just get a snapshot of
> the creds instead.
> 

The problem we have here is that *another* thread can change no_new_privs
of the execve thread, that is a write.  I think that must be avoided
whatever it costs.  Those are the hard issues,
and reading another thread's credentials, an taking a reference of the
vm need to be consistent, so should just not happen while the vm
is updated, but the credentials not yet.

Or am I missing something here?

> A real explanation would be about how there is no other state that
> cred_guard_mutex protects that matters.
> 
> See what I'm saying?
> 
> This code is subtle as h*ll, and we've had bugs in it, and it has a
> series of tens of patches to fix them. But that also means that the
> explanations for the patches should take the subtleties into account,
> and not gloss over them with things like this.
> 

:-)

> Ok, enough about the explanations. The actual _code_ is kind of odd
> too. For example, you have that "bprm->called_exec_mmap" flag to say
> "I've taken the exec_update_mutex, and need to drop it".
> 

previously that was bprm->mm == NULL, that is even more hacky.

> But that flag is not set anywhere _near_ actually taking the lock.
> Sure, it is taken after exec_mmap() returns successfully, and that
> makes sense from a naming standpoint, but wouldn't it have been a
> _lot_ more obvious if you just set the flag when you took that lock,
> and instead of naming it by some magical code sequence, you named it
> for what it does?
> 

Linus, I take full responsibility for this part of the patch.
In this case, I just did not want to change the name again.
That name was in a previous version of my patch, that I merged
with Eric's and at the same time had to fix the mutex-lock-order
issue in Eric's original patch.  But if anybody would have
suggested a better name, and advised me to pass a parameter to
exec_mmap that would have happened.
So a kind of laziness on my side, and unfortunately I forgot to
point to all the changes in a revision log, I usually do that,
but this time I forgot it somehow.  This was a 16-part patch
series at that time, so I just was really busy with following
each mail of the previous patch version, and also get the latest
revision of the change log (I use the mail maybe I should have
pulled Eric's tree, but I am still a newbie here ... :-) ).
Anyhow I was surprised that Eric did not see my changes by
looking at them, but that is the human nature, nothing to be
blamed for.


> Again, this looks all technically correct, but it's written in a way
> that doesn't seem to make a lot of sense. Why is the code literally
> written with a magical assumption of "calling exec_mmap takes this
> lock, so if the flag named called_exec_mmap is set, I have to free
> that lock that is not named that at all".
> 

Names can be changed.  In the peer review everybody was happy with it.
But that is not set in stone.

Initially I only wanted to address the ptrace attach, but Eric
came up with the user mode page fault handler, that made the patch
a lot more complicated, if that goal is dropped, also the place
where the mutex need to be taken could be a different one.


> I hate conditional locking in the first place, but if it has to exist,
> then the conditional should be named after the lock, and the lock
> getting should be very very explicitly tied to it.
> 
> Wouldn't it have been much clearer if you called that flag
> "exec_update_mutex_taken", and set it WHEN YOU TAKE IT?
> 

Can be done.  I don't care.  It is one additional register taken
with a parameter to exec_mmap and it is probably inlined, nothing
more nothing less.


> In fact, then you could drop the
> 
>                         mutex_unlock(&tsk->signal->exec_update_mutex);
> 
> in the error case of exec_mmap(), because now the error handling in
> free_bprm() would do the cleanup automatically.
> 

The error handling is sometimes called when the exec_update_mutex is
not taken, in fact even de_thread not called.

Can you say how you would suggest that to be done?


> See what I'm saying? You've made the locking more complex and subtle
> than it needed to be. And since the whole point of the *new* lock is
> that it should replace an old lock that was really complex and subtle,
> that's a problem.
> 
>                    Linus
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ