[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <AM6PR03MB517003D5965F48AC5FE7283DE4C60@AM6PR03MB5170.eurprd03.prod.outlook.com>
Date: Thu, 2 Apr 2020 22:59:59 +0200
From: Bernd Edlinger <bernd.edlinger@...mail.de>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Alexey Gladkov <gladkov.alexey@...il.com>
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1
On 4/2/20 9:52 PM, Linus Torvalds wrote:
> On Thu, Apr 2, 2020 at 12:31 PM Bernd Edlinger
> <bernd.edlinger@...mail.de> wrote:
>>
>> This is at least what is my impression how the existing mutexes are used,
>> a mutex called "cred_guard_mutex" is a not very good self explaining name,
>> in my opinion, it is totally unclear what it does "guard", and why.
>
> Oh, I absolutely agree that cred_guard_mutex is a horrible lock.
>
> It actually _used_ to be a lot more understandable, and the name used
> to make more sense in the context it was used.
>
> See commit
>
> a2a8474c3fff ("exec: do not sleep in TASK_TRACED under ->cred_guard_mutex")
>
> for when it changed from "somewhat understandable" to "really hard to follow".
>
> Don't get me wrong - that commit has a very good reason for it, but it
> does make the locking really hard to understand.
>
> It all used to be in one function - do_execve() - and it was holding
> the lock over a fairly obvious range, starting at
>
> bprm->cred = prepare_exec_creds();
>
> and ending at basically "we're done with execve()".
>
> So basically, cred_guard_mutex ends up being the thing that is held
> all the way from the "before execve looks at the old creds" to "execve
> is done, and has changed the creds".
>
> The reason it's needed is exactly that there are some nasty situations
> where execve() itself does things with creds to determine that the new
> creds are ok. And it uses the old creds to do that, but it also uses
> the task->flags and task->ptrace.
>
> So think of cred_guard_mutex as a lock around not just the creds, but
> the combination of creds and the task flags/ptrace.
>
> Anybody who changes the task ptrace setting needs to serialize with
> execve(). Or anybody who tests for "dumpable()", for example.
>
> If *all* you care about is just the creds, then you don't need it.
> It's really only users that do more checks than just credentials.
> "dumpable()" is I think the common one.
>
> And that's why cred_guard_mutex has that big range - it starts when we
> read the original creds (because it will use those creds to determine
> how the *new* creds will affect dumpability etc), and it ends when it
> has updated not only to the new creds, but it has set all those other
> flags too.
>
> So I'm not at all against splitting the lock up, and trying to make it
> more directed and specific.
>
> My complaints were about how the new lock wasn't much better. It was
> still completely incomprehensible, the conditional unlocking was hard
> to follow, and it really wasn't obvious that the converted users were
> fine.
>
> See?
>
Understand completely. The change is in a way mechanic, that is we
have the following sequence:
1 execve starts
|
| access args, may fault, deadlock in user mode fault handler
| de_thread, may block waiting for strace to call wait and so
|
| exec_mm_release, may also falut, deadlock un user mode fault handler
v
2 process update begins
|
| should not block, to our current knowledge (except when loading a nfs image probably ?)
| credentials may change at any time.
|
v
3 process update done
|
v
4 execve done
So we have functions that access the process memory map. they use the credentials
and need to access the correct image, not to reveal secrets of the new about to be
loaded image. they need the inner mutex from 2 .. 3
Also when you want to read credentials of another thread, it is probably better
to have a consistent state of the credentials, and no new privs for instance.
That also needs the inner mutex from 2 .. 3
And then we have things that change the credentials or no new privs, in general
all ptrace_attach and security modules are of that kind. They need the
mutex from 1 .. 4, but I want to change the name, in the two patches below, and
I want to break the dead-lock from ptrace in a API-incompatible way, but in a
very limited breaking change, that only breaks what is already broken.
There are two more patches, which might be of interest for you, just to
make the picture complete.
It is not clear if we go that way, or if Eric has a yet better idea.
[PATCH v7 15/16] exec: Fix dead-lock in de_thread with ptrace_attach
https://www.spinics.net/lists/kernel/msg3459067.html
[PATCH v6 16/16] doc: Update documentation of ->exec_*_mutex
https://www.spinics.net/lists/kernel/msg3449539.html
Thanks
Bernd.
> Linus
>
Powered by blists - more mailing lists