[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87blo45keg.fsf@x220.int.ebiederm.org>
Date: Mon, 06 Apr 2020 17:17:27 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Waiman Long <longman@...hat.com>, Ingo Molnar <mingo@...nel.org>,
Will Deacon <will@...nel.org>,
Bernd Edlinger <bernd.edlinger@...mail.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Alexey Gladkov <gladkov.alexey@...il.com>
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1
Linus Torvalds <torvalds@...ux-foundation.org> writes:
> [ For Waiman & co - the problem is that the current cred_guard_mutex
> is horrendous and has problems with execve() deadlocking against
> various users. We've had this bug before, there's a new one, it's just
> nasty ]
>
> On Thu, Apr 2, 2020 at 4:04 PM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>>
>> That is not the direction I intend to take either.
>>
>> I was hoping I could put off replying to this thread for a bit because
>> I only managed to get 4 hours of sleep last night and I am not as alert
>> to technical details as I would like to be.
>
> Hmm.. So I've been looking at this cred_guard_mutex, and I wonder...
>
> This is a bit hand-wavy, because I haven't walker through all the
> paths, but could we perhaps work around a lot of the problems a
> different way., namely:
>
> - make the "cred_guard_mutex" an rwsem-like thing instead of being a mutex.
>
> - make the ptrace_attach() case get it for writing - not because
> ptrace changes the creds, but because ptrace changes 'task->ptrace'
> and depends on dumpability etc.
>
> - change the *name* of that damn thing. Not because it's now
> rwsem'ish rather than a mutex, but because it was never really about
> just "creds". It was about creds+ptrace+dumpable flags etc.
>
> - make all the ones that read the creds to just take it for reading
> (IOW, the cases that were basically switched over to
> exec_update_mutex).
>
> - and finally: make "execve()" take it just for reading too, but
> introduce a "upgrade to write" at the very end (when it actually is
> all done and then finally changes the creds and dumpability)
>
> Wouldn't that solve all problems? We wouldn't get deadlocks wrt
> execve(), simply because execve() doesn't need it to be writable, and
> the things execve() does and can deadlock all only want readability.
>
> But hear me out, because the above is fundamentally broken in a couple
> of ways, so let me address that brokenness before you tell me I'm a
> complete nincompoop and an idiot.
>
> I'm including some locking people here because of these issues, so
> that they can maybe verify my thinking.
>
> (a) our rwsem's are fair
>
> So the whole "execve takes it for reading, so now others can take
> it for reading too without deadlocks" is simply not true - if you use
> the existing rwsem.
>
> Because a concurrent (blocked) writer will then block other
> readers for fairness reasons, and holding it for reading doesn't
> guarantee that others can get it for reading.
>
> So clearly, the above doesn't even *fix* the deadlocks - unless
> we have an unfair mode (or just a special lock for just this that is
> not our standard rwsem, but a special unfair one).
>
> So I'm suggesting we use a special unfair rwsem here (we can make
> a simple spinlock-based one - it doesn't need to be as clever or
> optimized as the real rwsems are)
>
> (b) similarly, our rwsem's don't actually have a "upgrade from read
> to write", because that's also a fundamentally deadlocky operation.
>
> Again, that's true. Except execve() is special, and we know
> there's only _one_ execve() at a time that will complete, since we're
> serializing them. So for this particular use, "upgrade to write" would
> be possible without the general-case deadlock issues.
>
> (c) I didn't think things through, and even with these special
> semantics, my idea is complete garbage
>
> Ok, this may well be true.
>
> Anyway, the advantage of this (if it works) is that it would allow us
> to go back to the _really_ simple original model of just taking this
> lock for reading at the beginning of execve(), and not worrying so
> much about complex nesting or very complex rules for exactly when we
> got the lock and error handling.
>
> The final part when we actually update the credentials and dumpability
> and stuff in execve() is actually fairly simple. So the "upgrade to a
> write lock" phase doesn't worry me too much. It's the interaction
> with all the previous parts (which happen with it held just for
> reading) that tend to be the nastier ones.
>
> And ptrace_attach() really is special, and I think it would be the
> only one that really needs that write lock.
>
> The disadvantage, of course, is that it would require that
> special-case lock semantic, and I might also be missing some thing
> that makes it not work anyway.
>
> Comments? Am I just dreaming of a simpler model without my medications
> again?
Withough reading everything through at least.
* There is also security_setprocattr which needs ptrace and nnp state not to
change it needs to set something that at least selinux's cred
calculations needs to remain constant (like nnp and ptrace).
Which means one thread calling security_setprocattr and another thread
calling exec can deadlock in de_thread.
* Even with your lock and just the ptrace case I can deadlock.
Ptracer: Thread A Tread B
ptrace_attach A
exec
ptrace_attach B
uprade R to RW
---------------------- DEADLOCKED -------------------------
Those are the first two cases I have thought of. There are probably
more.
But fundamentally the only reason we need this information stable
before the point of no return is so that we can return a nice error
code to the process calling exec. Instead of terminating the
process with SIGSEGV.
These are for the most part unlikely scenarios or people would have been
complaining much more loudly about deadlock.
So my plan is to perform the relevant calculations effectively twice.
Once just before the point of no return, and give a graceful return code
if necessary and possible. Once just afer the point of no return, and
SIGSEGV if necessary.
Of course this all only applies to LSMs that refuse to continue under
NNP or ptrace without changing the cred. Linux without those LSMs
enabled will just continue with the original credentials.
So I don't think we will noticably be sacraficing the quality of
the user experience with my plan. In the worst case a deadlock
will become a SIGSEGV killing the execing program.
Eric
p.s. Yes we can do better than a mutex that makes everything mutually
exclusive. I am just starting there for simplicity, and to
see if we need anything better. Unfortuantely too many things are
changing simultaneously for rcu to cover all of the read side
cases.
Powered by blists - more mailing lists