linux-kernel - Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87imih8md8.fsf@x220.int.ebiederm.org>
Date:   Thu, 02 Apr 2020 19:05:23 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Bernd Edlinger <bernd.edlinger@...mail.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Alexey Gladkov <gladkov.alexey@...il.com>
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

Linus Torvalds <torvalds@...ux-foundation.org> writes:

> On Thu, Apr 2, 2020 at 4:04 PM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>>
>> That is not the direction I intend to take either.
>
> Ahh, good. Because those kinds of "play games with dropping locks in
> the middle based on conditionals" really have been horrible.
>
> Yes, we've done it, and it's almost always been asource of truly subtle bugs.
>
>> The exec_update_mutex is to be a strict subset of today's
>> cred_guard_mutex.  I tried to copy cred_guard_mutex's unlock style so
>> that was obvious and that turns out was messier than I intended.
>
> Yes. That is why I had no problem pulling that subset, and my worries
> were mainly about the explanations and that flag use.
>
>> Since I thought I already knew what was in the patches and the worst
>> problem was the missing unlock of cred_guard_mutex, and I know Bernd's
>> patches had been tested I applied them.  I missed that Bernd had added
>> the exec_mmap_called flag into my patch.  I thought he had only added
>> the missing unlock.
>
> Ahh, so you meant for all of that to be entirely static refactoring,
> rather than the conditional unlock depending on just how far we had
> gotten.
>
> Good, that's generally the much superior approach.

Looking at it right now if I add the unlock to one code path I can
get the flag and free_binprm out of it, and have it completely static.

> I absolutely _hate_ the "drop and retake" model, unless it's a very
> local case with a very explicit retry path.
>
> In contrast, the "we have a flag that shows how far we've gotten"
> _has_ been a successful model, and while I much prefer a static "lock
> pairs with unlock", that "I have done this, so you need to unlock" is
> not entirely out of the question when the static rules become too
> complex to think about.

We do definitely need one of those for the point of no return.  I
need to check if we can set it sooner.  I think we have a weird case
where we can't set the flag because calling force_sigsegv during when
coredumping is rude.

> The vfs code has something similar in FMODE_OPENED which is basically
> a flag saying "I actually made it all the way to the ->open()"
> callback. We used to have a static model, but the rules for when we
> can use fput(), and when we have to use fdrop() were too hard for
> people.
>
>> I spotted the weirdness in unlocking exec_update_mutex, and because it
>> does fix a real world deadlock with ptrace I did not back it out from my
>> tree.
>>
>> I have been much laxer on the details than I like to be my apologies.
>
> Ok, as long as we have a sane plan..
>
> And
>
>> The plan is:
>>         exec_udpate_mutex will cover just the subset of cred_guard_mutex
>>         after the point of no return, and after we do any actions that
>>         might block waiting for userspace to do anything.
>>
>>         So exec_update_mutex will just cover those things that exec
>>         is updating, so if you want an atomic snapshot of them
>>         plus the appropriate struct cred you can grab exec_update_mutex.
>>
>>         I added a new mutex instead of just fixing cred_guard_mutex so
>>         that we can update or revert the individual code paths if it
>>         is found that something is wrong.
>>
>>         The cred_guard_mutex also prevents other tasks from starting
>>         to ptrace the task that is exec'ing, and other tasks from
>>         setting no_new_privs on the task that is exec'ing.
>>
>>         My plan is to carefully refactor the code so it can perform
>>         both the ptrace and no_new_privs checks after the point of
>>         no return.
>
> Ok. Sounds good.
>
>> I have uncovered all kinds of surprises while working in that direction
>> and I realize it is going to take a very delicate and careful touch to
>> achieve my goal.
>>
>> There are silly things like normal linux exec when you are ptraced and
>> exec changes the credentials the ordinary code will continue with the
>> old credentials, but the an LSM enabled your process is likely to be
>> killed instead.
>
> Yeah. The "continue with old credentials" is actually very traditional
> and the original behavior, and is useful for handling the case of
> debugging something that is suid, but doesn't necessarily _require_
> it.

If we continue with old credentials I think we are still setting
AT_SECURE which seems odd.  Especially since it doesn't appear to be
intentional.

> But the LSM's just say yes/no.

Oh I wish what the LSM's were doing was anything approaching as
simple as merely saying yes/no during exec.

> I have this dim memory that it also triggers when you do the debugging
> as root, but that may be some medication-induced memory.

I have a memory of someone fixing something like that years ago.
If you have sufficient privileges while ptracing the current code will
allow you to trace a suid root exec.

>> There is the personal mind blowing scenario where selinux will increase
>> your credentials upon exec but if a magic directive is supplied in it's
>> rules will avoid setting AT_SECURE, so that userspace won't protect
>> itself from hostile takeover from the pre credential change environment.
>> Much to my surprise "noatsecure" is a known and documented feature of
>> selinux.  I am not certain but I think I even spotted it in use on
>> production.
>
> We have had a _ton_ of random small rules so that people could enable
> SElinux in legacy environments.
>
> They are _probably_ effectively dead code in this day and age, but
> it's hard to tell...

For that specific case I attempted to look at the SELinux rules
file on a production RHEL7 configuration and strings told me
"noatsecure" is present.  But the whole thing is a binary blob
and I have not spent enough time to figure out how to properly
return it to test so I can see what that "noatsecure" means.
It might just have been a section header in the binary file.

There are a lot of things like that are either going to need comments
from the relevant maintiners or to just be avoided for the time being.

The plan is small careful patches so I get through it with setting
off the minimal number of landmines.

Eric