[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAKCV-6sH03G2xuZrhqEMExx-AAKPZgQ7Z1BnDgV5HimFVGCWwg@mail.gmail.com>
Date: Fri, 21 Nov 2025 15:06:57 -0800
From: Ryan Lee <ryan.lee@...onical.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Bernd Edlinger <bernd.edlinger@...mail.de>, Alexander Viro <viro@...iv.linux.org.uk>,
Alexey Dobriyan <adobriyan@...il.com>, Oleg Nesterov <oleg@...hat.com>, Kees Cook <kees@...nel.org>,
Andy Lutomirski <luto@...capital.net>, Will Drewry <wad@...omium.org>,
Christian Brauner <brauner@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.com>, Serge Hallyn <serge@...lyn.com>,
James Morris <jamorris@...ux.microsoft.com>, Randy Dunlap <rdunlap@...radead.org>,
Suren Baghdasaryan <surenb@...gle.com>, Yafang Shao <laoar.shao@...il.com>, Helge Deller <deller@....de>,
Adrian Reber <areber@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, Jens Axboe <axboe@...nel.dk>,
Alexei Starovoitov <ast@...nel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, linux-kselftest@...r.kernel.org,
linux-mm@...ck.org, linux-security-module@...r.kernel.org,
tiozhang <tiozhang@...iglobal.com>, Luis Chamberlain <mcgrof@...nel.org>,
"Paulo Alcantara (SUSE)" <pc@...guebit.com>, Sergey Senozhatsky <senozhatsky@...omium.org>,
Frederic Weisbecker <frederic@...nel.org>, YueHaibing <yuehaibing@...wei.com>,
Paul Moore <paul@...l-moore.com>, Aleksa Sarai <cyphar@...har.com>,
Stefan Roesch <shr@...kernel.io>, Chao Yu <chao@...nel.org>, xu xin <xu.xin16@....com.cn>,
Jeff Layton <jlayton@...nel.org>, Jan Kara <jack@...e.cz>, David Hildenbrand <david@...hat.com>,
Dave Chinner <dchinner@...hat.com>, Shuah Khan <shuah@...nel.org>,
Elena Reshetova <elena.reshetova@...el.com>, David Windsor <dwindsor@...il.com>,
Mateusz Guzik <mjguzik@...il.com>, Ard Biesheuvel <ardb@...nel.org>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>, "Matthew Wilcox (Oracle)" <willy@...radead.org>,
Hans Liljestrand <ishkamiel@...il.com>, Penglei Jiang <superman.xpt@...il.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Adrian Ratiu <adrian.ratiu@...labora.com>,
Ingo Molnar <mingo@...nel.org>, "Peter Zijlstra (Intel)" <peterz@...radead.org>,
Cyrill Gorcunov <gorcunov@...il.com>, Eric Dumazet <edumazet@...gle.com>,
apparmor <apparmor@...ts.ubuntu.com>
Subject: Re: [RFC][PATCH] exec: Move cred computation under exec_update_lock
On Fri, Nov 21, 2025 at 11:20 AM Eric W. Biederman
<ebiederm@...ssion.com> wrote:
>
> Bernd Edlinger <bernd.edlinger@...mail.de> writes:
>
> > On 11/21/25 10:35, Bernd Edlinger wrote:
> >> On 11/21/25 08:18, Eric W. Biederman wrote:
> >>> Bernd Edlinger <bernd.edlinger@...mail.de> writes:
> >>>
> >>>> Hi Eric,
> >>>>
> >>>> thanks for you valuable input on the topic.
> >>>>
> >>>> On 11/21/25 00:50, Eric W. Biederman wrote:
> >>>>> "Eric W. Biederman" <ebiederm@...ssion.com> writes:
> >>>>>
> >>>>>> Instead of computing the new cred before we pass the point of no
> >>>>>> return compute the new cred just before we use it.
> >>>>>>
> >>>>>> This allows the removal of fs_struct->in_exec and cred_guard_mutex.
> >>>>>>
> >>>>>> I am not certain why we wanted to compute the cred for the new
> >>>>>> executable so early. Perhaps I missed something but I did not see any
> >>>>>> common errors being signaled. So I don't think we loose anything by
> >>>>>> computing the new cred later.
> >>>>>
> >>>>> I should add that the permission checks happen in open_exec,
> >>>>> everything that follows credential wise is just about representing in
> >>>>> struct cred the credentials the new executable will have.
> >>>>>
> >>>>> So I am really at a loss why we have had this complicated way of
> >>>>> computing of computed the credentials all of these years full of
> >>>>> time of check to time of use problems.
> >>>>>
> >>>>
> >>>> Well, I think I see a problem with your patch:
> >>>>
> >>>> When the security engine gets the LSM_UNSAFE_PTRACE flag, it might
> >>>> e.g. return -EPERM in bprm_creds_for_exec in the apparmor, selinux
> >>>> or the smack security engines at least. Previously that callback
> >>>> was called before the point of no return, and the return code should
> >>>> be returned as a return code the the caller of execve. But if we move
> >>>> that check after the point of no return, the caller will get killed
> >>>> due to the failed security check.
> >>>>
> >>>> Or did I miss something?
> >>>
> >>> I think we definitely need to document this change in behavior. I would
> >>> call ending the exec with SIGSEGV vs -EPERM a quality of implementation
> >>> issue. The exec is failing one way or the other so I don't see it as a
> >>> correctness issue.
> >>>
> >>> In the case of ptrace in general I think it is a bug if the mere act of
> >>> debugging a program changes it's behavior. So which buggy behavior
> >>> should we prefer? SIGSEGV where it is totally clear that the behavior
> >>> has changed or -EPERM and ask the debugged program to handle it.
> >>> I lean towards SIGSEGV because then it is clear the code should not
> >>> handle it.
> >>>
> >>> In the case of LSM_UNSAFE_NO_NEW_PRIVS I believe the preferred way to
> >>> handle unexpected things happening is to terminate the application.
> >>>
> >>> In the case of LSM_UNSAFE_SHARE -EPERM might be better. I don't know
> >>> of any good uses of any good uses of sys_clone(CLONE_FS ...) outside
> >>> of CLONE_THREAD.
> >>>
> >>>
> >>> Plus all of these things are only considerations if we are exec'ing a
> >>> program that transitions to a different set of credentials. Something
> >>> that happens but is quite rare itself.
AppArmor's exec rules rely heavily on transitioning to different creds
on exec. For example, an AppArmor policy like
profile example_1 /usr/bin/example_1 {
/usr/bin/example_2 Px -> example_2_profile,
/usr/bin/example_3 Px,
}
will allow binary example_1 to execute binaries example_2 and
example_3, launching those processes under a different confinement
(example_2_profile and a profile that attaches to /usr/bin/example_3,
respectively). We will need to look into how much this patch (or a
corresponding change in behavior) would affect our use case, but
confinement transitions (where the confinement information is stored
as an LSM blob on the cred struct) are extremely common in a system
that uses AppArmor as an LSM.
> >>>
> >>> In practice I don't expect there is anything that depends on the exact
> >>> behavior of what happens when exec'ing a suid executable to gain
> >>> privileges when ptraced. The closes I can imagine is upstart and
> >>> I think upstart ran as root when ptracing other programs so there is no
> >>> gaining of privilege and thus no reason for a security module to
> >>> complain.
> >>>
> >>> Who knows I could be wrong, and someone could actually care. Which is
> >>> hy I think we should document it.>>
> >>
> >>
> >> Well, I dont know for sure, but the security engine could deny the execution
> >> for any reason, not only because of being ptraced.
> >> Maybe there can be a policy which denies user X to execute e.g. any suid programs.
> >>
> >>
> >> Bernd.
> >>
> >
> > Hmm, funny..
> >
> > I installed this patch on top of
> >
> > commit fd95357fd8c6778ac7dea6c57a19b8b182b6e91f (HEAD -> master, origin/master, origin/HEAD)
> > Merge: c966813ea120 7b6216baae75
> > Author: Linus Torvalds <torvalds@...ux-foundation.org>
> > Date: Thu Nov 20 11:04:37 2025 -0800
> >
> > but it does panic when I try to boot:
> >
> > [ 0.870539] TERM=1inux
> > [ 0.870573] Starting init: /bin/sh exists but couldn't execute it (error -14) 0.8705751 Kernel panic- not syncing: No working init found. Try passing i mit= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance
> > [ 0.870577] CPU: UID: 0 PID: 1 Comm: sh Not tainted 6.18.0-rc6+ #1 PREEMPT(voluntary)
> > [ 0.870579] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBo x 12/01/2006
> > [ 0.870580] Call Trace:
> > [ 0.870590] <TASK>
> > [ 0.870592] vpanic+0x36d/0x380
> > [ 0.870607] ? __pfx_kernel_init+0x10/0x10
> > [ 0.870615] panic+0x5b/0x60
> > [ 0.870617] kernel_init+0x17d/0x1c0
> > [ 0.870623] ret_from_fork+0x124/0x150
> > [ 0.870625} ? __pfx_kernel_init+0x10/0x10
> > [ 0.870627] ret_from_fork_asm+0x1a/0x30
> > [ 0.870632] </TASK>
> > [ 0.8706631 Kernel Offset: 0x3a800000 from Oxffffffff81000000 (relocation ran ge: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 0.880034] ---[ end Kernel panic - not syncing: No working init found. Try passing init option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. 1---`
> >
> >
> > Is that a known problem?
>
> Nope. It looks like the code needs a little bit bug fixing testing.
>
> I will take see about taking a look.
>
> Eric
>
I've also CC'ed the AppArmor mailing list on this patch to facilitate
discussion if, upon further investigation, this patch would require
changes or cause other problems on the AppArmor side.
Powered by blists - more mailing lists