linux-kernel - Re: [syzbot] [integrity] [overlayfs] general protection fault in d

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9bee07bd-5e72-0a82-c106-f9718f7942a8@linux.ibm.com>
Date:   Thu, 21 Sep 2023 10:29:05 -0400
From:   Stefan Berger <stefanb@...ux.ibm.com>
To:     Christian Brauner <brauner@...nel.org>,
        Mimi Zohar <zohar@...ux.ibm.com>
Cc:     Jeff Layton <jlayton@...nel.org>,
        Casey Schaufler <casey@...aufler-ca.com>,
        syzbot <syzbot+a67fc5321ffb4b311c98@...kaller.appspotmail.com>,
        amir73il@...il.com, linux-fsdevel@...r.kernel.org,
        linux-integrity@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-unionfs@...r.kernel.org, miklos@...redi.hu,
        syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [integrity] [overlayfs] general protection fault in
 d_path


On 9/21/23 07:48, Christian Brauner wrote:
> On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
>> On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
>>> On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
>>>> On 9/20/2023 5:10 PM, Stefan Berger wrote:
>>>>> On 9/20/23 18:09, Stefan Berger wrote:
>>>>>> On 9/20/23 17:16, Jeff Layton wrote:
>>>>>>> On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
>>>>>>>> On 9/20/23 13:01, Stefan Berger wrote:
>>>>>>>>> On 9/17/23 20:04, syzbot wrote:
>>>>>>>>>> syzbot has bisected this issue to:
>>>>>>>>>>
>>>>>>>>>> commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
>>>>>>>>>> Author: Jeff Layton <jlayton@...nel.org>
>>>>>>>>>> Date:   Mon Apr 17 16:55:51 2023 +0000
>>>>>>>>>>
>>>>>>>>>>        IMA: use vfs_getattr_nosec to get the i_version
>>>>>>>>>>
>>>>>>>>>> bisection log:
>>>>>>>>>> https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
>>>>>>>>>> start commit:   a747acc0b752 Merge tag
>>>>>>>>>> 'linux-kselftest-next-6.6-rc2'
>>>>>>>>>> of g..
>>>>>>>>>> git tree:       upstream
>>>>>>>>>> final oops:
>>>>>>>>>> https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
>>>>>>>>>> console output:
>>>>>>>>>> https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
>>>>>>>>>> kernel config:
>>>>>>>>>> https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
>>>>>>>>>> dashboard link:
>>>>>>>>>> https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
>>>>>>>>>> syz repro:
>>>>>>>>>> https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
>>>>>>>>>> C reproducer:
>>>>>>>>>> https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
>>>>>>>>>>
>>>>>>>>>> Reported-by: syzbot+a67fc5321ffb4b311c98@...kaller.appspotmail.com
>>>>>>>>>> Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
>>>>>>>>>> i_version")
>>>>>>>>>>
>>>>>>>>>> For information about bisection process see:
>>>>>>>>>> https://goo.gl/tpsmEJ#bisection
>>>>>>>>> The final oops shows this here:
>>>>>>>>>
>>>>>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000058
>>>>>>>>> #PF: supervisor read access in kernel mode
>>>>>>>>> #PF: error_code(0x0000) - not-present page
>>>>>>>>> PGD 0 P4D 0
>>>>>>>>> Oops: 0000 [#1] PREEMPT SMP
>>>>>>>>> CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
>>>>>>>>> 6.4.0-rc2-syzkaller #0
>>>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>>>>>>> BIOS Google 08/04/2023
>>>>>>>>> RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
>>>>>>>>> Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
>>>>>>>>> 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
>>>>>>>>> <49> 8b
>>>>>>>>> 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
>>>>>>>>> RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
>>>>>>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
>>>>>>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
>>>>>>>>> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
>>>>>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>>>>>>> R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
>>>>>>>>> FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
>>>>>>>>> knlGS:0000000000000000
>>>>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>> CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
>>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>> Call Trace:
>>>>>>>>>    <TASK>
>>>>>>>>>    lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
>>>>>>>>>    seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
>>>>>>>>>    get_fs_root_rcu fs/d_path.c:243 [inline]
>>>>>>>>>    d_path+0xd1/0x1f0 fs/d_path.c:285
>>>>>>>>>    audit_log_d_path+0x65/0x130 kernel/audit.c:2139
>>>>>>>>>    dump_common_audit_data security/lsm_audit.c:224 [inline]
>>>>>>>>>    common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
>>>>>>>>>    smack_log+0xad/0x130 security/smack/smack_access.c:383
>>>>>>>>>    smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
>>>>>>>>>    smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
>>>>>>>>>    security_inode_getattr+0x32/0x50 security/security.c:2114
>>>>>>>>>    vfs_getattr+0x1b/0x40 fs/stat.c:167
>>>>>>>>>    ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
>>>>>>>>>    ima_check_last_writer security/integrity/ima/ima_main.c:171
>>>>>>>>> [inline]
>>>>>>>>>    ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
>>>>>>>>>    __fput+0xc7/0x220 fs/file_table.c:315
>>>>>>>>>    task_work_run+0x7d/0xa0 kernel/task_work.c:179
>>>>>>>>>    exit_task_work include/linux/task_work.h:38 [inline]
>>>>>>>>>    do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
>>>>>>>>>    do_group_exit+0x85/0xa0 kernel/exit.c:1021
>>>>>>>>>    get_signal+0x73c/0x7f0 kernel/signal.c:2874
>>>>>>>>>    arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
>>>>>>>>>    exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
>>>>>>>>>    exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
>>>>>>>>>    __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
>>>>>>>>>    syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
>>>>>>>>>    do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
>>>>>>>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> do_exit has called exit_fs(tsk) [
>>>>>>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
>>>>>>>>>
>>>>>>>>> exit_fs(tsk) has set tsk->fs = NULL [
>>>>>>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
>>>>>>>>> ]
>>>>>>>>>
>>>>>>>>> I think this then bites in d_path() where it calls:
>>>>>>>>>
>>>>>>>>>       get_fs_root_rcu(current->fs, &root);   [
>>>>>>>>> https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
>>>>>>>>>
>>>>>>>>> current->fs is likely NULL here.
>>>>>>>>>
>>>>>>>>> If this was correct it would have nothing to do with the actual
>>>>>>>>> patch,
>>>>>>>>> though, but rather with the fact that smack logs on process
>>>>>>>>> termination. I am not sure what the solution would be other than
>>>>>>>>> testing for current->fs == NULL in d_path before using it and
>>>>>>>>> returning an error that is not normally returned or trying to
>>>>>>>>> intercept this case in smack.
>>>>>>>> I have now been able to recreate the syzbot issue with the test
>>>>>>>> program
>>>>>>>> and the issue is exactly the one described here, current->fs == NULL.
>>>>>>>>
>>>>>>> Earlier in this thread, Amir had a diagnosis that IMA is
>>>>>>> inappropriately
>>>>>>> trying to use f_path directly instead of using the helpers that are
>>>>>>> friendly for stacking filesystems.
>>>>>>>
>>>>>>> https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@mail.gmail.com/
>>>>>>>
>>>>>>>
>>>>>>> I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
>>>>>>> someone on the IMA team could try this approach?
>>>>>>
>>>>>> I have applied this patch here from Amir now and it does NOT resolve
>>>>>> the issue:
>>>>>>
>>>>>> https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@linux.ibm.com/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
>>>>>>
>>>>>>
>>>>> This seems to resolve the issue:
>>>>>
>>>>> diff --git a/security/smack/smack_access.c
>>>>> b/security/smack/smack_access.c
>>>>> index 585e5e35710b..57afcea1e39b 100644
>>>>> --- a/security/smack/smack_access.c
>>>>> +++ b/security/smack/smack_access.c
>>>>> @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
>>>>> *object_label, int request,
>>>>>          struct smack_audit_data *sad;
>>>>>          struct common_audit_data *a = &ad->a;
>>>>>
>>>>> +       if (current->flags & PF_EXITING)
>>>>> +               return;
>>>>> +
>>>> Based on what I see here I can understand that this prevents the panic,
>>>> but it isn't so clear what changed that introduced the problem.
>>>>
>>>>>          /* check if we have to log the current event */
>>>>>          if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
>>>>>                  return;
>>>>>
>>>>>
>>> Apparently, it's this patch:
>>>
>>>      db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
>> Yes, the syzbot was updated with that info.
>>
>>> At one time, IMA would reach directly into the inode to get the
>>> i_version and ctime. That was fine for certain filesystems, but with
>>> more recent changes it needs to go through ->getattr instead. Evidently,
>>> it's selecting the wrong inode to query when dealing with overlayfs and
>>> that's causing panics at times.
>>>
>>> As to why the above patch helps, I'm not sure, but given that it doesn't
>>> seem to change which inode is being queried via getattr, it seems like
>>> this is probably papering over the real bug. That said, IMA and
>>> overlayfs are not really in my wheelhouse, so I could be very wrong
>>> here.
>> The call to vfs_getattr_nosec() somehow triggers a call to
>> security_inode_getattr().  Without the call neither ovl_getattr() nor
>> smack_inode_getattr() would be called.
> ima_file_free()
> -> ima_check_last_writer()
>     -> vfs_getattr_nosec()
>        -> i_op->getattr() == ovl_getattr()
>           -> vfs_getattr()
>              -> security_inode_getattr()
> 	    -> real_i_op->getattr()
>
> is the callchain that triggers this.
>
> ima_file_free() is called in a very sensitive location: __fput() that
> can be called from task work when the process is already PF_EXITING.
>
> The ideal solution would be for ima to stop calling back into the
> filesystems in this location at all but that's probably not going to
> happen because I now realize you also set extended attributes from
> __fput():
>
>
> ima_check_last_writer()
> -> ima_update_xatt()
>     -> ima_fix_xattr()
>        -> __vfs_setxattr_noperm()
>
> The __vfs_setxattr_noperm() codepath can itself trigger
> security_inode_post_setxattr() and security_inode_setsecurity(). So
> those hooks are hopefully safe to be called with PF_EXITING tasks as
> well...

LSM inode_post_setxattr has two users, Smack and SELinux. Smack does not 
call smack_log/audit in this case. SELinux seems safe as well.

LSM inode_setsecurity has two users, Smack and SELinux. Smack does not 
call smack_log/audit in this case. SELinux seems safe as well.


>
> Imho, this is all very wild but I'm not judging.
>
> Two solutions imho:
> (1) teach stacking filesystems like overlayfs and ecryptfs to use
>      vfs_getattr_nosec() in their ->getattr() implementation when they
>      are themselves called via vfs_getattr_nosec(). This will fix this by
>      not triggering another LSM hook.
> (2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
>      anything

Re (2): LSM's getattr should still make policy decision, right? So 
callers should be allowed to go to some depth into these functions but 
avoid calls deeper into smack_log (for example) that trigger the call path
   common_lsm_audit
    -> dump_common_audit_data
     -> audit_log_d_path via a->type LSM_AUDIT_DATA_PATH or 
LSM_AUDIT_DATA_FILE or  LSM_AUDIT_DATA_IOCTL_OP and then possibly run 
into current->fs == NULL in d_path.

To avoid audit_log_d_path being called the a->type should not be either 
one of the 3 mentioned above:

LSM_AUDIT_DATA_PATH: used by Smack & SELinux
LSM_AUDIT_DATA_FILE: used by SELinux
LSM_AUDIT_DATA_IOCTL: used by SELinux


The LSM getattr has users AppArmor, SELinux, Smack and Tomoyo.

Tomoyo: seems safe
SELinux:

selinux_inode_getattr
   -> path_has_perm  [ sets LSM_AUDIT_DATA_FILE !! ]
     -> inode_has_perm
      -> avc_has_perm
       -> avc_audit
         -> slow_avc_audit
           -> common_lsm_audit
            -> dump_common_audit_data
             -> audit_log_d_path
              -> d_path
               -> get_fs_root_rcu with current->fs = NULL

Smack: (the known path per this syzbot issue)

smack_inode_getattr   [ sets LSM_AUDIT_DATA_PATH !! ]
   -> smk_curacc
     -> smk_tskacc
       -> smack_log
        -> common_lsm_audit
          -> dump_common_audit_data
            -> audit_log_d_path
              -> d_path
               -> get_fs_root_rcu with current->fs = NULL

AppArmor:

apparmor_inode_getattr
   -> common_perm_cond
      -> common_perm
        -> aa_path_perm
           -> profile_path_perm
               -> aa_path_perm
                  -> aa_audit_file      [ sets LSM_AUDIT_DATA_TASK ]
                     -> aa_audit
                      -> aa_audit_msg
                        -> common_lsm_audit
                         -> dump_common_audit_data
                           -> DOES NOT call audit_log_d_path but calls 
task_tgid_nr(tsk)


So, SELinux and Smack would be affected. The common code path starts 
with common_lsm_audit and either a check for current->flags & PF_EXITING 
or a more fine-grained one like this here could be a solution for (2)

diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 849e832719e2..4f3570322851 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -445,6 +445,18 @@ void common_lsm_audit(struct common_audit_data *a,

         if (a == NULL)
                 return;
+
+       if (current->flags & PF_EXITING) {
+               /*
+                * Avoid running into audit_log_d_path -> d_path
+                * -> get_fs_root_rcu with current->fs = NULL
+                */
+               if (a->type == LSM_AUDIT_DATA_PATH ||
+                   a->type == LSM_AUDIT_DATA_FILE ||
+                   a->type == LSM_AUDIT_DATA_IOCTL_OP)
+                       return;
+       }
+
         /* we use GFP_ATOMIC so we won't sleep */
         ab = audit_log_start(audit_context(), GFP_ATOMIC | __GFP_NOWARN,
                              AUDIT_AVC);