linux-kernel - Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers

Message-ID: <CAE4VaGAQZcQzN8D+iwcBnP5vY=Ctmbh+oTikvONHir6JjTgpsw@mail.gmail.com>
Date:   Wed, 20 Apr 2022 10:02:20 +0200
From:   Jirka Hladky <jhladky@...hat.com>
To:     Minchan Kim <minchan@...nel.org>
Cc:     tj@...nel.org, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        regressions@...ts.linux.dev,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        Justin Forbes <jforbes@...oraproject.org>
Subject: Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on
 dual socket Intel Xeon Gold servers

Hi Minchan,

have you heard back from the kernfs maintainers?

Thank you!
Jirka


On Mon, Apr 4, 2022 at 7:41 PM Minchan Kim <minchan@...nel.org> wrote:
>
> On Fri, Apr 01, 2022 at 02:04:03PM +0200, Jirka Hladky wrote:
> > > Could you decode the exact source code line from the oops?
> >
> > Yes - please see below [1].
>
> Thanks.
>
> >
> > > I think it's fine to attach in the reply because kernel test bot
> >
> > OK. The reproducer is attached. Please unpack it and follow the
> > instructions in the README file. [2]
>
> Unfortunately, I failed to run the script on my machine.
>
> >
> > Thanks a lot for looking into it!
> > Jirka
> >
> > [1]
> > =============================================
> > Source code line numbers for the Oops message
> > =============================================
> >
> > 1) RIP: 0010:kernfs_remove+0x8/0x50:
> > (gdb) l *kernfs_remove+0x8
> > 0xffffffff81418588 is in kernfs_remove (fs/kernfs/kernfs-internal.h:48).
> > 43       * Return the kernfs_root @kn belongs to.
> > 44       */
> > 45      static inline struct kernfs_root *kernfs_root(struct kernfs_node *kn)
> > 46      {
> > 47              /* if parent exists, it's always a dir; otherwise, @sd is a dir */
> > 48              if (kn->parent)
> > 49                      kn = kn->parent;
> > 50              return kn->dir.root;
> > 51      }
> >
> > And here are the source code lines for the first 5 functions in the call trace:
> > [ 8563.366280] Call Trace:
> > [ 8563.366280]  <TASK>
> > [ 8563.366280]  rdt_kill_sb+0x29d/0x350
> > [ 8563.366280]  deactivate_locked_super+0x36/0xa0
> > [ 8563.366280]  cleanup_mnt+0x131/0x190
> > [ 8563.366280]  task_work_run+0x5c/0x90
> > [ 8563.366280]  exit_to_user_mode_prepare+0x229/0x230
> > [ 8563.366280]  syscall_exit_to_user_mode+0x18/0x40
> > [ 8563.366280]  do_syscall_64+0x48/0x90
> > [ 8563.366280]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > 2)(gdb) l *rdt_kill_sb+0x29d
> > 0xffffffff810506bd is in rdt_kill_sb (arch/x86/kernel/cpu/resctrl/rdtgroup.c:2442).
> > 2437            /* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
> > 2438            update_closid_rmid(cpu_online_mask, &rdtgroup_default);
> > 2439
> > 2440            kernfs_remove(kn_info);
> > 2441            kernfs_remove(kn_mongrp);
> > 2442            kernfs_remove(kn_mondata);
> > 2443    }
> >
> > 3)(gdb) l *deactivate_locked_super+0x36
> > 0xffffffff813650f6 is in deactivate_locked_super (fs/super.c:342).
> > 337                     /*
> > 338                      * Since list_lru_destroy() may sleep, we cannot call it from
> > 339                      * put_super(), where we hold the sb_lock. Therefore we destroy
> > 340                      * the lru lists right now.
> > 341                      */
> > 342                     list_lru_destroy(&s->s_dentry_lru);
> > 343                     list_lru_destroy(&s->s_inode_lru);
> > 344
> > 345                     put_filesystem(fs);
> > 346                     put_super(s);
> >
> > 4) (gdb) l *cleanup_mnt+0x131
> > 0xffffffff813890a1 is in cleanup_mnt (fs/namespace.c:137).
> > 132             return 0;
> > 133     }
> > 134
> > 135     static void mnt_free_id(struct mount *mnt)
> > 136     {
> > 137             ida_free(&mnt_id_ida, mnt->mnt_id);
> > 138     }
> >
> > 5) (gdb) l *task_work_run+0x5c
> > 0xffffffff8110620c is in task_work_run (./include/linux/sched.h:2017).
> > 2012
> > 2013    DECLARE_STATIC_CALL(cond_resched, __cond_resched);
> > 2014
> > 2015    static __always_inline int _cond_resched(void)
> > 2016    {
> > 2017            return static_call_mod(cond_resched)();
> > 2018    }
> >
> > 6) (gdb) l *exit_to_user_mode_prepare+0x229
> > 0xffffffff81176d19 is in exit_to_user_mode_prepare (./include/linux/tracehook.h:189).
> > 184              * This barrier pairs with task_work_add()->set_notify_resume() after
> > 185              * hlist_add_head(task->task_works);
> > 186              */
> > 187             smp_mb__after_atomic();
> > 188             if (unlikely(current->task_works))
> > 189                     task_work_run();
> > 190
> > 191     #ifdef CONFIG_KEYS_REQUEST_CACHE
> > 192             if (unlikely(current->cached_requested_key)) {
> > 193                     key_put(current->cached_requested_key);
> >
> > [2]
> > =============================================
> > Reproducer - README
> > =============================================
> >
> > 1) HW
> > This issue seems to be platform specific. I was not able to reproduce
> > it on AMD Zen, nor on the Intel Ice Lake platform.
> > I see the issue on dual socket Intel Skylake systems. Reproduced on a
> > Supermicro Super Server/X11DDW-L with 2x Xeon Gold 6126 CPU.
>
> Based on your report, the kernel crashed because kn_mondata was NULL:
>
>   rdt_kill_sb
>     rmdir_all_sub
>       ..
>       kernfs_remove(kn_mondata);
>         struct kernfs_root *root = kernfs_root(kn); <-- crashed
>
>
> Before my patch [1], it worked like this:
>
>   rdt_kill_sb
>     rmdir_all_sub
>       ..
>       kernfs_remove(kn_mondata);
>         down_write(&kernfs_rwsem);
>           if (!kn)
>             return;
>         up_write(&kernfs_rwsem);
>
> IOW, kernfs_remove used to tolerate a NULL argument by simply bailing out,
> but with my patch [1] it no longer does.
>
> That raises two questions for the kernfs maintainers:
>
> Should the kernfs_remove API support a NULL parameter? If so, can we
> support it atomically without the old global kernfs_rwsem?
>
> [1] 393c3714081a, kernfs: switch global kernfs_rwsem lock to per-fs lock
>
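
To make the analysis above concrete, here is a minimal user-space sketch of
the pattern in question: check for NULL before the root lookup, so that the
per-fs lock is never derived from a NULL node. This is only an illustration,
not the actual kernfs code and not a proposed patch. All names (sketch_root,
sketch_node, sketch_root_of, sketch_remove) are hypothetical stand-ins, and
the "lock" is just a counter.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_root {
	int lock_depth;	/* stands in for the per-fs lock */
	int removed;	/* counts nodes removed while "locked" */
};

struct sketch_node {
	struct sketch_node *parent;
	struct sketch_root *root;	/* only meaningful on the top (dir) node */
};

/* Same shape as kernfs_root(): dereferences kn unconditionally. */
static struct sketch_root *sketch_root_of(struct sketch_node *kn)
{
	if (kn->parent)
		kn = kn->parent;
	return kn->root;
}

/* Returns 1 if a node was removed, 0 if kn was NULL. */
static int sketch_remove(struct sketch_node *kn)
{
	struct sketch_root *root;

	if (!kn)		/* bail out before any dereference */
		return 0;

	root = sketch_root_of(kn);
	root->lock_depth++;	/* "take" the per-fs lock */
	root->removed++;	/* the actual removal would happen here */
	root->lock_depth--;	/* "release" the per-fs lock */
	return 1;
}
```

With this ordering, sketch_remove(NULL) returns without touching any lock,
which matches the old behaviour callers such as rdt_kill_sb appear to rely
on. Whether the real API should guarantee this is still the open question
for the maintainers.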


-- 
-Jirka