Message-ID: <CAE4VaGAQZcQzN8D+iwcBnP5vY=Ctmbh+oTikvONHir6JjTgpsw@mail.gmail.com>
Date: Wed, 20 Apr 2022 10:02:20 +0200
From: Jirka Hladky <jhladky@...hat.com>
To: Minchan Kim <minchan@...nel.org>
Cc: tj@...nel.org, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
regressions@...ts.linux.dev,
Thorsten Leemhuis <regressions@...mhuis.info>,
Justin Forbes <jforbes@...oraproject.org>
Subject: Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on
dual socket Intel Xeon Gold servers
Hi Minchan,
have you heard back from the kernfs maintainers?
Thank you!
Jirka
On Mon, Apr 4, 2022 at 7:41 PM Minchan Kim <minchan@...nel.org> wrote:
>
> On Fri, Apr 01, 2022 at 02:04:03PM +0200, Jirka Hladky wrote:
> > > Could you decode exact source code line from the oops?
> >
> > Yes - please see below [1].
>
> Thanks.
>
> >
> > > I think it's fine to attach in the reply because kernel test bot
> >
> > OK. The reproducer is attached. Please unpack it and follow the
> > instructions in the README file. [2]
>
> Unfortunately, I could not run the script on my machine.
>
> >
> > Thanks a lot for looking into it!
> > Jirka
> >
> > [1]
> > =============================================
> > Source code line numbers for the Oops message
> > =============================================
> >
> > 1) RIP: 0010:kernfs_remove+0x8/0x50:
> > (gdb) l *kernfs_remove+0x8
> > 0xffffffff81418588 is in kernfs_remove (fs/kernfs/kernfs-internal.h:48).
> > 43 * Return the kernfs_root @kn belongs to.
> > 44 */
> > 45 static inline struct kernfs_root *kernfs_root(struct kernfs_node *kn)
> > 46 {
> > 47 /* if parent exists, it's always a dir; otherwise, @sd is a dir */
> > 48 if (kn->parent)
> > 49 kn = kn->parent;
> > 50 return kn->dir.root;
> > 51 }
> >
> > And here are the source code lines for the first five functions in the call trace:
> > [ 8563.366280] Call Trace:
> > [ 8563.366280] <TASK>
> > [ 8563.366280] rdt_kill_sb+0x29d/0x350
> > [ 8563.366280] deactivate_locked_super+0x36/0xa0
> > [ 8563.366280] cleanup_mnt+0x131/0x190
> > [ 8563.366280] task_work_run+0x5c/0x90
> > [ 8563.366280] exit_to_user_mode_prepare+0x229/0x230
> > [ 8563.366280] syscall_exit_to_user_mode+0x18/0x40
> > [ 8563.366280] do_syscall_64+0x48/0x90
> > [ 8563.366280] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > 2) (gdb) l *rdt_kill_sb+0x29d
> > 0xffffffff810506bd is in rdt_kill_sb (arch/x86/kernel/cpu/resctrl/rdtgroup.c:2442).
> > 2437 /* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
> > 2438 update_closid_rmid(cpu_online_mask, &rdtgroup_default);
> > 2439
> > 2440 kernfs_remove(kn_info);
> > 2441 kernfs_remove(kn_mongrp);
> > 2442 kernfs_remove(kn_mondata);
> > 2443 }
> >
> > 3) (gdb) l *deactivate_locked_super+0x36
> > 0xffffffff813650f6 is in deactivate_locked_super (fs/super.c:342).
> > 337 /*
> > 338 * Since list_lru_destroy() may sleep, we cannot call it from
> > 339 * put_super(), where we hold the sb_lock. Therefore we destroy
> > 340 * the lru lists right now.
> > 341 */
> > 342 list_lru_destroy(&s->s_dentry_lru);
> > 343 list_lru_destroy(&s->s_inode_lru);
> > 344
> > 345 put_filesystem(fs);
> > 346 put_super(s);
> >
> > 4) (gdb) l *cleanup_mnt+0x131
> > 0xffffffff813890a1 is in cleanup_mnt (fs/namespace.c:137).
> > 132 return 0;
> > 133 }
> > 134
> > 135 static void mnt_free_id(struct mount *mnt)
> > 136 {
> > 137 ida_free(&mnt_id_ida, mnt->mnt_id);
> > 138 }
> >
> > 5) (gdb) l *task_work_run+0x5c
> > 0xffffffff8110620c is in task_work_run (./include/linux/sched.h:2017).
> > 2012
> > 2013 DECLARE_STATIC_CALL(cond_resched, __cond_resched);
> > 2014
> > 2015 static __always_inline int _cond_resched(void)
> > 2016 {
> > 2017 return static_call_mod(cond_resched)();
> > 2018 }
> >
> > 6) (gdb) l *exit_to_user_mode_prepare+0x229
> > 0xffffffff81176d19 is in exit_to_user_mode_prepare (./include/linux/tracehook.h:189).
> > 184 * This barrier pairs with task_work_add()->set_notify_resume() after
> > 185 * hlist_add_head(task->task_works);
> > 186 */
> > 187 smp_mb__after_atomic();
> > 188 if (unlikely(current->task_works))
> > 189 task_work_run();
> > 190
> > 191 #ifdef CONFIG_KEYS_REQUEST_CACHE
> > 192 if (unlikely(current->cached_requested_key)) {
> > 193 key_put(current->cached_requested_key);
> >
> > [2]
> > =============================================
> > Reproducer - README
> > =============================================
> >
> > 1) HW
> > This issue seems to be platform specific. I was not able to reproduce
> > it on AMD Zen or on the Intel Ice Lake platform.
> > I see the issue on dual socket Intel Skylake systems. Reproduced on a
> > Supermicro Super Server/X11DDW-L with 2x Xeon Gold 6126 CPU.
>
> Based on your report, the kernel crashed because kn_mondata was NULL:
>
> rdt_kill_sb
> rmdir_all_sub
> ..
> kernfs_remove(kn_mondata);
> struct kernfs_root *root = kernfs_root(kn); <-- crashed
>
>
> Before my patch [1], it worked like this:
>
> rdt_kill_sb
> rmdir_all_sub
> ..
> kernfs_remove(kn_mondata);
> down_write(&kernfs_rwsem);
> if (!kn)
> return;
> up_write(&kernfs_rwsem);
>
> IOW, kernfs_remove() used to accept a NULL argument by simply bailing out,
> but with my patch [1] it no longer does.
>
> That raises a question for the kernfs maintainers:
>
> Should the kernfs_remove() API support a NULL parameter? If so, can we
> support it atomically without the old global kernfs_rwsem?
>
> [1] 393c3714081a, kernfs: switch global kernfs_rwsem lock to per-fs lock
>
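For reference, the failure mode described in the quoted analysis can be modeled in plain userspace C. This is only a sketch under stated assumptions: struct model_node, struct model_root, model_node_root() and model_remove() are hypothetical stand-ins, not kernel code, and the guard shown is one possible shape of a NULL-tolerant remove, not necessarily what the kernfs maintainers would accept. Locking is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_root {
	int removed_count;
};

struct model_node {
	struct model_node *parent;
	struct model_root *root;	/* only meaningful on directory nodes */
};

/*
 * Same shape as kernfs_root() in the listing above: kn is dereferenced
 * unconditionally, so a NULL kn faults on the kn->parent load.
 */
static struct model_root *model_node_root(struct model_node *kn)
{
	if (kn->parent)
		kn = kn->parent;
	return kn->root;
}

/*
 * Guarded removal (assumption, for illustration only): bail out before
 * anything derived from kn is touched. Returns true if a node was removed.
 */
static bool model_remove(struct model_node *kn)
{
	struct model_root *root;

	if (!kn)
		return false;

	root = model_node_root(kn);
	root->removed_count++;
	return true;
}
```

With this ordering, model_remove(NULL) returns false without ever reaching model_node_root(), which is the behaviour callers such as rdt_kill_sb() appear to have relied on before 393c3714081a.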
--
-Jirka