linux-kernel - [question] panic() during reboot -f (reboot syscall)

Message-ID: <20190306132938.hzyb7gee5actx3l3@pathway.suse.cz>
Date:   Wed, 6 Mar 2019 14:29:38 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     linux-kernel@...r.kernel.org
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        linux-ext4@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Andy Shevchenko <andy.shevchenko@...il.com>,
        Peter Zijlstra <peterz@...radead.org>, Jan Kara <jack@...e.cz>
Subject: [question] panic() during reboot -f (reboot syscall)

Hello,

I wonder if it is "normal" to get a panic() when the system is rebooted
using "reboot -f". It looks a bit weird to me.

In our case, the panic() was triggered from ext4 filesystem code on a
filesystem mounted with "errors=panic":

  crash> bt
  PID: 3984   TASK: ffff887db1f6c180  CPU: 32  COMMAND: "bash"
  #0 [ffff887e637bf9a8] machine_kexec at ffffffff81059c5c
  #1 [ffff887e637bf9f8] __crash_kexec at ffffffff81119e0a
  #2 [ffff887e637bfab8] panic at ffffffff81193c31
  #3 [ffff887e637bfb30] ext4_handle_error at ffffffffa0229faa [ext4]
  #4 [ffff887e637bfb40] __ext4_error_inode at ffffffffa022a12a [ext4]
  #5 [ffff887e637bfbe0] __ext4_get_inode_loc at ffffffffa02096a5 [ext4]
  #6 [ffff887e637bfc40] ext4_iget at ffffffffa020c028 [ext4]
  #7 [ffff887e637bfcc0] ext4_lookup at ffffffffa0216ca0 [ext4]
  #8 [ffff887e637bfce8] lookup_real at ffffffff81218e3f
  #9 [ffff887e637bfd00] __lookup_hash at ffffffff8121916f
  #10 [ffff887e637bfd20] walk_component at ffffffff8121b50f
  #11 [ffff887e637bfd70] path_lookupat at ffffffff8121ca30
  #12 [ffff887e637bfd98] filename_lookup at ffffffff8121e58c
  #13 [ffff887e637bfe98] vfs_fstatat at ffffffff81214549
  #14 [ffff887e637bfed8] SYSC_newstat at ffffffff812149ca
  #15 [ffff887e637bff50] entry_SYSCALL_64_fastpath at ffffffff8161de61
      RIP: 00007f9db8d3ebe5  RSP: 00007ffda081cf68  RFLAGS: 00000246
      RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f9db8d3ebe5
      RDX: 00000000013c7fa0  RSI: 00000000013c7fa0  RDI: 00000000013c7f40
      RBP: 00007f9db943bee0   R8: 00000000013c7f40   R9: 00000000000b0000
      R10: 000000007af2c337  R11: 0000000000000246  R12: 00000000013c7fa0
      R13: 00000000013c7fa0  R14: 0000000000000008  R15: 00000000013c7f80
      ORIG_RAX: 0000000000000004  CS: 0033  SS: 002b


Now, "reboot -f" just calls the reboot() syscall. I do not see
anything that would stop processes. It does not even stop the
other CPUs, on purpose; see commit cf7df378aa4ff7da
("reboot: rigrate shutdown/reboot to boot cpu").

But it shuts down devices very early, via:

  + kernel_restart()
    + kernel_restart_prepare()
      + blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
      + device_shutdown()
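To make the ordering concrete, here is a toy userspace model in C. It is not kernel code: the toy_* names are hypothetical stand-ins for the reboot notifier chain and device_shutdown(), and a disk is reduced to a single flag. Its only point is that nothing in this path stops user tasks, so a task that keeps running afterwards sees its I/O fail.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model of the ordering in kernel_restart_prepare().
 * NOT kernel code: every toy_* name is hypothetical. */
struct toy_system {
    bool notifiers_called;
    bool disk_online;
    bool tasks_stopped;   /* nothing in the restart path sets this */
};

/* Stand-in for blocking_notifier_call_chain(&reboot_notifier_list, ...). */
static void toy_call_reboot_notifiers(struct toy_system *sys)
{
    sys->notifiers_called = true;
}

/* Stand-in for device_shutdown(): the underlying disk goes away. */
static void toy_device_shutdown(struct toy_system *sys)
{
    sys->disk_online = false;
}

static void toy_kernel_restart_prepare(struct toy_system *sys)
{
    toy_call_reboot_notifiers(sys);
    toy_device_shutdown(sys);
    /* No step here freezes or kills user tasks. */
}

/* A task that is still running (like "bash" in the backtrace) reads an
 * inode from disk; once the disk is shut down, the read fails. */
static int toy_read_inode(const struct toy_system *sys)
{
    return sys->disk_online ? 0 : -EIO;
}
```

In the model, a read issued after toy_kernel_restart_prepare() returns -EIO, which roughly corresponds to the __ext4_get_inode_loc() failure in the backtrace.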

As a result, processes are still running after the devices are shut down.
The filesystem code returns errors because the underlying disk device
has been removed, and that causes a panic() because of the
"errors=panic" mount option.
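The effect of the mount option can be sketched the same way. This is a hypothetical toy function, loosely following the ext4 errors=continue|remount-ro|panic choices; it is not ext4_handle_error(), which also aborts the journal and handles other details.

```c
/* Hypothetical toy model (not ext4 code) of how an errors= mount option
 * turns a failed metadata read into an action. */
enum toy_errors_opt { TOY_ERRORS_CONTINUE, TOY_ERRORS_REMOUNT_RO, TOY_ERRORS_PANIC };
enum toy_action { TOY_ACT_NONE, TOY_ACT_REMOUNT_RO, TOY_ACT_PANIC };

static enum toy_action toy_handle_io_result(int err, enum toy_errors_opt opt)
{
    if (err == 0)
        return TOY_ACT_NONE;        /* no error, nothing to handle */

    switch (opt) {
    case TOY_ERRORS_PANIC:
        return TOY_ACT_PANIC;       /* the case in the backtrace above */
    case TOY_ERRORS_REMOUNT_RO:
        return TOY_ACT_REMOUNT_RO;
    case TOY_ERRORS_CONTINUE:
    default:
        return TOY_ACT_NONE;        /* ext4 would still log it; the toy does nothing */
    }
}
```

So with "errors=panic", any I/O error that a still-running task hits after device_shutdown() becomes a panic, regardless of why the device went away.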


My understanding is that userspace is responsible for a "clean" reboot.
The "reboot" command normally stops services, kills processes,
syncs disks, and unmounts filesystems before it calls the reboot
syscall.
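For comparison, a minimal sketch of the order a clean reboot follows in userspace, assuming a single data filesystem at a caller-supplied mount point. Real tools do much more (stopping services, unmounting everything, remounting the root filesystem read-only); the step names and the 5 second grace period are assumptions for illustration. By the description above, "reboot -f" amounts to only the last step.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <sys/mount.h>
#include <sys/reboot.h>
#include <unistd.h>

/* Minimal sketch of the userspace order of a "clean" reboot.
 * Assumptions: one data filesystem at a caller-supplied mount point,
 * a fixed 5 s grace period; service shutdown is not modeled. */
static const char *const clean_reboot_steps[] = {
    "stop services",        /* left to the init system, not modeled */
    "kill processes",       /* kill(-1, SIGTERM), then SIGKILL */
    "sync disks",           /* sync() */
    "unmount filesystems",  /* umount2() */
    "reboot syscall",       /* reboot(RB_AUTOBOOT) */
};

#define CLEAN_REBOOT_NSTEPS \
    ((int)(sizeof clean_reboot_steps / sizeof clean_reboot_steps[0]))

/* With dry_run set, nothing is executed and the step count is returned.
 * Otherwise this needs root privileges and reboots the machine. */
static int clean_reboot(const char *mnt, bool dry_run)
{
    if (dry_run)
        return CLEAN_REBOOT_NSTEPS;

    kill(-1, SIGTERM);   /* on Linux, does not signal the caller itself */
    sleep(5);            /* hypothetical grace period */
    kill(-1, SIGKILL);
    sync();              /* reboot(2) itself does not sync dirty data */
    umount2(mnt, 0);     /* may fail with EBUSY; a real tool retries or remounts ro */
    return reboot(RB_AUTOBOOT);  /* returns only on failure */
}
```

Only the dry-run path should be exercised outside a throwaway VM.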

In other words, it looks like the panic() is possible by design,
but it still seems a bit weird. Any opinion?

Best Regards,
Petr