linux-kernel - Re: [PATCH] kernel/exit: do panic earlier to get coredump if global init task exit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20191214114810.ftfxtx4qaa5nzji4@wittgenstein>
Date:   Sat, 14 Dec 2019 12:48:11 +0100
From:   Christian Brauner <christian.brauner@...ntu.com>
To:     chenqiwu <qiwuchen55@...il.com>
Cc:     peterz@...radead.org, mingo@...nel.org, kernel-team@...roid.com,
        linux-kernel@...r.kernel.org, chenqiwu@...omi.com, oleg@...hat.com
Subject: Re: [PATCH] kernel/exit: do panic earlier to get coredump if global
 init task exit

On Sat, Dec 14, 2019 at 02:27:04PM +0800, chenqiwu wrote:
> On Thu, Dec 12, 2019 at 12:05:14PM +0100, Christian Brauner wrote:
> > On Thu, Dec 12, 2019 at 11:08:38AM +0100, Oleg Nesterov wrote:
> > > can't you use is_global_init() && group_dead ?
> > 
> > Seems reasonable.
> > Looks like we can move
> > group_dead = atomic_dec_and_test(&tsk->signal->live);
> > further up...
> > 
> > (Ideally I'd like to have a test for this to ensure that this lets you
> > capture a global init coredump but that might be tricky. But since you've
> > seem to have run into this case maybe you even have something that could
> > be turned into a test? (Similar to how we already have a purely opt-in
> > test for pstore.))
> > 
> > Christian
> 
> Hi all,
> I agree that using is_global_init() && group_dead is more reasonable.
> 
> The crash isuee happened on a Android phone by reboot stress test.
> panic log:
> [   84.048521] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [   84.048521]
> [   84.048540] CPU: 2 PID: 1 Comm: init Tainted: G S         O    4.14.117-perf-g8035d1a #1
> [   84.048544] Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 RAPHAEL (DT)
> [   84.048550] Call trace:
> [   84.048564]  dump_backtrace+0x0/0x268
> [   84.048569]  show_stack+0x14/0x20
> [   84.048577]  dump_stack+0xc4/0x100
> [   84.048584]  panic+0x1f0/0x410
> [   84.048591]  complete_and_exit+0x0/0x20
> [   84.048596]  do_group_exit+0x8c/0xa0
> [   84.048602]  get_signal+0x1c0/0x790
> [   84.048608]  do_notify_resume+0x184/0xc30
> [   84.048613]  work_pending+0x8/0x10
> 
> From the kdump loaded by crash utility, all threads of global init have exited,
> the group_dead value of global init has truned to 0 by atomic_dec_and_test().
> crash> ps init
>    PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
> >     1      0   2  ffffffcd77526000  ??   0.0       0      0  init
>     534      1   4  ffffffcd6b9a9000  ZO   0.0       0      0  init
>     535      1   4  ffffffcd6b9aa000  ZO   0.0       0      0  init
> crash> ps -g 1
> PID: 1      TASK: ffffffcd77526000  CPU: 2   COMMAND: "init"
>   (no threads)
> crash> struct task_struct.signal ffffffcd77526000
>   signal = 0xffffffcd77530000
> crash> struct signal_struct 0xffffffcd77530000
> struct signal_struct {
>   sigcnt = {
>     counter = 1
>   },
>   live = {
>     counter = 0
>   },
>   nr_threads = 1,
>   thread_head = {
>     next = 0xffffffcd77526730,
>     prev = 0xffffffcd77526730
>   },
>   group_exit_code = 11,
>   notify_count = 0,
>   group_exit_task = 0x0,
>   group_stop_count = 0,
>   flags = 4,
>   ...
>  }
> 
> However, as Christian said, the test for this is tricky since we must
> make sure all of init threads exited. I make a test for is_global_init()
> and send a SIGSEGV signal to global init task in userspace. The phone
> crash imeddiately and reboot to collect kdump. Then I extract the coredump
> of global init task from kdump successfully.
> (gdb) core-file core.1.init
> Core was generated by `/system/bin/init second_stage'.
> #0  _exit () at bionic/libc/arch-arm64/syscalls/_exit.S:9
> 9           cmn     x0, #(MAX_ERRNO + 1)
> (gdb) bt
> #0  _exit () at bionic/libc/arch-arm64/syscalls/_exit.S:9
> #1  0x00000055606db11c in android::init::InstallRebootSignalHandlers()::$_14::operator()(int) const (this=<optimized out>, signal=11)
>     at system/core/init/reboot_utils.cpp:141
> #2  0x00000055606db100 in android::init::InstallRebootSignalHandlers()::$_14::__invoke(int) (signal=11) at system/core/init/reboot_utils.cpp:138
> #3  0x0000007f8de236a0 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> So from the following test, we have confidence that the following patch can help us

Thanks. Can you please resend this as a proper patch for v2?

Christian