lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <de51e96e-6878-082e-32a9-9e51ce671e3f@huawei.com>
Date:   Mon, 28 Oct 2019 09:47:07 +0800
From:   Zhihao Cheng <chengzhihao1@...wei.com>
To:     Valentin Schneider <valentin.schneider@....com>,
        LKML <linux-kernel@...r.kernel.org>, <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        <patrick.bellasi@....com>, <tglx@...utronix.de>
CC:     Kefeng Wang <wangkefeng.wang@...wei.com>,
        "zhangyi (F)" <yi.zhang@...wei.com>
Subject: Re: [QUESTION] Hung task warning while running syzkaller test

PS: Using strace to track system calls (such as nice,setpriority,sched_setscheduler,sched_setparam,sched_setaffinity, etc.) that may change the task priority during execution, we did not find that the task priority of syzkaller had been changed.


在 2019/10/26 9:48, Zhihao Cheng 写道:
> In fact, I have some preliminary analysis at present. Someone task hold locks for a long time, and other tasks cannot obtain locks, resulting in hung tasks. The lock holding task for a long time finally releases the lock. The lock holding time is long only because the scheduling is not available.
>
> 1. I added time stamp record to mutex lock holding release lock to get some task printing with lock holding time more than 1s:
> ---------- Mutex held timeout ----------
>     [  110.100545] Lock ffff880422b7ad88 held by syz-executor.14[6823] for more than 1 seconds, it has 65 waiters.  // which means task 6823 holds mutex lock for more than 1s
>     [  110.100548] Task 6823 nr_voluntary_switches = 0, nr_involuntary_switches = 2, sum_exec_runtime = 1189949, vruntime = 20793616, state = 0, cpu = 3, runq = 85, cfs_runq = 24, cfs_h_runq = 85, rt_runq = 0, dl_runq = 0.
>     [  110.100550]  ffff8804157f7dd8 0000000000000246 ffffffff823a68c0 ffff88043fcd7d80
>     [  110.100554]  ffff88041867bd80 ffff8804150a8000 ffff8804157f8000 ffff88041e84a620
>     [  110.100557]  ffff880422a0aa20 0000000000000002 0000000000000000 ffff8804157f7df0
>     [  110.100561] Call Trace:
>     [  110.100564]  [<ffffffff81b0a05f>] preempt_schedule_common+0x1f/0x40
>     [  110.100566]  [<ffffffff81b0a09c>] _cond_resched+0x1c/0x30
>     [  110.100569]  [<ffffffff81302811>] dput+0x41/0x4b0
>     [  110.100572]  [<ffffffff812effed>] lookup_real+0x3d/0x60
>     [  110.100575]  [<ffffffff812f0c4d>] __lookup_hash+0x4d/0x90
>     [  110.100579]  [<ffffffff812f7a92>] filename_create+0x92/0x170
>     [  110.100582]  [<ffffffff812f89d1>] SyS_mkdir+0x51/0xe0
>     [  110.100585]  [<ffffffff81b11521>] entry_SYSCALL_64_fastpath+0x1e/0x9a
>     [  110.100587] ---------- ------------------ ----------
>     [  110.100589] [INFO] Waste 0 ms to handle mutex_lock_timeout event, current lock holders = 4
>                 ---------- Mutex held timeout ----------
>     [  111.102488] Lock ffff880422b7ad88 held by syz-executor.14[6823] for more than 2 seconds, it has 70 waiters.
>     [  111.102491] Task 6823 nr_voluntary_switches = 0, nr_involuntary_switches = 2, sum_exec_runtime = 1189949, vruntime= 20793616, state = 0, cpu = 3, runq = 96, cfs_runq = 26, cfs_h_runq = 96, rt_runq = 0, dl_runq = 0.
>     [  111.102495]  ffff8804157f7dd8 0000000000000246 ffffffff823a68c0 ffff88043fcd7d80
>     [  111.102500]  ffff88041867bd80 ffff8804150a8000 ffff8804157f8000 ffff88041e84a620
>     [  111.102503]  ffff880422a0aa20 0000000000000002 0000000000000000 ffff8804157f7df0
>     [  111.102507] Call Trace:
>     [  111.102518]  [<ffffffff81b0a05f>] preempt_schedule_common+0x1f/0x40
>     [  111.102521]  [<ffffffff81b0a09c>] _cond_resched+0x1c/0x30
>     [  111.102528]  [<ffffffff81302811>] dput+0x41/0x4b0
>     [  111.102532]  [<ffffffff812effed>] lookup_real+0x3d/0x60
>     [  111.102535]  [<ffffffff812f0c4d>] __lookup_hash+0x4d/0x90
>     [  111.102539]  [<ffffffff812f7a92>] filename_create+0x92/0x170
>     [  111.102542]  [<ffffffff812f89d1>] SyS_mkdir+0x51/0xe0
>     [  111.102546]  [<ffffffff81b11521>] entry_SYSCALL_64_fastpath+0x1e/0x9a
>     [  111.102548] ---------- ------------------ ----------
>     [  111.102550] [INFO] Waste 0 ms to handle mutex_lock_timeout event, current lock holders = 3
>                 ---------- Mutex held timeout ----------
>     [  112.104602] Lock ffff880422b7ad88 held by syz-executor.14[6823] for more than 3 seconds, it has 71 waiters.
>     [  112.104606] Task 6823 nr_voluntary_switches = 0, nr_involuntary_switches = 2, sum_exec_runtime = 1189949, vruntime = 20793616, state = 0, cpu = 3, runq = 106, cfs_runq = 28, cfs_h_runq = 106, rt_runq = 0, dl_runq = 0.
>
> !!! Notice here !!!
>     /* We found that task 6823 was not scheduled during the lock holding period and has always belonged to CPU 3.
>
>      * nr_voluntary_switches: mutex_owner_task->nvcsw
>
>      * nr_involuntary_switches: mutex_owner_task->nivcsw
>
>      * sum_exec_runtime: mutex_owner_task->se.sum_exec_runtime
>
>      * vruntime: mutex_owner_task->se.vruntime
>
>      * state = 0: mutex_owner_task->state, TASK_RUNNING
>
>      * cpu: task_cpu(mutex_owner_task)
>
>      * runq: task_rq(mutex_owner_task)->nr_running
>
>      * cfs_runq: task_rq(mutex_owner_task)->cfs.nr_running
>
>      * cfs_h_runq:  task_rq(mutex_owner_task)->cfs.h_nr_running
>
>      * rt_runq: task_rq(mutex_owner_task)->rt.rt_nr_running
>
>      * dl_runq: task_rq(mutex_owner_task)->dl.dl_nr_running
>
>     */
>    
>     [  112.104608]  ffff8804157f7dd8 0000000000000246 ffffffff823a68c0 ffff88043fcd7d80
>     [  112.104611]  ffff88041867bd80 ffff8804150a8000 ffff8804157f8000 ffff88041e84a620
>     [  112.104615]  ffff880422a0aa20 0000000000000002 0000000000000000 ffff8804157f7df0
>     [  112.104618] Call Trace:
>     [  112.104621]  [<ffffffff81b0a05f>] preempt_schedule_common+0x1f/0x40
>     [  112.104624]  [<ffffffff81b0a09c>] _cond_resched+0x1c/0x30
>     [  112.104627]  [<ffffffff81302811>] dput+0x41/0x4b0
>     [  112.104630]  [<ffffffff812effed>] lookup_real+0x3d/0x60
>     [  112.104633]  [<ffffffff812f0c4d>] __lookup_hash+0x4d/0x90
>     [  112.104636]  [<ffffffff812f7a92>] filename_create+0x92/0x170
>     [  112.104639]  [<ffffffff812f89d1>] SyS_mkdir+0x51/0xe0
>     [  112.104642]  [<ffffffff81b11521>] entry_SYSCALL_64_fastpath+0x1e/0x9a
>     [  112.104644] ---------- ------------------ ----------
>     [  112.104647] [INFO] Waste 0 ms to handle mutex_lock_timeout event, current lock holders = 7
>
>
> 2. By tracking the scheduling trajectory of cpu3 with trace, we find that CPU 003 is just scheduling tasks continuously, and the priority of these tasks is the same. Many tasks have been scheduled for many times (30-40 times), but lock holding task is not scheduled. See detail in trace_003.log.
>
> 3. You can convert the repro file into a C program by 'syzprog' tool(see syzprog.c). Using compiled syzprog.c directly for testing did not show hung task, which confused me.
>
> ---
>
> New attachments:
> A. 5.3-rc6-hung_task.txt: hung task verbose in 5.3-rc6 (witch lockdep)
> B. trace_003.log: scheduling trajectory of cpu3
> C. syzprog.c: convert the repro file into a C program
>
> 在 2019/10/25 23:29, Valentin Schneider 写道:
>> Picked the wrong outgoing smtp setting, sorry about that lovely disclaimer...
>>
>> On 25/10/2019 13:50, Zhihao Cheng wrote:
>>> I added config in attachment.
>>>
>>> It will take 5-10 minutes to reproduce on the kernel of the lower version(for example, v4.4). And ftrace may need to be enabled for the latest mainline to reproduce hung_task, it will take several hours.
>>>
>> That sounds fun. I'll try to get that running overnight someday I'm not
>> running other stuff, though TBH seeing as the freezer is involved I wonder if
>> it isn't just syzkaller keeping stuff frozen for too long.
>>
>> .

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ