lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <56584916.3080606@unitn.it>
Date:	Fri, 27 Nov 2015 13:14:14 +0100
From:	Luca Abeni <luca.abeni@...tn.it>
To:	Juri Lelli <juri.lelli@....com>,
	Wanpeng li <wanpeng.li@...mail.com>
Cc:	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] sched/deadline: fix earliest_dl.next logic

Hi all,

I ran some quick tests on this patch (because I was working on something
related), and it seems to me that it triggers a bug. Here are some information:
- I applied the patch to git master (to be sure that I did not corrupt the patch,
   I extracted it again with git format-patch, and posted it here:
   http://disi.unitn.it/~abeni/H-Test/0001-sched-deadline-fix-earliest_dl.next-logic.patch )
- I tried a simple test program creating 4 periodic threads and scheduling them with
   SCHED_DEADLINE. And I got something like:
[   29.055688] ------------[ cut here ]------------
[   29.056563] kernel BUG at /home/luca/Src/Kernel/linux/kernel/sched/deadline.c:1443!
[   29.056563] invalid opcode: 0000 [#1] SMP
[   29.056563] Modules linked in:
[   29.056563] CPU: 0 PID: 1136 Comm: periodic_thread Not tainted 4.4.0-rc2+ #3
[   29.056563] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
[   29.056563] task: ffff880006603d80 ti: ffff8800067f4000 task.ti: ffff8800067f4000
[   29.056563] RIP: 0010:[<ffffffff8109704a>]  [<ffffffff8109704a>] pick_next_pushable_dl_task+0x5a/0x60
[   29.056563] RSP: 0000:ffff8800067f7e38  EFLAGS: 00010046
[   29.056563] RAX: ffff880006603d80 RBX: ffff880006a7ac50 RCX: ffff8800067f4000
[   29.056563] RDX: ffff8800066040d0 RSI: 0000000000000000 RDI: ffff880007a15ac0
[   29.056563] RBP: ffff8800067f7e38 R08: 0000000000000000 R09: 0000000000000000
[   29.056563] R10: ffff880006603d80 R11: ffff880006604548 R12: ffff880007a15ac0
[   29.056563] R13: ffff880006a7a900 R14: ffff8800066042e0 R15: ffffffff81a09cc0
[   29.056563] FS:  00007f667e179700(0000) GS:ffff880007a00000(0000) knlGS:0000000000000000
[   29.056563] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   29.056563] CR2: 00007f71f3c221b8 CR3: 00000000067ae000 CR4: 00000000000006f0
[   29.056563] Stack:
[   29.056563]  ffff8800067f7e60 ffffffff81097137 ffff880007a15ac0 ffff880006a7ab20
[   29.056563]  ffff880006a7a900 ffff8800067f7e90 ffffffff81098ef1 ffff880006603d80
[   29.056563]  ffff880007a15ac0 ffff880007a15ad8 0000000000000000 ffff8800067f7ee8
[   29.056563] Call Trace:
[   29.056563]  [<ffffffff81097137>] dequeue_pushable_dl_task+0x47/0x90
[   29.056563]  [<ffffffff81098ef1>] pick_next_task_dl+0x81/0x200
[   29.056563]  [<ffffffff819521ad>] __schedule+0xb3d/0xd00
[   29.056563]  [<ffffffff819523a7>] schedule+0x37/0x80
[   29.056563]  [<ffffffff81050d46>] exit_to_usermode_loop+0x39/0x7f
[   29.056563]  [<ffffffff81001a76>] prepare_exit_to_usermode+0x46/0x50
[   29.056563]  [<ffffffff819586a5>] retint_user+0x8/0x20
[   29.056563] Code: 3b 87 18 09 00 00 74 23 83 7a 98 01 7e 1b 83 ba f4 fc ff ff 01 75 10 8b 92 f8 fc ff ff 85 d2 78 04 0f 0b 31 c0 5d c3 0f 0b 0f 0b <0f> 0b 0f 0b 66 90 55 48 89 
e5 41 54 49 89 f4 53 48 8b b7 18 09
[   29.056563] RIP  [<ffffffff8109704a>] pick_next_pushable_dl_task+0x5a/0x60
[   29.056563]  RSP <ffff8800067f7e38>
[   29.056563] ---[ end trace 9f49218ec54a0b55 ]---
- How to reproduce:
   + Download http://disi.unitn.it/~abeni/H-Test/newcore-3.gz
   + Start KVM with:
	qemu-system-x86_64 -initrd newcore-3.gz -machine pc-q35-2.0,accel=kvm -smp cpus=2 -kernel bzImage -serial file:qlog.txt -append "console=ttyS0"
   + Inside the VM, do
	cd /
	sudo ./periodic_thread
   + You should see a crash in few seconds
   The "periodic_thread" program just creates 4 periodic threads (with execution time about
   10ms and period about 100ms) scheduled with SCHED_DEADLINE (with runtime 15ms and period
   100ms). I can provide the source code if needed.
- The full kernel log is available here: http://disi.unitn.it/~abeni/H-Test/qlog.txt


Here is my understanding of the crash:
- schedule() invokes pick_next_task_dl() which wants to do a context switch (by selecting
   for execution a new task "p" which is different from "prev")
- pick_next_task_dl() invokes put_prev, which puts the "prev" task in the pushable tasks
   queue (WARNING! "prev" is still the "current" task in this rq, because the scheduler is
   still running... I think this is the source of the issue)
- then, pick_next_task_dl() invokes dequeue_pushable_dl_task() on p, to remove the selected
   task from the pushable tasks queue...
- ...But after your patch dequeue_pushable_dl_task() invokes pick_next_pushable_dl_task().
   Which sees that the next pushable task is the "current" task (see above). This happens
   becuase "prev" has already been inserted in the pushable tasks queue, and can be the
   next pushable task... But "current" has not been updated yet.
- The BUG_ON() at line 1443 of deadline.c is just "BUG_ON(task_current(rq, p))"

Summing up, I think pick_next_pushable_dl_task() cannot be called from
dequeue_pushable_dl_task() (at least, not without removing or modifying that BUG_ON()).


On 11/27/2015 12:26 PM, Juri Lelli wrote:
[...]
>> +
>> +	if (dl_time_before(p->dl.deadline, dl_rq->earliest_dl.next))
>> +		dl_rq->earliest_dl.next = p->dl.deadline;
>
> This seems to be a bug, as earliest_dl.next is initialized to 0 and
> dl_time_before() will say that p has later deadline than
> earliest_dl.next even if p is actually the first pushable task.
I also think the "if (dl_time_before(p->dl.deadline, dl_rq->earliest_dl.next))"
might be wrong, but I have no testcases...

I _suspect_ (but I might be wrong) that here you could use
	if (leftmost)
		dl_rq->earliest_dl.next = p->dl.deadline;



				Luca
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ