linux-kernel - Re: [PATCH 2/2] RT: remove "paranoid" limit in push_rt

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <48E62253.1090000@bull.net>
Date:	Fri, 03 Oct 2008 15:46:59 +0200
From:	Gilles Carry <Gilles.Carry@...l.net>
To:	Gregory Haskins <ghaskins@...ell.com>
Cc:	Chirag Jog <chirag@...ux.vnet.ibm.com>,
	linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org,
	rostedt@...dmis.org, dvhltc@...ibm.com, dino@...ibm.com
Subject: Re: [PATCH 2/2] RT: remove "paranoid" limit in push_rt_task

Sorry Greg,

Neither PPC64 nor Intel64 make it with this patch.
At boot time, it stops at the BUG_ON you added:
0xc00000000004eca4 is in push_rt_task (kernel/sched_rt.c:1102)

I let you do more investigations.
Have a good week-end in you garage ;)

Gilles.


PPC64:
cpu 0x2: Vector: 700 (Program Check) at [c0000000ee2877b0]
     pc: c00000000004eca4: .push_rt_task+0x1f4/0x2d0
     lr: c00000000004ec24: .push_rt_task+0x174/0x2d0
     sp: c0000000ee287a30
    msr: 8000000000021032
   current = 0xc0000000ee276fe0
   paca    = 0xc0000000005c3780
     pid   = 36, comm = sirq-block/2
kernel BUG at kernel/sched_rt.c:1102!
enter ? for help
[c0000000ee287a30] c00000000004ec78 .push_rt_task+0x1c8/0x2d0 (unreliable)
[c0000000ee287ae0] c00000000004eda4 .push_rt_tasks+0x24/0x44
[c0000000ee287b70] c00000000004edf0 .post_schedule_rt+0x2c/0x50
[c0000000ee287c00] c000000000052864 .finish_task_switch+0x100/0x1a8
[c0000000ee287cb0] c0000000002cdbd0 .__schedule+0x6a0/0x75c
[c0000000ee287d90] c0000000002cdedc .schedule+0xf4/0x128
[c0000000ee287e20] c000000000061700 .ksoftirqd+0x124/0x37c
[c0000000ee287f00] c000000000076dc0 .kthread+0x84/0xd4
[c0000000ee287f90] c000000000029368 .kernel_thread+0x4c/0x68
2:mon>

Intel64:
kernel BUG at kernel/sched_rt.c:1102!
invalid opcode: 0000 [1] PREEMPT SMP
CPU 4
Modules linked in: mptsas scsi_transport_sas
Pid: 61, comm: sirq-block/4 Not tainted 2.6.26.5-rt9-00002-g3b27927-dirty #26
RIP: 0010:[<ffffffff8022b307>]  [<ffffffff8022b307>] push_rt_task+0x15f/0x20b
RSP: 0018:ffff81007f4d5d70  EFLAGS: 00010097
RAX: 0000000000000000 RBX: ffff81007edf09d0 RCX: 000000000822b765
RDX: 000000000822b765 RSI: 0000000000000000 RDI: ffff81000103f280
RBP: ffff81007f4d5da0 R08: ffff81007f4d4000 R09: ffff81007edcbe20
R10: 00000000ffffffff R11: ffffffff8021fa2c R12: 0000000000000000
R13: ffff810001034280 R14: ffff81007edf09e0 R15: ffff81000103f280
FS:  00007f2f26e776f0(0000) GS:ffff81007fc0ccc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006b9fb0 CR3: 00000001bf4c9000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sirq-block/4 (pid: 61, threadinfo ffff81007f4d4000, task
ffff81007f4d0e10)
Stack:  000000007f4d5e00 ffff81000103f280 ffff81007edf09d0 ffff8101bf457540
  0000000000000001 0000000000000002 ffff81007f4d5dc0 ffffffff8022b3c7
  ffff81007f4d5de0 ffff81000103f280 ffff81007f4d5de0 ffffffff8022b3e8
Call Trace:
  [<ffffffff8022b3c7>] push_rt_tasks+0x14/0x1c
  [<ffffffff8022b3e8>] post_schedule_rt+0x19/0x25
  [<ffffffff8022d7ee>] finish_task_switch+0x73/0x121
  [<ffffffff805bbe3d>] thread_return+0x4f/0xdc
  [<ffffffff805bc066>] schedule+0xd4/0xf0
  [<ffffffff80237eeb>] ksoftirqd+0xb3/0x260
  [<ffffffff80237e38>] ? ksoftirqd+0x0/0x260
  [<ffffffff80245209>] ? kthread+0x47/0x76
  [<ffffffff8022e9f9>] ? schedule_tail+0x43/0x97
  [<ffffffff8020c3d8>] ? child_rip+0xa/0x12
  [<ffffffff802451c2>] ? kthread+0x0/0x76
  [<ffffffff8020c3ce>] ? child_rip+0x0/0x12


Code: 48 c7 c6 c0 1d 23 80 e8 83 b3 03 00 e9 ee fe ff ff 4c 89 e7 e8 b1 31 39
00 eb ba 48 8b 43 08 8b 40 18 41 3b 87 90 0e 00 00 74 04 <0f> 0b eb fe 48 89
de 4c 89 ff e8 5b fe ff ff f0 41 ff 0e 0f 94
RIP  [<ffffffff8022b307>] push_rt_task+0x15f/0x20b
  RSP <ffff81007f4d5d70>


Gregory Haskins wrote:
> A panic was discovered by Chirag Jog and investigated by Gilles Carry
> to be originating in the fact that a task being pushed away
> may get migrated away during a double_lock_balance.  The result was
> that the pushable_tasks list may become corrupted.
> 
> The root cause is that the "paranoid" retry limit could cause us to
> bail out of a retry, but still try to remove the item from the (now
> potentially incorrect) list.  There are numerous ways to correct the
> condition, but the paranoid feature is no longer relevant with the new
> pushable logic (since pushable naturally limits the loop anyway), so
> lets just remove it.
> 
> Reported By: Chirag Jog <chirag@...ux.vnet.ibm.com>
> Found-by: Gilles Carry <gilles.carry@...l.net>
> Signed-off-by: Gregory Haskins <ghaskins@...ell.com>
> ---
> 
>  kernel/sched_rt.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index 59ead84..5a754fe 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -1056,7 +1056,6 @@ static int push_rt_task(struct rq *rq)
>  {
>  	struct task_struct *next_task;
>  	struct rq *lowest_rq;
> -	int paranoid = RT_MAX_TRIES;
>  
>  	if (!rq->rt.overloaded)
>  		return 0;
> @@ -1094,12 +1093,14 @@ static int push_rt_task(struct rq *rq)
>  		 * If it has, then try again.
>  		 */
>  		task = pick_next_pushable_task(rq);
> -		if (unlikely(task != next_task) && task && paranoid--) {
> +		if (unlikely(task != next_task) && task) {
>  			put_task_struct(next_task);
>  			next_task = task;
>  			goto retry;
>  		}
>  
> +		BUG_ON(task_cpu(next_task) != rq->cpu);
> +
>  		/*
>  		 * Once we have failed to push this task, we will not
>  		 * try again, since the other cpus will pull from us
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/