Message-ID: <20120713195615.GC1707@redhat.com>
Date: Fri, 13 Jul 2012 15:56:15 -0400
From: Dave Jones <davej@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Linux Kernel <linux-kernel@...r.kernel.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Rusty Russell <rusty@...tcorp.com.au>,
Darren Hart <darren@...art.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: 3.5-rc6 futex_wait_requeue_pi oops.
On Fri, Jul 13, 2012 at 09:11:57PM +0200, Thomas Gleixner wrote:
> On Fri, 13 Jul 2012, Dave Jones wrote:
>
> > On Fri, Jul 13, 2012 at 08:47:38PM +0200, Thomas Gleixner wrote:
> > > On Fri, 13 Jul 2012, Dave Jones wrote:
> > >
> > > > Looks like calling futex() with garbage makes things unhappy.
> > >
> > > WARN_ON(!&q.pi_state);
> > > pi_mutex = &q.pi_state->pi_mutex;
> > > ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
> > > debug_rt_mutex_free_waiter(&rt_waiter);
> > >
> > > So there is some weird way which causes q.pi_state = NULL. Dave, did
> > > you see the warning before the oops happened ?
> >
> > No, that didn't seem to trigger.
>
> Yuck. The rt_mutex is embedded in pi_state and not a pointer and the
> thing explodes in __lock_acquire if the raw lock protecting the
> rtmutex internals.
>
> Can you decode the exact code line ?
Hmm. I think I rebuilt the kernel, so things may be slightly different, though
what I see surprises me.
Decoding the Code: line shows:
Code: d8 45 0f 45 e0 4c 89 75 f0 4c 89 7d f8 85 c0 0f 84 f8 00 00 00 8b 05 e2 af fa 00 49 89 ff 89 f3 41 89 d2 85 c0 0f 84 02 01 00 00 <49> 8b 07 ba 01 00 00 00 48 3d 20 c4 0c 82 44 0f 44 e2 83 fb 01
0000000000000000 <.text>:
   0:   d8 45 0f                fadds  0xf(%rbp)
   3:   45 e0 4c                rex.RB loopne 0x52
   6:   89 75 f0                mov    %esi,-0x10(%rbp)
   9:   4c 89 7d f8             mov    %r15,-0x8(%rbp)
   d:   85 c0                   test   %eax,%eax
   f:   0f 84 f8 00 00 00       je     0x10d
  15:   8b 05 e2 af fa 00       mov    0xfaafe2(%rip),%eax        # 0xfaaffd
  1b:   49 89 ff                mov    %rdi,%r15
  1e:   89 f3                   mov    %esi,%ebx
  20:   41 89 d2                mov    %edx,%r10d
  23:   85 c0                   test   %eax,%eax
  25:   0f 84 02 01 00 00       je     0x12d

/home/davej/tmp/tmp.SI8vbYzuK6.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
   0:   49 8b 07                mov    (%r15),%rax
   3:   ba 01 00 00 00          mov    $0x1,%edx
   8:   48 3d 20 c4 0c 82       cmp    $0xffffffff820cc420,%rax
   e:   44 0f 44 e2             cmove  %edx,%r12d
  12:   83 fb 01                cmp    $0x1,%ebx
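
For anyone repeating this by hand, the split at the <49> marker can be sketched as below. This is only an illustration that assumes the Code: line format shown above (hex bytes, with the first byte of the faulting instruction wrapped in <>); the helper name split_code_line is made up for this example.

```python
import re

# The Code: line from the oops above, without the "Code:" prefix.
CODE_LINE = (
    "d8 45 0f 45 e0 4c 89 75 f0 4c 89 7d f8 85 c0 0f 84 f8 00 00 00 "
    "8b 05 e2 af fa 00 49 89 ff 89 f3 41 89 d2 85 c0 0f 84 02 01 00 00 "
    "<49> 8b 07 ba 01 00 00 00 48 3d 20 c4 0c 82 44 0f 44 e2 83 fb 01"
)


def split_code_line(code):
    """Return (bytes before the faulting instruction, bytes from it onward)."""
    tokens = code.split()
    # The faulting byte is the single token of the form <xx>.
    marked = [i for i, tok in enumerate(tokens)
              if re.fullmatch(r"<[0-9a-fA-F]{2}>", tok)]
    if len(marked) != 1:
        raise ValueError("expected exactly one <xx> marker")
    raw = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    return raw[:marked[0]], raw[marked[0]:]


def hexdump(data):
    """Format bytes as space-separated two-digit hex."""
    return " ".join(f"{b:02x}" for b in data)


before, after = split_code_line(CODE_LINE)
print(f"faulting instruction starts at offset {len(before):#x}")
print("from fault:", hexdump(after))
```

The second half is what gets disassembled separately, which is why the second listing above starts again at offset 0.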
The only instance of 49 8b 07 followed by ba 01 in kernel/lockdep.o is this:
        /*
         * Lockdep should run with IRQs disabled, otherwise we could
         * get an interrupt which would want to take locks, which would
         * end up in lockdep and have you got a head-ache already?
         */
        if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
    3f88:       8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 3f8e <__lock_acquire+0x4e>
    3f8e:       49 89 ff                mov    %rdi,%r15
    3f91:       89 f3                   mov    %esi,%ebx
    3f93:       41 89 d2                mov    %edx,%r10d
    3f96:       85 c0                   test   %eax,%eax
    3f98:       0f 84 02 01 00 00       je     40a0 <__lock_acquire+0x160>
                return 0;

        if (lock->key == &__lockdep_no_validate__)
    3f9e:       49 8b 07                mov    (%r15),%rax      <<<<<<<<<<<<<<<<<<
                check = 1;
    3fa1:       ba 01 00 00 00          mov    $0x1,%edx
Seems to add up, though the bytes that follow in the Code: line don't match what's in the object:

    3fa6:       48 3d 00 00 00 00       cmp    $0x0,%rax
    3fac:       44 0f 44 e2             cmove  %edx,%r12d

The immediate at 3fa6 is an actual address (0xffffffff820cc420) in the Code: line,
but zero in the object.
I guess that's the &__lockdep_no_validate__ comparison,
though it seems odd that the kernel text would change.
Does lockdep do that when it gets disabled or something?
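
The comparison above can be reproduced mechanically. The sketch below is only an illustration: it hard-codes the object bytes quoted from kernel/lockdep.o (3f88 onward) and the Code: bytes from the faulting instruction, finds the 49 8b 07 ba 01 pattern, and lists the addresses where the two differ. It makes no claim about why they differ; the names OBJ, OOPS and find_all are made up for this example.

```python
OBJ_BASE = 0x3f88  # address of the first byte quoted from kernel/lockdep.o

# Bytes from the lockdep.o listing above, 3f88 up to the cmove at 3fac.
OBJ = bytes.fromhex(
    "8b 05 00 00 00 00 49 89 ff 89 f3 41 89 d2 85 c0 0f 84 02 01 00 00 "
    "49 8b 07 ba 01 00 00 00 48 3d 00 00 00 00 44 0f 44 e2"
)

# Bytes from the Code: line, starting at the faulting instruction.
OOPS = bytes.fromhex(
    "49 8b 07 ba 01 00 00 00 48 3d 20 c4 0c 82 44 0f 44 e2"
)


def find_all(blob, pattern):
    """Return every offset in blob at which pattern occurs."""
    hits, pos = [], blob.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = blob.find(pattern, pos + 1)
    return hits


# Locate "49 8b 07 ba 01" in the object bytes.
hits = find_all(OBJ, OOPS[:5])
hit = hits[0]

# Addresses where the object bytes and the Code: bytes disagree.
diff = [OBJ_BASE + hit + i
        for i, (a, b) in enumerate(zip(OBJ[hit:], OOPS)) if a != b]

# The four differing bytes, read as a little-endian 32-bit value and
# sign-extended to 64 bits, the way the disassembler shows the cmp operand.
imm = int.from_bytes(OOPS[10:14], "little", signed=True) & (2**64 - 1)

print("pattern at:", [hex(OBJ_BASE + h) for h in hits])
print("differing bytes at:", [hex(a) for a in diff])
print("immediate in Code: line:", hex(imm))
```

Against a real build, OBJ would be read from the object file instead, and a raw byte search can also match outside .text, so any hit still needs checking against the disassembly.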
Dave