Message-ID: <516FED09.1040008@cn.fujitsu.com>
Date: Thu, 18 Apr 2013 20:54:33 +0800
From: Wanlong Gao <gaowanlong@...fujitsu.com>
To: Jens Axboe <axboe@...nel.dk>
CC: Steven Rostedt <rostedt@...dmis.org>, Tejun Heo <tj@...nel.org>,
namhyung@...il.com, agk@...hat.com, dm-devel@...hat.com,
neilb@...e.de, LKML <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Wanlong Gao <gaowanlong@...fujitsu.com>
Subject: Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On 04/18/2013 08:37 PM, Jens Axboe wrote:
> On Wed, Apr 17 2013, Steven Rostedt wrote:
>> On Wed, 2013-04-17 at 16:36 +0800, Wanlong Gao wrote:
>>> Hi Tj, all,
>>>
>>> I get a kernel panic on my machine with kernel 3.9.0-rc7-4-gbb33db7 every time,
>>> several minutes after booting.
>>>
>>> Attached are the panic picture and the config.
>>> I'm sorry that the panic log is incomplete.
>>>
>>> Fortunately, the panic went away when I reverted the commit below from Tj:
>>>
>>> commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f
>>> Author: Tejun Heo <tj@...nel.org>
>>> Date: Fri Jan 11 13:06:33 2013 -0800
>>>
>>> block: add missing block_bio_complete() tracepoint
>>>
>>> I think this will help you resolve the bug, and it may be urgent,
>>> because we are already at rc7.
>>
>>
>> Doing an objdump on the code I found this:
>>
>> ffffffff8123d479: 8b 77 48 mov 0x48(%rdi),%esi
>> ffffffff8123d47c: 48 8d 7d c8 lea -0x38(%rbp),%rdi
>> ffffffff8123d480: 48 8b 4d b8 mov -0x48(%rbp),%rcx
>> ffffffff8123d484: ba 28 00 00 00 mov $0x28,%edx
>> ffffffff8123d489: 45 89 e8 mov %r13d,%r8d
>> ffffffff8123d48c: e8 1f d3 ea ff callq ffffffff810ea7b0 <trace_current_buffer_lock_reserve>
>> ffffffff8123d491: 48 85 c0 test %rax,%rax
>> ffffffff8123d494: 49 89 c6 mov %rax,%r14
>> ffffffff8123d497: 74 5a je ffffffff8123d4f3 <ftrace_raw_event_block_bio_complete+0xb3>
>> ffffffff8123d499: 48 89 c7 mov %rax,%rdi
>> ffffffff8123d49c: e8 5f 4f ea ff callq ffffffff810e2400 <ring_buffer_event_data>
>> ffffffff8123d4a1: 48 8b 4b 10 mov 0x10(%rbx),%rcx
>> ffffffff8123d4a5: 31 d2 xor %edx,%edx
>> ffffffff8123d4a7: 48 85 c9 test %rcx,%rcx
>> ffffffff8123d4aa: 74 02 je ffffffff8123d4ae <ftrace_raw_event_block_bio_complete+0x6e>
>>
>> ffffffff8123d4ac: 8b 11 mov (%rcx),%edx <<<---- BUG IS HERE
>>
>> ffffffff8123d4ae: 89 50 08 mov %edx,0x8(%rax)
>> ffffffff8123d4b1: 48 8b 13 mov (%rbx),%rdx
>> ffffffff8123d4b4: 48 8d 78 20 lea 0x20(%rax),%rdi
>> ffffffff8123d4b8: 48 89 50 10 mov %rdx,0x10(%rax)
>> ffffffff8123d4bc: 8b 53 30 mov 0x30(%rbx),%edx
>> ffffffff8123d4bf: 44 89 78 1c mov %r15d,0x1c(%rax)
>> ffffffff8123d4c3: c1 ea 09 shr $0x9,%edx
>> ffffffff8123d4c6: 89 50 18 mov %edx,0x18(%rax)
>> ffffffff8123d4c9: 8b 53 30 mov 0x30(%rbx),%edx
>> ffffffff8123d4cc: 48 8b 73 20 mov 0x20(%rbx),%rsi
>>
>> According to the picture, the code dump shows the instruction that
>> died: "<8b> 11 89 50 08".
>>
>> Just before that, rcx was tested for zero, and passed (wasn't zero). But
>> it seems that it wasn't a valid address either.
>>
>> bio is allocated with kmalloc and not kzalloc, so it is possible that
>> bio->bi_bdev never got initialized and is just garbage.
>
> A bio is always fully initialized, regardless of which internal
> allocator it came from. If people are doing private kmallocs, then they
> better be using bio_init() as well.
>
> Wanlong, would it be possible to get a full dmesg on boot so I can see
> what drivers and file systems are in use? Is there anything special
> about your setup?
Sure, attached.
Thanks,
Wanlong Gao
>
> Will get this reverted for 3.9 final, so we have some time to
> investigate.
>
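
The failure mode discussed in the quoted analysis (a pointer that passes
the "test %rcx,%rcx" NULL check but is still not a valid address) can be
sketched in user-space C. This is only an illustration under stated
assumptions: struct fake_bio, struct fake_bdev, alloc_stale(),
alloc_zeroed(), fake_bio_init() and guard_passes() are hypothetical
stand-ins, not kernel code, and the 0x6b fill byte is just an arbitrary
"stale memory" pattern. malloc()+memset() plays the role of kmalloc(),
calloc() the role of kzalloc(), and fake_bio_init() the role of bio_init().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the kernel's struct block_device / struct bio. */
struct fake_bdev {
    uint32_t dev;            /* first field, like the 32-bit load at (%rcx) */
};

struct fake_bio {
    uint64_t sector;
    struct fake_bio *next;
    struct fake_bdev *bdev;  /* third field, like the load from 0x10(%rbx) */
};

/*
 * Mirrors the guard in the disassembly: only "is it non-zero?".
 * The raw bits are inspected via memcpy so the bogus pointer is never
 * used as a pointer (and never dereferenced) in this sketch.
 */
static int guard_passes(const struct fake_bio *bio)
{
    uintptr_t raw;

    memcpy(&raw, &bio->bdev, sizeof(raw));
    return raw != 0;
}

/* Stand-in for kmalloc(): the object comes back holding stale bytes. */
static struct fake_bio *alloc_stale(void)
{
    struct fake_bio *bio = malloc(sizeof(*bio));

    if (bio)
        memset(bio, 0x6b, sizeof(*bio));   /* simulate leftover garbage */
    return bio;
}

/* Stand-in for kzalloc(): the object comes back zeroed. */
static struct fake_bio *alloc_zeroed(void)
{
    return calloc(1, sizeof(struct fake_bio));
}

/* Stand-in for bio_init(): explicitly clear a privately allocated object. */
static void fake_bio_init(struct fake_bio *bio)
{
    memset(bio, 0, sizeof(*bio));
}

/* Prints whether the NULL-only guard would let the dereference happen. */
static void report(const char *label, const struct fake_bio *bio)
{
    printf("%-24s guard %s\n", label,
           guard_passes(bio) ? "passes (would dereference)" : "skips the load");
}
```

Calling report() on an object from alloc_stale() shows the guard passing
even though the pointer is garbage; the same object after fake_bio_init(),
or one from alloc_zeroed(), makes the guard skip the load. That matches
the point made above: a private allocation is only safe if it is
initialized before anything reads bi_bdev.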
View attachment "full_dmesg" of type "text/plain" (246782 bytes)