Message-ID: <CALCETrX9+eS3KYdN2UBvmBsMhKr3f-PBSC8UHqrnK8WhE6_guw@mail.gmail.com>
Date: Mon, 5 Dec 2016 10:11:23 -0800
From: Andy Lutomirski <luto@...nel.org>
To: Vegard Nossum <vegard.nossum@...il.com>,
Borislav Petkov <bp@...en8.de>
Cc: Dave Jones <davej@...emonkey.org.uk>, Chris Mason <clm@...com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jens Axboe <axboe@...com>, Andy Lutomirski <luto@...nel.org>,
Al Viro <viro@...iv.linux.org.uk>, Josef Bacik <jbacik@...com>,
David Sterba <dsterba@...e.com>,
linux-btrfs <linux-btrfs@...r.kernel.org>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Dave Chinner <david@...morbit.com>
Subject: Re: bio linked list corruption.
On Sun, Dec 4, 2016 at 3:04 PM, Vegard Nossum <vegard.nossum@...il.com> wrote:
> On 23 November 2016 at 20:58, Dave Jones <davej@...emonkey.org.uk> wrote:
>> On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote:
>>
>> > [ 317.689216] BUG: Bad page state in process kworker/u8:8 pfn:4d8fd4
>> > trace from just before this happened. Does this shed any light ?
>> >
>> > https://codemonkey.org.uk/junk/trace.txt
>>
>> crap, I just noticed the timestamps in the trace come from quite a bit
>> later. I'll tweak the code to do the taint checking/ftrace stop after
>> every syscall, that should narrow the window some more.
>
> FWIW I hit this as well:
>
> BUG: unable to handle kernel paging request at ffffffff81ff08b7
We really ought to improve this message. If nothing else, it should
say whether it was a read, a write, or an instruction fetch.
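For reference, the x86 page-fault error code printed on the "Oops:" line already carries this information. The following is only a sketch of how it could be decoded, written as user-space C. The bit positions are the architectural ones; the helper names (pf_access_type, pf_cause, pf_describe) are made up for illustration and are not existing kernel functions.

```c
#include <stdio.h>

/* x86 page-fault error code bits (architectural layout). */
#define PF_PROT  (1UL << 0)	/* 0: page not present, 1: protection violation */
#define PF_WRITE (1UL << 1)	/* 0: read access,      1: write access */
#define PF_USER  (1UL << 2)	/* 0: kernel mode,      1: user mode */
#define PF_RSVD  (1UL << 3)	/* reserved bit set in a paging structure */
#define PF_INSTR (1UL << 4)	/* fault was an instruction fetch */

/* Hypothetical helper: classify the access that faulted. */
static const char *pf_access_type(unsigned long error_code)
{
	if (error_code & PF_INSTR)
		return "instruction fetch";
	if (error_code & PF_WRITE)
		return "write";
	return "read";
}

/* Hypothetical helper: was the page missing, or present but not permitted? */
static const char *pf_cause(unsigned long error_code)
{
	return (error_code & PF_PROT) ? "protection violation"
				      : "page not present";
}

/* Hypothetical helper: build a one-line human-readable description. */
static void pf_describe(char *buf, size_t len, unsigned long error_code)
{
	snprintf(buf, len, "%s-mode %s, %s",
		 (error_code & PF_USER) ? "user" : "kernel",
		 pf_access_type(error_code), pf_cause(error_code));
}
```

Fed the 0003 from this report, pf_describe() yields "kernel-mode write, protection violation".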
> IP: [<ffffffff8135f2ea>] __lock_acquire.isra.32+0xda/0x1a30
> PGD 461e067 PUD 461f063
> PMD 1e001e1
Too lazy to manually decode this right now, but I don't think it matters.
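For anyone who wants to do the decode mechanically, here is a rough sketch in user-space C. It assumes the standard x86-64 page-table flag layout; the struct and function names (pmd_info, decode_pmd) are invented for this example and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard x86-64 paging-entry flag bits. */
#define PTE_PRESENT (1ULL << 0)
#define PTE_RW      (1ULL << 1)
#define PTE_USER    (1ULL << 2)
#define PTE_PSE     (1ULL << 7)		/* at PMD level: maps a 2M page */
#define PTE_GLOBAL  (1ULL << 8)
#define PTE_NX      (1ULL << 63)

/* Physical address masks: bits 51:21 (2M page) and 51:12 (page table). */
#define PHYS_MASK_2M 0x000FFFFFFFE00000ULL
#define PHYS_MASK_4K 0x000FFFFFFFFFF000ULL

/* Hypothetical result type, for illustration only. */
struct pmd_info {
	bool present, writable, user, large, global, nx;
	uint64_t phys_base;
};

/* Hypothetical helper: split a raw PMD value into its fields. */
static struct pmd_info decode_pmd(uint64_t pmd)
{
	struct pmd_info info;

	info.present  = (pmd & PTE_PRESENT) != 0;
	info.writable = (pmd & PTE_RW) != 0;
	info.user     = (pmd & PTE_USER) != 0;
	info.large    = (pmd & PTE_PSE) != 0;
	info.global   = (pmd & PTE_GLOBAL) != 0;
	info.nx       = (pmd & PTE_NX) != 0;
	/* A large-page entry holds the page base, otherwise the page table. */
	info.phys_base = pmd & (info.large ? PHYS_MASK_2M : PHYS_MASK_4K);
	return info;
}
```

Under those assumptions, 1e001e1 decodes as a present, read-only, kernel-only, global 2M mapping at physical 0x1e00000 with NX clear, which would be consistent with a write fault on kernel text.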
> Oops: 0003 [#1] PREEMPT SMP KASAN
Note this is SMP, but that just means CONFIG_SMP=y. Vegard, how many
CPUs does your kernel think you have?
> RIP: 0010:[<ffffffff8135f2ea>] [<ffffffff8135f2ea>]
> __lock_acquire.isra.32+0xda/0x1a30
> RSP: 0018:ffff8801bab8f730 EFLAGS: 00010082
> RAX: ffffffff81ff071f RBX: 0000000000000000 RCX: 0000000000000000
RAX points to kernel text.
> Code: 89 4d b8 44 89 45 c0 89 4d c8 4c 89 55 d0 e8 ee c3 ff ff 48 85
> c0 4c 8b 55 d0 8b 4d c8 44 8b 45 c0 4c 8b 4d b8 0f 84 c6 01 00 00 <3e>
> ff 80 98 01 00 00 49 8d be 48 07 00 00 48 ba 00 00 00 00 00

  2b: 3e ff 80 98 01 00 00    incl %ds:*0x198(%rax)    <-- trapping instruction

That's very strange. I think this is:

    atomic_inc((atomic_t *)&class->ops);

but my kernel contains:

    3cb4: f0 ff 80 98 01 00 00    lock incl 0x198(%rax)

So your kernel has been smp-alternatived. That 3e comes from
alternatives_smp_unlock. If you're running on SMP with UP
alternatives, things will break.
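
To illustrate the byte-level effect: the UP alternative replaces the one-byte lock prefix (f0) with a DS segment-override prefix (3e), which leaves the instruction length unchanged but drops the atomicity. The following is only a user-space model of that patching step, not the kernel code; the function name smp_unlock_sketch and the plain offset array are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define LOCK_PREFIX_BYTE 0xf0	/* x86 LOCK prefix */
#define DS_PREFIX_BYTE   0x3e	/* DS segment override, harmless here */

/*
 * Hypothetical user-space model: for every recorded lock-prefix offset
 * inside the text buffer, turn f0 into 3e. Returns the number of bytes
 * actually patched. Offsets outside the buffer are skipped.
 */
static size_t smp_unlock_sketch(uint8_t *text, size_t text_len,
				const size_t *lock_offsets, size_t n)
{
	size_t patched = 0;

	for (size_t i = 0; i < n; i++) {
		size_t off = lock_offsets[i];

		if (off >= text_len)
			continue;
		/* Only patch if the byte still is a lock prefix. */
		if (text[off] == LOCK_PREFIX_BYTE) {
			text[off] = DS_PREFIX_BYTE;
			patched++;
		}
	}
	return patched;
}
```

Applied to the bytes f0 ff 80 98 01 00 00 with a single offset of 0, this produces 3e ff 80 98 01 00 00, the same encoding as the trapping instruction in the oops.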
What's your kernel command line? Can we have your entire kernel log from boot?
Adding Borislav, since he's the guru for this code.