[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161026225421.GA15247@clm-mbp.thefacebook.com>
Date: Wed, 26 Oct 2016 18:54:21 -0400
From: Chris Mason <clm@...com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Dave Jones <davej@...emonkey.org.uk>,
Andy Lutomirski <luto@...capital.net>,
Andy Lutomirski <luto@...nel.org>, Jens Axboe <axboe@...com>,
Al Viro <viro@...iv.linux.org.uk>, Josef Bacik <jbacik@...com>,
David Sterba <dsterba@...e.com>,
linux-btrfs <linux-btrfs@...r.kernel.org>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Dave Chinner <david@...morbit.com>
Subject: Re: bio linked list corruption.
On Wed, Oct 26, 2016 at 03:07:10PM -0700, Linus Torvalds wrote:
>On Wed, Oct 26, 2016 at 1:00 PM, Chris Mason <clm@...com> wrote:
>>
>> Today I turned off every CONFIG_DEBUG_* except for list debugging, and
>> ran dbench 2048:
>>
>> [ 2759.118711] WARNING: CPU: 2 PID: 31039 at lib/list_debug.c:33 __list_add+0xbe/0xd0
>> [ 2759.119652] list_add corruption. prev->next should be next (ffffe8ffffc80308), but was ffffc90000ccfb88. (prev=ffff880128522380).
>> [ 2759.121039] Modules linked in: crc32c_intel i2c_piix4 aesni_intel aes_x86_64 virtio_net glue_helper i2c_core lrw floppy gf128mul serio_raw pcspkr button ablk_helper cryptd sch_fq_codel autofs4 virtio_blk
>> [ 2759.124369] CPU: 2 PID: 31039 Comm: dbench Not tainted 4.9.0-rc1-15246-g4ce9206-dirty #317
>> [ 2759.125077] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
>> [ 2759.125077] ffffc9000f6fb868 ffffffff814fe4ff ffffffff8151cb5e ffffc9000f6fb8c8
>> [ 2759.125077] ffffc9000f6fb8c8 0000000000000000 ffffc9000f6fb8b8 ffffffff81064bbf
>> [ 2759.127444] ffff880128523680 0000002139968000 ffff880138b7a4a0 ffff880128523540
>> [ 2759.127444] Call Trace:
>> [ 2759.127444] [<ffffffff814fe4ff>] dump_stack+0x53/0x74
>> [ 2759.127444] [<ffffffff8151cb5e>] ? __list_add+0xbe/0xd0
>> [ 2759.127444] [<ffffffff81064bbf>] __warn+0xff/0x120
>> [ 2759.127444] [<ffffffff81064c99>] warn_slowpath_fmt+0x49/0x50
>> [ 2759.127444] [<ffffffff8151cb5e>] __list_add+0xbe/0xd0
>> [ 2759.127444] [<ffffffff814df338>] blk_sq_make_request+0x388/0x580
>
>Ok, that's definitely the same one that Dave started out seeing.
>
>The fact that it is that reliable - two different machines, two very
>different loads (dbench looks nothing like trinity) really makes me
>think that maybe the problem really is in the block plugging after
>all.
>
>It very much does not smell like random stack corruption. It's simply
>not random enough.
Agreed. I'd feel better if I could trigger this outside of btrfs, even
though Dave Chinner hit something very similar on xfs. I'll peel off
another test machine for a long XFS run.
>
>And I just noticed something: I originally thought that this is the
>"list_add_tail()" to the plug - which is the only "list_add()" variant
>in that function.
>
>But that never made sense, because the whole "but was" isn't a stack
>address, and "next" in "list_add_tail()" is basically fixed, and would
>have to be the stack.
>
>But I now notice that there's actually another "list_add()" variant
>there, and it's the one from __blk_mq_insert_request() that gets
>inlined into blk_mq_insert_request(), which then gets inlined into
>blk_mq_make_request().
>
>And that actually makes some sense just looking at the offsets too:
>
> blk_sq_make_request+0x388/0x580
>
>so it's somewhat at the end of blk_sq_make_request(). So it's not unlikely.
>
>And there it makes perfect sense that the "next should be" value is
>*not* on the stack.
>
>Chris, if you built with debug info, you can try
>
> ./scripts/faddr2line /boot/vmlinux blk_sq_make_request+0x388
>
>to get what line that blk_sq_make_request+0x388 address actually is. I
>think it's the
>
> list_add_tail(&rq->queuelist, &ctx->rq_list);
>
>in __blk_mq_insert_req_list() (when it's inlined from
>blk_sq_make_request(), "at_head" will be false.
>
>So it smells like "&ctx->rq_list" might be corrupt.
I'm running your current git here, so these line numbers should line up
for you:
blk_sq_make_request+0x388/0x578:
__blk_mq_insert_request at block/blk-mq.c:1049
(inlined by) blk_mq_merge_queue_io at block/blk-mq.c:1175
(inlined by) blk_sq_make_request at block/blk-mq.c:1419
The fsync path in the WARN doesn't have any plugs that I can see, so its
not surprising that we're not in the plugging path. I'm here:
if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
/*
* For a SYNC request, send it to the hardware immediately. For
* an ASYNC request, just ensure that we run it later on. The
* latter allows for merging opportunities and more efficient
* dispatching.
*/
I'll try the debugging patch you sent in the other email.
-chris
Powered by blists - more mailing lists