Message-ID: <20161027054133.GM14023@dastard>
Date:   Thu, 27 Oct 2016 16:41:33 +1100
From:   Dave Chinner <david@...morbit.com>
To:     Dave Jones <davej@...emonkey.org.uk>, Chris Mason <clm@...com>,
        Andy Lutomirski <luto@...capital.net>,
        Andy Lutomirski <luto@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jens Axboe <axboe@...com>, Al Viro <viro@...iv.linux.org.uk>,
        Josef Bacik <jbacik@...com>, David Sterba <dsterba@...e.com>,
        linux-btrfs <linux-btrfs@...r.kernel.org>,
        Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: bio linked list corruption.

On Tue, Oct 25, 2016 at 08:27:52PM -0400, Dave Jones wrote:
> DaveC: Do these look like real problems, or is this more "looks like
> random memory corruption" ?  It's been a while since I did some stress
> testing on XFS, so these might not be new..
> 
> XFS: Assertion failed: oldlen > newlen, file: fs/xfs/libxfs/xfs_bmap.c, line: 2938
> ------------[ cut here ]------------
> kernel BUG at fs/xfs/xfs_message.c:113!
> invalid opcode: 0000 [#1] PREEMPT SMP
> CPU: 1 PID: 6227 Comm: trinity-c9 Not tainted 4.9.0-rc1-think+ #6 
> task: ffff8804f4658040 task.stack: ffff88050568c000
> RIP: 0010:[<ffffffffa02d3e2b>] 
>   [<ffffffffa02d3e2b>] assfail+0x1b/0x20 [xfs]
> RSP: 0000:ffff88050568f9e8  EFLAGS: 00010282
> RAX: 00000000ffffffea RBX: 0000000000000046 RCX: 0000000000000001
> RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa02fe34d
> RBP: ffff88050568f9e8 R08: 0000000000000000 R09: 0000000000000000
> R10: 000000000000000a R11: f000000000000000 R12: ffff88050568fb44
> R13: 00000000000000f3 R14: ffff8804f292bf88 R15: 000ffffffffe0046
> FS:  00007fe2ddfdfb40(0000) GS:ffff88050a000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe2dbabd000 CR3: 00000004f461f000 CR4: 00000000001406e0
> Stack:
>  ffff88050568fa88  ffffffffa027ccee  fffffffffffffff9  ffff8804f16fd8b0
>  0000000000003ffa  0000000000000032  ffff8804f292bf40  0000000000004976
>  000ffffffffe0008  00000000000004fd  ffff880400000000  0000000000005107
> Call Trace:
>  [<ffffffffa027ccee>] xfs_bmap_add_extent_hole_delay+0x54e/0x620 [xfs]
>  [<ffffffffa027f2d4>] xfs_bmapi_reserve_delalloc+0x2b4/0x400 [xfs]
>  [<ffffffffa02cadd7>] xfs_file_iomap_begin_delay.isra.12+0x247/0x3c0 [xfs]
>  [<ffffffffa02cb0d1>] xfs_file_iomap_begin+0x181/0x270 [xfs]
>  [<ffffffff8122e573>] iomap_apply+0x53/0x100
>  [<ffffffff8122e68b>] iomap_file_buffered_write+0x6b/0x90

All this iomap code is new, so it's entirely possible that this is a
new bug. The assert failure indicates that the delalloc extent's
metadata reservation grew when we expected it to stay the same or
shrink.
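
For reference, the check that's firing compares the indirect block
reservation already held for the delalloc extents being merged
against the worst-case reservation the merged extent could need.
A simplified sketch of that invariant, from memory, not the verbatim
code at fs/xfs/libxfs/xfs_bmap.c:2938:

	/* reservation already held by the pieces being merged */
	oldlen = startblockval(left.br_startblock) +
		 startblockval(new->br_startblock) +
		 startblockval(right.br_startblock);

	/* worst-case indirect blocks the merged extent could need */
	newlen = xfs_bmap_worst_indlen(ip, left.br_blockcount +
					   new->br_blockcount +
					   right.br_blockcount);

	/* merging must not require more reservation than we hold */
	ASSERT(oldlen > newlen);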

> XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 309
...
>  [<ffffffffa057abe1>] xfs_trans_mod_sb+0x241/0x280 [xfs]
>  [<ffffffffa0548eff>] xfs_ag_resv_alloc_extent+0x4f/0xc0 [xfs]
>  [<ffffffffa050aa3d>] xfs_alloc_ag_vextent+0x23d/0x300 [xfs]
>  [<ffffffffa050bb1b>] xfs_alloc_vextent+0x5fb/0x6d0 [xfs]
>  [<ffffffffa051c1b4>] xfs_bmap_btalloc+0x304/0x8e0 [xfs]
>  [<ffffffffa054648e>] ? xfs_iext_bno_to_ext+0xee/0x170 [xfs]
>  [<ffffffffa051c8db>] xfs_bmap_alloc+0x2b/0x40 [xfs]
>  [<ffffffffa051dc30>] xfs_bmapi_write+0x640/0x1210 [xfs]
>  [<ffffffffa0569326>] xfs_iomap_write_allocate+0x166/0x350 [xfs]
>  [<ffffffffa05540b0>] xfs_map_blocks+0x1b0/0x260 [xfs]
>  [<ffffffffa0554beb>] xfs_do_writepage+0x23b/0x730 [xfs]

And that's indicative of a delalloc metadata reservation being too
small, and so we're allocating unreserved blocks.
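
The accounting behind that assert is, roughly, that each transaction
carries a block reservation taken up front and every block allocated
under it is counted against that reservation. A simplified sketch of
the xfs_trans_mod_sb() path involved (illustrative, not the verbatim
fs/xfs/xfs_trans.c code):

	case XFS_TRANS_SB_FDBLOCKS:
		if (delta < 0) {
			/* blocks consumed under this transaction */
			tp->t_blk_res_used += (uint)-delta;
			/* must never overrun the up-front reservation */
			ASSERT(tp->t_blk_res_used <= tp->t_blk_res);
		}
		break;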

Different symptoms, same underlying cause, I think.

I see the latter assert from time to time in my testing, but it's
not common (maybe once a month) and I've never been able to track it
down.  However, it doesn't affect production systems unless they hit
ENOSPC hard enough that it causes the critical reserve pool to be
exhausted and so the allocation fails. That's extremely rare -
it usually takes several hundred processes all trying to write as
hard as they can concurrently, all slipping through the ENOSPC
detection without the correct metadata reservations, and all
requiring multiple metadata blocks to be allocated during
writeback...

If you've got a way to trigger it quickly and reliably, that would
be helpful...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
