lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 7 Dec 2016 11:15:21 -0500
From:   Dave Jones <davej@...emonkey.org.uk>
To:     Chris Mason <clm@...com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jens Axboe <axboe@...com>,
        Andy Lutomirski <luto@...capital.net>,
        Andy Lutomirski <luto@...nel.org>,
        Al Viro <viro@...iv.linux.org.uk>, Josef Bacik <jbacik@...com>,
        David Sterba <dsterba@...e.com>,
        linux-btrfs <linux-btrfs@...r.kernel.org>,
        Linux Kernel <linux-kernel@...r.kernel.org>,
        Dave Chinner <david@...morbit.com>
Subject: Re: btrfs_destroy_inode warn (outstanding extents)

On Sat, Dec 03, 2016 at 11:48:33AM -0500, Dave Jones wrote:

 > The interesting process here seems to be kworker/u8:17, and the trace
 > captures some of what that was doing before that bad page was hit.

I'm travelling next week, so I'm trying to braindump the stuff I've
found so far and summarise so I can pick it back up later if no-one else
figures it out first.

I've hit the bad page map spew with enough regularity that I've now got a handful of
good traces.

http://codemonkey.org.uk/junk/btrfs/bad-page-state1.txt
http://codemonkey.org.uk/junk/btrfs/bad-page-state2.txt
http://codemonkey.org.uk/junk/btrfs/bad-page-state3.txt
http://codemonkey.org.uk/junk/btrfs/bad-page-state4.txt
http://codemonkey.org.uk/junk/btrfs/bad-page-state5.txt
http://codemonkey.org.uk/junk/btrfs/bad-page-state6.txt

It smells to me like a race between truncate and the writeback
workqueue. The variety of traces here seem to show both sides
of the race, sometimes it's kworker, sometimes a trinity child process.

bad-page-state3.txt onwards have some bonus trace_printk's from
btrfs_setsize as I was curious what sizes we were passing down to
truncate. The only patterns I see are going from very large to very
small sizes. Perhaps that causes truncate to generate so much
writeback that it makes the race apparent ?



Other stuff I keep hitting:

Start transaction spew:
http://codemonkey.org.uk/junk/btrfs/start_transaction.txt
That's the WARN_ON(h->use_count > 2);
I hit this with enough regularity that I had to comment it out.
It's not clear to me whether this is related at all.

Lockdep spew:
http://codemonkey.org.uk/junk/btrfs/register_lock_class1.txt
http://codemonkey.org.uk/junk/btrfs/register_lock_class2.txt
This stuff has been around for a while (4.6ish iirc)

Sometimes the fs got into a screwed up state that needed btrfscking.
http://codemonkey.org.uk/junk/btrfs/replay-log-fail.txt

	Dave

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ