linux-kernel - Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

Date:	Mon, 6 Jan 2014 10:21:28 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Muthu Kumar <muthu.lkml@...il.com>
Cc:	Kent Overstreet <kmo@...erainc.com>, Jens Axboe <axboe@...nel.dk>,
	linux-btrfs <linux-btrfs@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>, lkp@...ux.intel.com
Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Sun, Jan 05, 2014 at 08:28:57AM -0800, Muthu Kumar wrote:
> Fengguang,
> Instead of rebooting, can you trigger a crash dump when this happens
> and send us the backtrace (to start with)?

Muthu, good point! Attached is the full dmesg with backtrace:

[ 1398.988324] SysRq : Show Blocked State
[ 1398.992007]   task                        PC stack   pid father
[ 1398.992007] mount           D 0000000000000002     0  2875   2870 0x00000000
[ 1398.992007]  ffff88007f859a70 0000000000000082 ffff88007f859fd8 ffff8803d21c6c10
[ 1398.992007]  0000000000012fc0 ffff8803d21c6c10 0000000000000000 0000000000000000
[ 1398.992007]  ffff8803d2d22068 0000000000000008 ffff88007f859a18 ffffffff814c2b62
[ 1398.992007] Call Trace:
[ 1398.992007]  [<ffffffff814c2b62>] ? submit_bio+0x106/0x159
[ 1398.992007]  [<ffffffff81431c6a>] ? __do_readpage+0x4b9/0x50e
[ 1398.992007]  [<ffffffff81064a03>] ? kvm_clock_read+0x27/0x31
[ 1398.992007]  [<ffffffff81064a16>] ? kvm_clock_get_cycles+0x9/0xb
[ 1398.992007]  [<ffffffff811651a1>] ? filemap_fdatawait+0x23/0x23
[ 1398.992007]  [<ffffffff819ff356>] schedule+0x6f/0x71
[ 1398.992007]  [<ffffffff819ff59b>] io_schedule+0x8f/0xd6
[ 1398.992007]  [<ffffffff811651af>] sleep_on_page+0xe/0x12
[ 1398.992007]  [<ffffffff819ff861>] __wait_on_bit+0x48/0x7b
[ 1398.992007]  [<ffffffff81165002>] wait_on_page_bit+0x7a/0x7c
[ 1398.992007]  [<ffffffff810f7ee3>] ? autoremove_wake_function+0x34/0x34
[ 1398.992007]  [<ffffffff81433eee>] read_extent_buffer_pages+0x1ae/0x23b
[ 1398.992007]  [<ffffffff81410da7>] ? free_root_pointers+0x5b/0x5b
[ 1398.992007]  [<ffffffff814123e5>] btree_read_extent_buffer_pages.constprop.48+0x66/0x100
[ 1398.992007]  [<ffffffff814129d1>] read_tree_block+0x2f/0x47
[ 1398.992007]  [<ffffffff814163e6>] open_ctree+0x1271/0x1adf
[ 1398.992007]  [<ffffffff813f4243>] btrfs_mount+0x47b/0x771
[ 1398.992007]  [<ffffffff814e1f8c>] ? get_from_free_list+0x41/0x4b
[ 1398.992007]  [<ffffffff811c40bf>] mount_fs+0x15/0xae
[ 1398.992007]  [<ffffffff811d9a52>] vfs_kern_mount+0x64/0xf6
[ 1398.992007]  [<ffffffff811dbff6>] do_mount+0x781/0x878
[ 1398.992007]  [<ffffffff8117d6c2>] ? strndup_user+0x3a/0xd6
[ 1398.992007]  [<ffffffff811dc317>] SyS_mount+0x85/0xbe
[ 1398.992007]  [<ffffffff81a09529>] system_call_fastpath+0x16/0x1b
[ 1398.992007] Sched Debug Version: v0.11, 3.13.0-rc6-00148-gc05f7ce #1

> Kent,
> Did you do any btrfs test with your changes?

Just try simple dd writes.

Thanks,
Fengguang


> Regards,
> Muthu
> 
> On Sun, Jan 5, 2014 at 1:46 AM, Fengguang Wu <fengguang.wu@...el.com> wrote:
> > Hi Muthu,
> >
> > On Fri, Jan 03, 2014 at 11:51:31AM -0800, Muthu Kumar wrote:
> >> Looks like Kent missed the btrfs endio in the original commit. How
> >> about this patch:
> >>
> >> ---------
> >>
> >> In btrfs_end_bio, call bio_endio_nodec on the restored bio so the
> >> bi_remaining is accounted for correctly.
> >>
> >> Reported-by: fengguang.wu@...el.com
> >> Cc: Kent Overstreet <kmo@...erainc.com>
> >> CC: Jens Axboe <axboe@...nel.dk>
> >> Signed-off-by: Muthukumar Ratty <muthur@...il.com>
> >> --------
> >>
> >>  fs/btrfs/volumes.c |    6 +++++-
> >>  1 files changed, 5 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index f2130de..edfed52 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -5316,7 +5316,11 @@ static void btrfs_end_bio(struct bio *bio, int err)
> >>                 }
> >>                 kfree(bbio);
> >>
> >> -               bio_endio(bio, err);
> >> +                /*
> >> +                 * Call endio_nodec on the restored bio so the bi_remaining is
> >> +                 * accounted for correctly
> >> +                 */
> >> +               bio_endio_nodec(bio, err);
> >>         } else if (!is_orig_bio) {
> >>                 bio_put(bio);
> >>         }
> >
> > Interestingly, the BUG message disappeared, but the mount now hangs
> > and blocks the test run. In the end, the test watchdog rebooted the
> > machine via SysRq:
> >
> >         2014-01-04 23:13:02 mount -t btrfs /dev/vda /fs/vda
> >         [   20.184264] btrfs: device fsid f0e06999-0518-47e0-a622-21b8749438be devid 1 transid 4 /dev/vda
> >         [   20.186552] btrfs: disk space caching is enabled
> >         [  131.360457] random: nonblocking pool is initialized
> > ==>     [ 1465.069342] SysRq : Emergency Sync
> > ==>     [ 1475.071055] SysRq : Resetting
> >
> > Attached is the full dmesg for a good run (v3.13-rc7) and a bad run
> > (this patch).
> >
> > Thanks,
> > Fengguang

[Attachment: "dmesg-bio_endio_nodec-w", type text/plain, 95352 bytes]
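
For readers following the bi_remaining discussion in the quoted patch, here is a rough userspace sketch of the accounting involved. It is not kernel code. The struct and the model_* names are hypothetical, and the behaviour is an assumption about how bio_endio() and bio_endio_nodec() behave in the for-3.14/core tree: bio_endio() consumes one count and complains if none is left, while bio_endio_nodec() adds a count back before completing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio; only the accounting is modeled. */
struct model_bio {
	int remaining;   /* plays the role of the atomic bi_remaining counter */
	int completions; /* number of times the completion callback would run */
	int underflow;   /* set where the kernel would presumably hit BUG_ON */
};

static void model_bio_init(struct model_bio *bio)
{
	assert(bio != NULL);
	bio->remaining = 1;   /* a freshly set up bio owes one completion */
	bio->completions = 0;
	bio->underflow = 0;
}

/*
 * Models bio_endio(): consume one count and run the completion
 * callback when the counter reaches zero. Completing a bio whose
 * counter is already zero is recorded as an underflow.
 */
static void model_bio_endio(struct model_bio *bio)
{
	if (bio->remaining <= 0) {
		bio->underflow = 1;
		return;
	}
	if (--bio->remaining == 0)
		bio->completions++;
}

/*
 * Models bio_endio_nodec(): add the count back first, so completing
 * an already-completed (restored) bio does not underflow.
 */
static void model_bio_endio_nodec(struct model_bio *bio)
{
	bio->remaining++;
	model_bio_endio(bio);
}

/*
 * Models the btrfs_end_bio() situation from the patch: the lower layer
 * has already completed the bio once, then the original end_io is
 * restored and the bio is completed a second time.
 * Returns 1 if the second completion underflows, 0 otherwise.
 */
static int model_restored_bio_underflows(int use_nodec)
{
	struct model_bio bio;

	model_bio_init(&bio);
	model_bio_endio(&bio);            /* first completion: 1 -> 0 */
	if (use_nodec)
		model_bio_endio_nodec(&bio);  /* 0 -> 1 -> 0, callback runs */
	else
		model_bio_endio(&bio);        /* counter already 0: underflow */
	return bio.underflow;
}
```

In this model, model_restored_bio_underflows(0) reports the underflow that would correspond to the BUG, and model_restored_bio_underflows(1) does not. That matches the BUG message disappearing with the patch, but it says nothing about the mount hang reported afterwards.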