Message-ID: <20131205034143.GU5051@kernel.dk>
Date:	Wed, 4 Dec 2013 20:41:43 -0700
From:	Jens Axboe <axboe@...nel.dk>
To:	Dave Chinner <david@...morbit.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: [OOPS, 3.13-rc2] null ptr in dio_complete()

On Thu, Dec 05 2013, Dave Chinner wrote:
> On Wed, Dec 04, 2013 at 03:17:49PM +1100, Dave Chinner wrote:
> > On Tue, Dec 03, 2013 at 08:47:12PM -0700, Jens Axboe wrote:
> > > On Wed, Dec 04 2013, Dave Chinner wrote:
> > > > On Wed, Dec 04, 2013 at 12:58:38PM +1100, Dave Chinner wrote:
> > > > > On Wed, Dec 04, 2013 at 08:59:40AM +1100, Dave Chinner wrote:
> > > > > > Hi Jens,
> > > > > > 
> > > > > > Not sure who to direct this to or CC, so I figured you're the right
> > > > > > person to route it. I just had xfstests generic/299 (an AIO/DIO test)
> > > > > > oops in dio_complete() like so:
> > > > > > 
> > ....
> > > > > > [ 9650.590630]  <IRQ>
> > > > > > [ 9650.590630]  [<ffffffff811ddae3>] dio_complete+0xa3/0x140
> > > > > > [ 9650.590630]  [<ffffffff811ddc2a>] dio_bio_end_aio+0x7a/0x110
> > > > > > [ 9650.590630]  [<ffffffff811ddbb5>] ? dio_bio_end_aio+0x5/0x110
> > > > > > [ 9650.590630]  [<ffffffff811d8a9d>] bio_endio+0x1d/0x30
> > > > > > [ 9650.590630]  [<ffffffff8175d65f>] blk_mq_complete_request+0x5f/0x120
> > > > > > [ 9650.590630]  [<ffffffff8175d736>] __blk_mq_end_io+0x16/0x20
> > > > > > [ 9650.590630]  [<ffffffff8175d7a8>] blk_mq_end_io+0x68/0xd0
> > > > > > [ 9650.590630]  [<ffffffff818539a7>] virtblk_done+0x67/0x110
> > > > > > [ 9650.590630]  [<ffffffff817f74c5>] vring_interrupt+0x35/0x60
> > .....
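
For reference, generic/299 exercises exactly this path: async direct I/O
whose completion runs from the virtio-blk interrupt handler. Below is a
stripped-down sketch of that submission pattern (libaio; the file path,
sizes and alignment are made-up placeholders, not what the test actually
uses). Build with "gcc -o aiodio aiodio.c -laio":

#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	io_context_t ctx;
	struct iocb cb, *cbp = &cb;
	struct io_event ev;
	void *buf;
	int fd;

	/* hypothetical test file; O_DIRECT is the important part */
	fd = open("/mnt/test/aiodio.dat", O_RDWR | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT needs block-aligned buffers, offsets and lengths */
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	memset(buf, 0xaa, 4096);

	memset(&ctx, 0, sizeof(ctx));
	if (io_setup(32, &ctx) < 0)
		return 1;

	io_prep_pwrite(&cb, fd, buf, 4096, 0);
	if (io_submit(ctx, 1, &cbp) != 1)
		return 1;

	/* reaping this runs dio_bio_end_aio() -> dio_complete() in the kernel */
	if (io_getevents(ctx, 1, 1, &ev, NULL) != 1)
		return 1;
	printf("res=%ld\n", (long)ev.res);

	io_destroy(ctx);
	close(fd);
	free(buf);
	return 0;
}
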
> > > > > And I just hit this from running xfs_repair which is doing
> > > > > multithreaded direct IO directly on /dev/vdc:
> > > > > 
> > ....
> > > > > [ 1776.510446] IP: [<ffffffff81755b6a>] blk_account_io_done+0x6a/0x180
> > ....
> > > > > [ 1776.512577]  [<ffffffff8175e4b8>] blk_mq_complete_request+0xb8/0x120
> > > > > [ 1776.512577]  [<ffffffff8175e536>] __blk_mq_end_io+0x16/0x20
> > > > > [ 1776.512577]  [<ffffffff8175e5a8>] blk_mq_end_io+0x68/0xd0
> > > > > [ 1776.512577]  [<ffffffff81852e47>] virtblk_done+0x67/0x110
> > > > > [ 1776.512577]  [<ffffffff817f7925>] vring_interrupt+0x35/0x60
> > > > > [ 1776.512577]  [<ffffffff810e48a4>] handle_irq_event_percpu+0x54/0x1e0
> > .....
> > > > > So this is looking like another virtio+blk_mq problem....
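
That xfs_repair workload is straightforward to approximate: several
threads issuing O_DIRECT reads against the same block device at once. A
simplified userspace sketch (not xfs_repair itself; the device path,
thread count and sizes are placeholders). Build with "gcc -pthread":

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS 4
#define BUFSZ    4096	/* multiple of the device's logical block size */

static int fd;

static void *reader(void *arg)
{
	long id = (long)arg;
	void *buf;
	int i;

	/* O_DIRECT requires aligned buffers, offsets and lengths */
	if (posix_memalign(&buf, 4096, BUFSZ))
		return NULL;

	for (i = 0; i < 1024; i++) {
		off_t off = (off_t)(id * 1024 + i) * BUFSZ;

		if (pread(fd, buf, BUFSZ, off) < 0) {
			perror("pread");
			break;
		}
	}
	free(buf);
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t tids[NTHREADS];
	long i;

	/* /dev/vdc as in the report above, or pass another device */
	fd = open(argc > 1 ? argv[1] : "/dev/vdc", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL, reader, (void *)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tids[i], NULL);
	close(fd);
	return 0;
}
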
> > > > 
> > > > This one is definitely reproducible. Just hit it again...
> > > 
> > > I'll take a look at this. You don't happen to have gdb dumps of the
> > > lines associated with those crashes? Just to save me some digging
> > > time...
> > 
> > Only this:
> > 
> > (gdb) l *(dio_complete+0xa3)
> > 0xffffffff811ddae3 is in dio_complete (fs/direct-io.c:282).
> > 277                     }
> > 278
> > 279                     aio_complete(dio->iocb, ret, 0);
> > 280             }
> > 281
> > 282             kmem_cache_free(dio_cache, dio);
> > 283             return ret;
> > 284     }
> > 285
> > 286     static void dio_aio_complete_work(struct work_struct *work)
> > 
> > And this:
> > 
> > (gdb) l *(blk_account_io_done+0x6a)
> > 0xffffffff81755b6a is in blk_account_io_done (block/blk-core.c:2049).
> > 2044                    int cpu;
> > 2045
> > 2046                    cpu = part_stat_lock();
> > 2047                    part = req->part;
> > 2048
> > 2049                    part_stat_inc(cpu, part, ios[rw]);
> > 2050                    part_stat_add(cpu, part, ticks[rw], duration);
> > 2051                    part_round_stats(cpu, part);
> > 2052                    part_dec_in_flight(part, rw);
> > 2053
> > 
> > as I've rebuilt the kernel with different patches since the one
> > running on the machine that is triggering the problem.
> 
> Any update on this, Jens? I've hit this blk_account_io_done() panic
> 10 times in the past 2 hours while trying to do xfs_repair
> testing....

No, sorry, no updates yet... I haven't had time to look into it today.
So I can try to reproduce it tomorrow, can you mail me your exact setup
(kvm invocation, etc.), how your guest is set up, and whether there's any
special way I need to run xfstests or xfs_repair?
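
FWIW, both gdb listings look like the completion side touching a dio or
request that has already been freed or recycled: dio_complete() dies
right at the kmem_cache_free(), and blk_account_io_done() faults on
req->part. Purely to illustrate that failure mode in userspace terms
(all names hypothetical, and this is a guess at the shape of the bug,
not a claim about where it is; build with "gcc -pthread"):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct fake_dio {
	long result;
};

static struct fake_dio *in_flight;

static void complete_io(const char *who)
{
	struct fake_dio *d = in_flight;

	/* use-after-free if the other path freed it first */
	printf("%s: result=%ld\n", who, d->result);
	free(d);	/* double free if both completion paths get here */
}

static void *irq_completion(void *arg)
{
	(void)arg;
	complete_io("irq");
	return NULL;
}

int main(void)
{
	pthread_t t;

	in_flight = calloc(1, sizeof(*in_flight));
	if (!in_flight)
		return 1;
	in_flight->result = 4096;

	/* two paths race to complete the same in-flight I/O exactly once */
	pthread_create(&t, NULL, irq_completion, NULL);
	complete_io("other");
	pthread_join(t, NULL);
	return 0;
}
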

-- 
Jens Axboe
