Date:   Wed, 26 Oct 2016 16:52:55 -0600
From:   Jens Axboe <axboe@...com>
To:     Dave Jones <davej@...emonkey.org.uk>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Chris Mason <clm@...com>,
        Andy Lutomirski <luto@...capital.net>,
        Andy Lutomirski <luto@...nel.org>,
        Al Viro <viro@...iv.linux.org.uk>, Josef Bacik <jbacik@...com>,
        David Sterba <dsterba@...e.com>,
        linux-btrfs <linux-btrfs@...r.kernel.org>,
        Linux Kernel <linux-kernel@...r.kernel.org>,
        Dave Chinner <david@...morbit.com>
Subject: Re: bio linked list corruption.

On 10/26/2016 04:40 PM, Dave Jones wrote:
> On Wed, Oct 26, 2016 at 03:21:53PM -0700, Linus Torvalds wrote:
>
>  > Could you try the attached patch? It adds a couple of sanity tests:
>  >
>  >  - a number of tests to verify that 'rq->queuelist' isn't already on
>  > some queue when it is added to a queue
>  >
>  >  - one test to verify that rq->mq_ctx is the same ctx that we have locked.
>  >
>  > I may be completely full of shit, and this patch may be pure garbage
>  > or "obviously will never trigger", but humor me.
>
> I gave it a shot too for shits & giggles.
> This falls out during boot.
>
> [    9.244030] EXT4-fs (sda4): mounted filesystem with ordered data mode. Opts: (null)
> [    9.271391] ------------[ cut here ]------------
> [    9.278420] WARNING: CPU: 0 PID: 1 at block/blk-mq.c:1181 blk_sq_make_request+0x465/0x4a0
> [    9.285613] CPU: 0 PID: 1 Comm: init Not tainted 4.9.0-rc2-think+ #4

Very odd, I don't immediately see how that can happen. For testing, can
you try adding the patch below? I'm just curious whether it fixes the list
corruption. The thing is, I don't see how ->mq_ctx and ctx can differ in
this path, but I can debug that on the side.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index ddc2eed64771..73b9462aa21f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1165,9 +1165,10 @@ static inline bool hctx_allow_merges(struct blk_mq_hw_ctx *hctx)
  }

  static inline bool blk_mq_merge_queue_io(struct blk_mq_hw_ctx *hctx,
-					 struct blk_mq_ctx *ctx,
  					 struct request *rq, struct bio *bio)
  {
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
+
  	if (!hctx_allow_merges(hctx) || !bio_mergeable(bio)) {
  		blk_mq_bio_to_request(rq, bio);
  		spin_lock(&ctx->lock);
@@ -1338,7 +1339,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
  		goto done;
  	}

-	if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
+	if (!blk_mq_merge_queue_io(data.hctx, rq, bio)) {
  		/*
  		 * For a SYNC request, send it to the hardware immediately. For
  		 * an ASYNC request, just ensure that we run it later on. The
@@ -1416,7 +1417,7 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
  		return cookie;
  	}

-	if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
+	if (!blk_mq_merge_queue_io(data.hctx, rq, bio)) {
  		/*
  		 * For a SYNC request, send it to the hardware immediately. For
  		 * an ASYNC request, just ensure that we run it later on. The
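
To illustrate the idea behind the patch outside the kernel, here is a minimal userspace sketch. It is not the blk-mq code: the sketch_* names, the integer "lock" flag and the singly linked list are all hypothetical stand-ins. The only point it makes is that when the insert helper derives the context from the request itself (as the patch does with rq->mq_ctx), the lock it takes and the list it touches always belong to the same context, whatever the caller believed the current context was.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not kernel structures. */
struct sketch_rq;

struct sketch_ctx {
	int locked;              /* toy lock flag standing in for ctx->lock */
	struct sketch_rq *head;  /* requests queued on this context */
	int queued;              /* number of requests on the list */
};

struct sketch_rq {
	struct sketch_ctx *mq_ctx; /* context the request was set up against */
	struct sketch_rq *next;
};

static void sketch_ctx_init(struct sketch_ctx *ctx)
{
	ctx->locked = 0;
	ctx->head = NULL;
	ctx->queued = 0;
}

static void sketch_rq_init(struct sketch_rq *rq, struct sketch_ctx *ctx)
{
	rq->mq_ctx = ctx;
	rq->next = NULL;
}

static void sketch_lock(struct sketch_ctx *ctx)
{
	assert(!ctx->locked);
	ctx->locked = 1;
}

static void sketch_unlock(struct sketch_ctx *ctx)
{
	assert(ctx->locked);
	ctx->locked = 0;
}

/*
 * Queue a request. The context is taken from the request rather than
 * passed in separately, so the lock and the list cannot disagree.
 * Returns the context the request ended up on.
 */
static struct sketch_ctx *sketch_queue_rq(struct sketch_rq *rq)
{
	struct sketch_ctx *ctx = rq->mq_ctx;

	assert(ctx != NULL);
	sketch_lock(ctx);
	rq->next = ctx->head;
	ctx->head = rq;
	ctx->queued++;
	sketch_unlock(ctx);
	return ctx;
}
```

With two contexts a and b and a request initialised against b, sketch_queue_rq() puts the request on b's list under b's lock and leaves a untouched. Whether the real ctx mismatch behind the WARNING has the same shape is still an open question.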

-- 
Jens Axboe
