lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 6 Apr 2008 23:56:57 +0100 (BST)
From:	Hugh Dickins <hugh@...itas.com>
To:	James Bottomley <James.Bottomley@...senPartnership.com>
cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	FUJITA Tomonori <fujita.tomonori@....ntt.co.jp>,
	Jens Axboe <jens.axboe@...cle.com>,
	Christoph Lameter <clameter@....com>,
	Pekka Enberg <penberg@...helsinki.fi>,
	Peter Zijlstra <a.p.ziljstra@...llo.nl>,
	"Rafael J. Wysocki" <rjw@...k.pl>, linux-kernel@...r.kernel.org
Subject: [PATCH] scsi: fix sense_slab/bio swapping livelock

Since 2.6.25-rc7, I've been seeing an occasional livelock on one
x86_64 machine, copying kernel trees to tmpfs, paging out to swap.

Signature: 6000 pages under writeback but never getting written;
most tasks of interest trying to reclaim, but each get_swap_bio
waiting for a bio in mempool_alloc's io_schedule_timeout(5*HZ);
every five seconds an atomic page allocation failure report from
kblockd failing to allocate a sense_buffer in __scsi_get_command.

__scsi_get_command has a (one item) free_list to protect against
this, but rc1's [SCSI] use dynamically allocated sense buffer
de25deb18016f66dcdede165d07654559bb332bc upset that slightly.
When it fails to allocate from the separate sense_slab, instead
of giving up, it must fall back to the command free_list, which
is sure to have a sense_buffer attached.

Either my earlier -rc testing missed this, or there's some recent
contributory factor.  One very significant factor is SLUB, which
merges slab caches when it can, and on 64-bit happens to merge
both bio cache and sense_slab cache into kmalloc's 128-byte cache:
so that under this swapping load, bios above are liable to gobble
up all the slots needed for scsi_cmnd sense_buffers below.

That's disturbing behaviour, and I tried a few things to fix it.
Adding a no-op constructor to the sense_slab inhibits SLUB from
merging it, and stops all the allocation failures I was seeing;
but it's rather a hack, and perhaps in different configurations
we have other caches on the swapout path which are ill-merged.

Another alternative is to revert the separate sense_slab, using
cache-line-aligned sense_buffer allocated beyond scsi_cmnd from
the one kmem_cache; but that might waste more memory, and is
only a way of diverting around the known problem.

While I don't like seeing the allocation failures, and hate the
idea of all those bios piled up above a scsi host working one by
one, it does seem to emerge fairly soon with the livelock fix.
So lacking better ideas, stick with that one clear fix for now.

Signed-off-by: Hugh Dickins <hugh@...itas.com>
---

 drivers/scsi/scsi.c |   22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

--- 2.6.25-rc8/drivers/scsi/scsi.c	2008-03-05 10:47:40.000000000 +0000
+++ linux/drivers/scsi/scsi.c	2008-04-05 22:23:40.000000000 +0100
@@ -181,6 +181,18 @@ struct scsi_cmnd *__scsi_get_command(str
 	cmd = kmem_cache_alloc(shost->cmd_pool->cmd_slab,
 			       gfp_mask | shost->cmd_pool->gfp_mask);
 
+	if (likely(cmd)) {
+		buf = kmem_cache_alloc(shost->cmd_pool->sense_slab,
+				       gfp_mask | shost->cmd_pool->gfp_mask);
+		if (likely(buf)) {
+			memset(cmd, 0, sizeof(*cmd));
+			cmd->sense_buffer = buf;
+		} else {
+			kmem_cache_free(shost->cmd_pool->cmd_slab, cmd);
+			cmd = NULL;
+		}
+	}
+
 	if (unlikely(!cmd)) {
 		unsigned long flags;
 
@@ -197,16 +209,6 @@ struct scsi_cmnd *__scsi_get_command(str
 			memset(cmd, 0, sizeof(*cmd));
 			cmd->sense_buffer = buf;
 		}
-	} else {
-		buf = kmem_cache_alloc(shost->cmd_pool->sense_slab,
-				       gfp_mask | shost->cmd_pool->gfp_mask);
-		if (likely(buf)) {
-			memset(cmd, 0, sizeof(*cmd));
-			cmd->sense_buffer = buf;
-		} else {
-			kmem_cache_free(shost->cmd_pool->cmd_slab, cmd);
-			cmd = NULL;
-		}
 	}
 
 	return cmd;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ