Message-ID: <20090215115823.GB19464@elte.hu>
Date:	Sun, 15 Feb 2009 12:58:23 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	linux-scsi@...r.kernel.org, Alan Stern <stern@...land.harvard.edu>,
	James Bottomley <James.Bottomley@...senPartnership.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: scsi: aic7xxx hang since v2.6.28-rc1 ...


* Ingo Molnar <mingo@...e.hu> wrote:

> Here's an SCSI regression i tracked down recently. I'll follow up with
> more info.

I sent this to James Bottomley about a month ago, who suggested that the
bug looks similar to problems caused by:

 | commit b60af5b0adf0da24c673598c8d3fb4d4189a15ce
 | Author: Alan Stern <stern@...land.harvard.edu>
 | Date:   Mon Nov 3 15:56:47 2008 -0500
 |
 |     [SCSI] simplify scsi_io_completion()

I could not revert that patch because it had a lot of followup dependencies,
but by experimentation i figured out the following string of gradual reverts
to scsi_lib.c [the revert commits can be found in tip:out-of-tree]:

 813104e: Revert "[SCSI] simplify scsi_io_completion()"
 84db545: Revert "[SCSI] Fix uninitialized variable error in scsi_io_completion"
 0eb6038: Revert "[SCSI] Fix error handling for DIF/DIX"
 3cd94dd: Revert "[SCSI] scsi_lib: don't decrement busy counters when inserting commands"
 c27aed5: Revert "[SCSI] scsi_lib: fix DID_RESET status problems"

These reverts solved the problem and the box has not locked up in the SCSI irq 
completion code since then. The code has not had any changes upstream since i
did the reverts, so the bug is still relevant as of v2.6.29-rc5.

( James suggested i send this bugreport to this list too, so that it does not
  get single-threaded on him as he is busy with other things - so more suggestions
  are welcome. I can try proposed fix patches. James suggested the patch below,
  but i don't think it will show us much more than what we already know: that we
  are looping in scsi_run_queue(). )

	Ingo

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 940dc32..5919dd0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -593,6 +593,7 @@ static void scsi_run_queue(struct request_queue *q)
 	struct Scsi_Host *shost = sdev->host;
 	LIST_HEAD(starved_list);
 	unsigned long flags;
+	int count = 0;
 
 	if (scsi_target(sdev)->single_lun)
 		scsi_single_lun_run(sdev);
@@ -603,6 +604,8 @@ static void scsi_run_queue(struct request_queue *q)
 	while (!list_empty(&starved_list)) {
 		int flagset;
 
+		BUG_ON(count++ > 1000);
+
 		/*
 		 * As long as shost is accepting commands and we have
 		 * starved queues, call blk_run_queue. scsi_request_fn
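
[ Illustrative note, not part of the original patch: the debug hunk above bounds
  the starved-list loop with a pass counter, so that a livelock turns into a
  loud failure instead of a silent hang. The same idea can be sketched in plain
  userspace C. Everything below (fake_dev, fake_dispatch, drain_with_guard,
  MAX_PASSES) is a hypothetical stand-in, not kernel code, and the sketch
  returns -1 where the kernel patch would hit BUG_ON(). ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pass budget; the debug patch above uses 1000. */
#define MAX_PASSES 1000

/*
 * Hypothetical stand-in for a starved device. `pending` is the number of
 * dispatch attempts it still needs; a negative value models a device that
 * never makes progress (the suspected livelock case).
 */
struct fake_dev {
	int pending;
};

/* One dispatch attempt. Returns true if the device is still starved. */
static bool fake_dispatch(struct fake_dev *dev)
{
	if (dev->pending < 0)
		return true;		/* never drains */
	if (dev->pending > 0)
		dev->pending--;
	return dev->pending > 0;
}

/*
 * Keep making passes while any device is still starved, counting passes
 * the same way the patch does (count++ > limit). Returns the number of
 * passes taken, or -1 once the budget is exceeded.
 */
static int drain_with_guard(struct fake_dev *devs, size_t n)
{
	int count = 0;
	bool starved = true;

	assert(devs != NULL || n == 0);

	while (starved) {
		if (count++ > MAX_PASSES)
			return -1;	/* the kernel patch would BUG_ON() here */

		starved = false;
		for (size_t i = 0; i < n; i++)
			if (fake_dispatch(&devs[i]))
				starved = true;
	}
	return count;
}
```

[ With devices that eventually drain, drain_with_guard() returns a small pass
  count; with one device that never drains, it gives up after the budget is
  exceeded. As noted above, this only confirms that the loop is spinning, it
  does not say why. ]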

