lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LNX.2.00.1511261130120.12333@nippy.intranet>
Date:	Thu, 26 Nov 2015 21:23:01 +1100 (AEDT)
From:	Finn Thain <fthain@...egraphics.com.au>
To:	Ondrej Zary <linux@...nbow-software.org>
cc:	"James E.J. Bottomley" <JBottomley@...n.com>,
	Michael Schmitz <schmitzmic@...il.com>,
	linux-m68k@...r.kernel.org, linux-scsi@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 22/71] ncr5380: Eliminate selecting state


On Wed, 25 Nov 2015, Ondrej Zary wrote:

> On Wednesday 25 November 2015 04:17:09 Finn Thain wrote:
> > 
> > On Tue, 24 Nov 2015, Ondrej Zary wrote:
> > 
> > > 
> > > Bisecting slow module initialization pointed to this commit.
> > > 
> > > Before this commit (2 seconds):
> > > [   60.317374] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NCR53C400 }, USLEEP_POLL 3, USLEEP_SLEEP 50, options { AUTOPROBE_IRQ PSEUDO_DMA }
> > > [   60.780715] scsi 2:0:1:0: Direct-Access     QUANTUM  LP240S GM240S01X 4.6  PQ: 0 ANSI: 2 CCS
> > > [   62.606260] sd 2:0:1:0: Attached scsi generic sg1 type 0
> > > 
> > > 
> > > After this commit (22 seconds):
> > > [  137.511711] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NCR53C400 }, USLEEP_POLL 3, USLEEP_SLEEP 50, options { AUTOPROBE_IRQ PSEUDO_DMA }
> > > [  145.028532] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large:
> > > [  145.029767] clocksource:                       'acpi_pm' wd_now: a49738 wd_last: f4da04 mask: ffffff
> > > [  145.029828] clocksource:                       'tsc' cs_now: 2ea624698e cs_last: 2c710aa17f mask: ffffffffffffffff
> > > [  145.032733] clocksource: Switched to clocksource acpi_pm
> > > [  145.236951] scsi 2:0:1:0: Direct-Access     QUANTUM  LP240S GM240S01X 4.6  PQ: 0 ANSI: 2 CCS
> > > [  159.959308] sd 2:0:1:0: Attached scsi generic sg1 type 0
> > > 
> > > 
> > 
> > This problem doesn't show up on my hardware, and I'd like to know where 
> > those 22 seconds are being spent. Would you please apply the entire series 
> > and add,
> > #define NDEBUG (NDEBUG_ARBITRATION | NDEBUG_SELECTION | NDEBUG_MAIN)
> > to the top of g_NCR5380.c and send me the messages logged during modprobe?
> 
> [  397.014581] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NO_DMA_FIXUP }, options { AUTOPROBE_IRQ PSEUDO_DMA }
> [  412.099695] STATUS_REG: 00
> BASR: 08
> ICR: 00
> MODE: 00
> [  412.103625] STATUS_REG: 00
> BASR: 08
> ICR: 00
> MODE: 00
> [  412.110503] scsi 2:0:1:0: Direct-Access     QUANTUM  LP240S GM240S01X 4.6  PQ: 0 ANSI: 2 CCS
> [  412.110892] STATUS_REG: 00
> BASR: 08
> ICR: 00
> MODE: 00
> [  412.114154] STATUS_REG: 00
> BASR: 08
> ICR: 00
> MODE: 00
> [  412.119733] STATUS_REG: 00
> BASR: 08
> ICR: 00
> MODE: 00
> [  427.198108] STATUS_REG: 00
> BASR: 08
> ICR: 00
> MODE: 00
> [  442.276586] STATUS_REG: 00
> BASR: 08
> ICR: 00
> MODE: 00
> [  457.354592] STATUS_REG: 00
> BASR: 08
> ICR: 00
> MODE: 00
> [  472.432999] STATUS_REG: 00
> BASR: 08
> ICR: 00
> MODE: 00
> [  487.513027] sd 2:0:1:0: Attached scsi generic sg1 type 0
> 
> 

I see that the delay increased from 22 seconds to 90 seconds. Perhaps that 
is because the remaining patches were applied.

Anyway, it seems to confirm that this delay comes from scanning the six 
unused SCSI bus IDs. Each one takes 15 seconds. Could be a five second 
timeout and three retries. It's hard to tell because the dsprintk() 
logging didn't show up. Can you try modprobe after,
# echo 1 > /sys/module/printk/parameters/ignore_loglevel
And maybe also,
# scsi_logging_level -sS7

It seems like the 53C400 chip doesn't arbitrate like a 53C80, but that 
doesn't make sense given it contains a 53C80 logic block. Perhaps some 
extra initialization is needed. Which card(s) are affected?

Aside from trying to figure out why the chip behaves oddly, we could also 
try to figure out whether the driver and its main loop is causing problems 
for other tasks i.e. timekeeping watchdog. Does this patch make any 
difference?

Index: linux/drivers/scsi/atari_NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/atari_NCR5380.c	2015-11-26 15:42:01.000000000 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2015-11-26 15:42:04.000000000 +1100
@@ -925,6 +925,7 @@ static void NCR5380_main(struct work_str
 		container_of(work, struct NCR5380_hostdata, main_task);
 	struct Scsi_Host *instance = hostdata->host;
 	struct scsi_cmnd *cmd;
+	unsigned long deadline = jiffies + msecs_to_jiffies(10);
 	int done;
 
 	/*
@@ -984,9 +985,12 @@ static void NCR5380_main(struct work_str
 			done = 0;
 		}
 		spin_unlock_irq(&hostdata->lock);
-		if (!done)
-			cond_resched();
-	} while (!done);
+		if (done || time_is_before_jiffies(deadline))
+			break;
+		cond_resched();
+	} while (1);
+	if (!done)
+		queue_work(hostdata->work_q, &hostdata->main_task);
 }
 
 
Index: linux/drivers/scsi/NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/NCR5380.c	2015-11-26 15:42:01.000000000 +1100
+++ linux/drivers/scsi/NCR5380.c	2015-11-26 15:42:04.000000000 +1100
@@ -817,6 +817,7 @@ static void NCR5380_main(struct work_str
 		container_of(work, struct NCR5380_hostdata, main_task);
 	struct Scsi_Host *instance = hostdata->host;
 	struct scsi_cmnd *cmd;
+	unsigned long deadline = jiffies + msecs_to_jiffies(10);
 	int done;
 
 	do {
@@ -860,9 +861,12 @@ static void NCR5380_main(struct work_str
 			done = 0;
 		}
 		spin_unlock_irq(&hostdata->lock);
-		if (!done)
-			cond_resched();
-	} while (!done);
+		if (done || time_is_before_jiffies(deadline))
+			break;
+		cond_resched();
+	} while (1);
+	if (!done)
+		queue_work(hostdata->work_q, &hostdata->main_task);
 }
 
 #ifndef DONT_USE_INTR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ