linux-kernel - Re: [PATCH]: PCI Error Recovery: Symbios SCSI device driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20061031231334.GR6360@austin.ibm.com>
Date:	Tue, 31 Oct 2006 17:13:34 -0600
From:	linas@...tin.ibm.com (Linas Vepstas)
To:	Matthew Wilcox <matthew@....cx>
Cc:	linux-scsi@...r.kernel.org, linux-pci@...ey.karlin.mff.cuni.cz,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH]: PCI Error Recovery: Symbios SCSI device driver

On Tue, Oct 31, 2006 at 11:55:07AM -0700, Matthew Wilcox wrote:
> On Fri, Oct 20, 2006 at 01:05:10PM -0500, Linas Vepstas wrote:
> > Index: linux-2.6.19-rc1-git11/drivers/scsi/sym53c8xx_2/sym_glue.c
> 
> This needs to be before the printf_debug call.

Right. This'll be in the next patch submision.

> > @@ -726,6 +731,19 @@ static int sym_eh_handler(int op, char *
> > +	/* We may be in an error condition because the PCI bus
> > +	 * went down. In this case, we need to wait until the
> > +	 * PCI bus is reset, the card is reset, and only then
> > +	 * proceed with the scsi error recovery.  There's no
> > +	 * point in hurrying; take a leisurely wait.
> > +	 */
> > +#define WAIT_FOR_PCI_RECOVERY	35
> > +	if ((np->s.device->error_state != pci_channel_io_normal) &&
> > +	    (np->s.device->error_state != 0) &&
> > +	    (wait_for_completion_timeout(&np->s.io_reset_wait,
> > +		                         WAIT_FOR_PCI_RECOVERY*HZ) == 0))
> > +			return SCSI_FAILED;
> > +
> 
> Is it safe / reasonable / a good idea to sleep for 35 seconds in the EH
> handler?  I'm not that familiar with how the EH code works.  It has its
> own thread, so I suppose that's OK.

As James pointed out, the pci channel is is not available until the 
reset sequence is done.  The 35 seconds seemed like a reasonable time
to wait for the pci reset to complete; hopefuly it will complete much 
sooner.  If the pci reset fails for some reason, then things are hosed, 
and the sysadmin will need to intervene.

> Are the driver's data structures still intact after a reset?

They should be. No one is attempting to free or shut down the driver.

> I generally prefer not to be so perlish in conditionals, ie:

I wasn't sure what style is popular. Actually, I agree with you on 
that, and picked the other style cause i thought it was prefered.
Nex patch submission will be more nested.

> 	if ((np->s.device->error_state != pci_channel_io_normal) &&
> 	    (np->s.device->error_state != 0) {
> 
> Why is the condition so complicated though?  What does 0 mean if it's
> not io_normal?  

Its an unresolved stupidity. For some imagined, hypothetical
reason, it momentarily seemed wise to make pci_channel_io_normal 
be non-zero; but this imagined reason, although vague, did manage
to bite back, as the above code demonstrates.

> At least let's hide that behind a convenience macro:
> 
> 	if (abnormal_error_state(np->s.device->error_state)) {

Should I submit a patch to make pci_channel_io_normal be zero,
or submit a patch to define abnormal_error_state, or both?
Both, probably; I don't have much of an opinion.

> Though, since INB and INW will return 0xff and 0xffff, why not use that
> as our test rather than using a counter?

Right. I wanted to avoid checking for specific values, 
as that vaguely seemed more robust; the direct check is easier.
I'll change this.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/