Message-Id: <1225119789.5146.5.camel@localhost.localdomain>
Date:	Mon, 27 Oct 2008 10:03:09 -0500
From:	James Bottomley <James.Bottomley@...senPartnership.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Jens Axboe <jens.axboe@...cle.com>,
	linux-scsi <linux-scsi@...r.kernel.org>,
	IDE/ATA development list <linux-ide@...r.kernel.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: Timeout regression introduced by 242f9dcb8ba6f68fcd217a119a7648a4f69290e9

On Mon, 2008-10-27 at 10:51 +0900, Tejun Heo wrote:
> James Bottomley wrote:
> > On Sun, 2008-10-26 at 18:46 +0900, Tejun Heo wrote:
> >> Hello, Jens.
> >>
> >> Commit 242f9dcb8ba6f68fcd217a119a7648a4f69290e9 introduces a strange
> >> regression for libata.  The second timeout puts a different pointer
> >> from the issued command onto eh_cmd_q, breaking libata EH command
> >> matching, which triggers WARN_ON() in ata_eh_finish() and hangs
> >> command processing or causes an oops later, depending on circumstances.
> >>
> >> Here are logs with induced timeouts (patch attached).  In commit
> >> 242f9dcb8, the XXX messages for the second timeout show different
> >> scsi_cmd pointers for eh_cmd_q and qc->scmd, which is initialized by
> >> ata_scsi_qc_new() during command translation.
> > 
> > I can't see a way we could be getting a different command passed in from
> > the actual one, since the only way to lose the command from the request
> > is to go through the command completion routines which free it (and end
> > the request).
> 
> I have no idea either.  It's something in the timeout logic because on
> the issue path the scmd pointer is identical, but on timeout a pointer
> for another scmd is queued on eh_cmd_q, which doesn't make much sense.

OK, so if we take the first trace as definitive, since it shows the same
command going around twice (being freed and reallocated from the slab in
between), it does tend to imply the one on eh_cmd_q is bogus.

Could you print out all scsi commands going into ata_qc_issue so we can
see if there's a clue in where this one is coming from?
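
As a rough illustration of the kind of tracing being asked for, here is a
user-space sketch, not kernel code. It records the command pointer seen at
issue time for each tag and compares it with the pointer later handed to
error handling. Every name in it (model_cmd, model_issue,
model_timeout_check, MODEL_MAX_TAGS) is hypothetical and only stands in for
the qc->scmd versus eh_cmd_q comparison discussed in this thread.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_MAX_TAGS 32	/* hypothetical queue depth for this model */

/* Stand-in for a SCSI command; only its address matters here. */
struct model_cmd {
	int serial;
};

/* Command pointer recorded per tag at issue time (models qc->scmd). */
static struct model_cmd *issued[MODEL_MAX_TAGS];

/*
 * Log and record a command as it is issued.
 * Returns 0 on success, -1 if the tag is out of range.
 */
int model_issue(unsigned int tag, struct model_cmd *cmd)
{
	if (tag >= MODEL_MAX_TAGS)
		return -1;
	issued[tag] = cmd;
	fprintf(stderr, "issue:   tag=%u cmd=%p\n", tag, (void *)cmd);
	return 0;
}

/*
 * Log both pointers at timeout and compare them.
 * Returns 1 if the command handed to error handling is the one issued
 * on this tag, 0 on a mismatch, -1 if the tag is out of range.
 */
int model_timeout_check(unsigned int tag, struct model_cmd *eh_cmd)
{
	if (tag >= MODEL_MAX_TAGS)
		return -1;
	fprintf(stderr, "timeout: tag=%u eh_cmd=%p issued=%p\n",
		tag, (void *)eh_cmd, (void *)issued[tag]);
	return issued[tag] == eh_cmd;
}
```

With a log like this, a mismatch at timeout can be traced back to whichever
issue line, if any, printed the stray pointer.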

> > However, since the WARN_ON is specifically comparing the command with
> > the one found by the active tag, could this actually be a problem caused
> > by block tags?  I note that libata still uses its own array of
> > outstanding tags (ap->qcmd[tag]) instead of finding them using
> > blk_queue_find_tag() (or scsi_find_tag()).
> 
> Nope, the tested commits are before the block queue tag transition and
> I tested two consecutive commits.  242f9dcb^ is okay.  242f9dcb is
> not.

Well, it was worth a shot.

James


