lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZttIN8He8TOZ7Lct@google.com>
Date: Fri, 6 Sep 2024 18:21:43 +0000
From: Igor Pylypiv <ipylypiv@...gle.com>
To: Niklas Cassel <cassel@...nel.org>
Cc: Damien Le Moal <dlemoal@...nel.org>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	Hannes Reinecke <hare@...e.de>, linux-ide@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ata: libata-eh: Clear scsicmd->result when setting
 SAM_STAT_CHECK_CONDITION

On Thu, Sep 05, 2024 at 10:33:38AM +0200, Niklas Cassel wrote:

Hi Niklas,

Thank you so much for a thorough review and coming up with a better patch!

> Hello Igor,
> 
> On Wed, Sep 04, 2024 at 10:37:27PM +0000, Igor Pylypiv wrote:
> > commit 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to
> > not set CHECK_CONDITION") changed the way how SAM_STAT_CHECK_CONDITION is
> > set. Assignment "scmd->result = SAM_STAT_CHECK_CONDITION;" was replaced by
> > set_status_byte() which does not clear the scsicmd->result.
> 
> "which does not clear the scsicmd->result"
> 
> scsicmd->result is a combination of:
> -SCSI status byte
> -SCSI ML byte
> -host byte
> 
> Please be more specific of which byte(s) that you want to clear,
> both here and in other places in the commit message.
> 
> 
> > 
> > By not clearing the scsicmd->result we end up in a state where both
> > the DID_TIME_OUT host byte and the SAM_STAT_CHECK_CONDITION status
> > bytes are set.
> > 
> 
> 
> > The DID_TIME_OUT host byte is getting set a part of error handling:
> > 
> > ata_qc_complete()
> >     ata_qc_schedule_eh()
> >         blk_abort_request()
> >             WRITE_ONCE(req->deadline, jiffies);
> > 
> > blk_mq_timeout_work()
> >     blk_mq_check_expired()
> >         blk_mq_rq_timed_out()
> > 	    req->q->mq_ops->timeout() / scsi_timeout()
> >                 set_host_byte(scmd, DID_TIME_OUT);
> 
> I would have reorder your commit log and have this first in the commit log.
> 
> 
> > 
> > Having the host byte set to DID_TIME_OUT for a command that didn't timeout
> > is confusing. Let's bring the old behavior back by setting scmd->result to
> > SAM_STAT_CHECK_CONDITION.
> 
> I think you are missing something very important in the commit log here:
> What is the user visible change before and after your change?
>

For ATA PT commands the 'result' value is being returned to user space so
not clearing the DID_TIME_OUT is user visible.

> Is there a difference is the error message in dmesg? If not, that should
> be mentioned as well.
> 

I haven't seen any dmesg errors reported for ATA PT hence I don't think there
is any difference in dmesg.

> 
> > 
> > Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION")
> > Signed-off-by: Igor Pylypiv <ipylypiv@...gle.com>
> > ---
> >  drivers/ata/libata-eh.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> > index 214b935c2ced..4927b40e782f 100644
> > --- a/drivers/ata/libata-eh.c
> > +++ b/drivers/ata/libata-eh.c
> > @@ -1605,7 +1605,7 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc)
> >  		 */
> >  		if (!(qc->flags & ATA_QCFLAG_SENSE_VALID) &&
> >  		    (stat & ATA_SENSE) && ata_eh_request_sense(qc))
> > -			set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION);
> > +			qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
> 
> ata_eh_analyze_tf() will only be called on commands that are owned by EH
> (has ATA_QCFLAG_EH set).
> 
> Thus this command will end up in:
> ata_eh_finish() -> ata_eh_qc_complete() -> __ata_eh_qc_complete() ->
> -> __ata_qc_complete() -> qc->complete_fn().
> 
> complete_fn will be (except in special cases): ata_scsi_qc_complete()
> If you look at ata_scsi_qc_complete(), it already clears the host byte:
> https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/ata/libata-scsi.c#L1695-L1696
>
> So could you please be more specific of what problem this change is fixing?
> Is it just that you think that it makes sense to clear the host byte earlier
> in the call chain?

I should have mentioned that the issue is specific to ATA PT commands.
The problem is that ata_scsi_qc_complete() is not clearing the host byte
for ATA PT commands causing the DID_TIME_OUT to be returned.

> 
> There are many different paths a QC can take via EH, e.g. policy 0xD NCQ
> commands will not fetch sense data via ata_eh_request_sense(), so clearing
> the host byte in ata_scsi_qc_complete() should be fine, otherwise we need
> to do a similar change to yours in all the different code paths that sets
> sense data ...which might actually be something that makes sense, but then
> I would expect a patch series that changes all the locations where we set
> sense data, not just in ata_eh_analyze_tf(), and then drops the clearing in
> ata_scsi_qc_complete() (which was introduced in commit 7574a8377c7a ("ata:
> libata-scsi: do not overwrite SCSI ML and status bytes")).
> 
> 
> See this patch to see all the places where we set sense data:
> https://git.kernel.org/pub/scm/linux/kernel/git/libata/linux.git/commit/?h=for-6.12&id=9526dec226f0779d72f798e7a18375bf8d414775
> 
> 
> 
> 
> Side note:
> You might also be interested to know that a command that was sent via EH will
> be finished using scsi_eh_finish_cmd() (called by __ata_eh_qc_complete()),
> and will thus end up in scsi_eh_flush_done_q(). A command that has sense data
> will have scmd->retries == scmd->allowed (set by ata_eh_qc_complete()), so you
> will end up in this code path:
> https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/scsi/scsi_error.c#L2213-L2227
> 
> Which means that SCSI EH will set DID_TIME_OUT for any command that does
> not have (SCSI ML byte || SCSI status byte || host byte) set.
> 
> A command with sense data will have most often have CHECK_CONDITION set, but
> there is also CDL policy 0xD commands (which will have the SCSI ML byte set).
> So this special flag SCMD_FORCE_EH_SUCCESS is for commands that were completed
> successfully without any SK/ASC/ASCQ in the same interrupt as other policy 0xD
> commands which did have SK/ASC/ASCQ set.
> For details, see 3d848ca1ebc8 ("scsi: core: Allow libata to complete successful
> commands via EH").
> 
> 
> Kind regards,
> Niklas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ