linux-kernel - Re: [PATCH] drivers/ata: Add the SW NCQ support to sata

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-id: <464E3764.90900@shaw.ca>
Date:	Fri, 18 May 2007 17:31:48 -0600
From:	Robert Hancock <hancockr@...w.ca>
To:	Kuan Luo <kluo@...dia.com>
Cc:	linux-kernel <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peer Chen <pchen@...dia.com>, linux-ide@...r.kernel.org,
	Jeff Garzik <jeff@...zik.org>
Subject: Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for
 MCP51/MCP55/MCP61

Kuan Luo wrote:
> Thanks for your comment, see the explaination inline.
> We'll apply your advice in later patch.

...

> Please don't duplicate this code in the driver, this is part of libata 
> core in libata-scsi.c. Add an export for these functions if you need to 
> use them in the driver.
> [kuan]: These calls are declared in static type . I can't export them
> and don't want to
> modify libata code.

If you really need these functions then they should be made non-static 
and exported. Duplicating the code is not a solution.

I'm not so sure you actually need all that, though. I suspect you can 
likely handle the deferring of commands if you detect an FPDMA data 
phase inside the qc_issue function only (like you already do in some 
cases) instead of having to mess with deferring them at the SCSI layer.

> I'm still puzzling out how this stuff all works, but it looks like this 
> code makes you stop sending new commands if:
> 
> -the port is in the FPDMA Data Phase (DMA Setup FIS received but the 
> transfer is not complete yet) - I assume the hardware doesn't handle 
> this itself, which seems rather unique
> -we previously deferred a command inside of qc_issue because we were in 
> the FPDMA data phase
> -we previously saw dhfis_flags not equal to qc_active, or we got a 
> BACKOUT interrupt (whatever exactly that means), both of which set some 
> value in the back_byte
> [kuan]: 
> -If we got BACKOUT interrupt, it means that a command just sent by
> driver
> backed out.The driver should resend the command.So new commands should
> be defered.
> -If dhfis_flags != qc_active, it indicates that the last command doesn't
> generate a device to host register FIS .
> After sending some commands, I found that the last command sometimes has
> this problem 
> but previous commands are normal.In this case, we need resend the last
> command.
> Both cases set back_byte. 

The case where the command didn't generate a D2H FIS should likely be 
investigated further, otherwise we don't necessarily know that this 
workaround will work in all cases?

> This code seems a bit odd. Isn't this tossing out a bunch of potential 
> error status, etc?
> [kuan]:
> If there are commands in queue, the driver can  send  a new command 
> only after receiving dhfis intr of previous command and before receiving
> any dmasetup fis intr.
> In this place,  i do the last check before sending the command.

But the D2H FIS can contain an error indication, correct? If that 
happens here it won't detect this. In this situation error handling 
should be triggered.

> 	+	done_mask = pp->qc_active ^ sactive;
> 	+	if (unlikely(done_mask & sactive)) {
> 	+		ata_port_printk(ap, KERN_ERR, "illegal qc_active
> transition "
> 	+				"(%08x->%08x)\n", ap->qc_active,
> sactive);
> 	+		return -EINVAL;
> 	+	}	
> 
> Shouldn't this trigger error handling if it happens instead of just 
> printing an error?
> [kuan]:
> I think the error handling can be triggered by timeout. In fact, 
> this case should seldom happen.

There have been reports of some drives with bad NCQ implementations that 
return completion status for commands that were not issued. If we detect 
this case we should raise an HSM violation which will disable NCQ on 
this drive if it happens repeatedly. See the code in ahci.c in 
ahci_host_intr.

This comment still applies:

> Additional/general comments:
> 
> Think you need some code to handle suspend and resume (re-enable SATA 
> MMIO space, etc.)

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@...pamshaw.ca
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/