lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Wed, 1 Feb 2017 08:59:36 +0100
From:   Hannes Reinecke <hare@...e.de>
To:     Chad Dupuis <chad.dupuis@...ium.com>
Cc:     martin.petersen@...cle.com, linux-scsi@...r.kernel.org,
        fcoe-devel@...n-fcoe.org, netdev@...r.kernel.org,
        yuval.mintz@...ium.com, QLogic-Storage-Upstream@...ium.com
Subject: Re: [PATCH V2 2/2] qedf: Add QLogic FastLinQ offload FCoE driver
 framework.

On 01/31/2017 06:08 PM, Chad Dupuis wrote:
>
> On Mon, 30 Jan 2017, 10:34am -0000, Hannes Reinecke wrote:
>
>> On 01/25/2017 09:33 PM, Dupuis, Chad wrote:
>>> From: "Dupuis, Chad" <chad.dupuis@...ium.com>
>>>
[ .. ]
>>> +	if (opcode == ELS_LS_RJT) {
>>> +		rjt = fc_frame_payload_get(fp, sizeof(*rjt));
>>> +		QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_ELS,
>>> +		    "Received LS_RJT for REC: er_reason=0x%x, "
>>> +		    "er_explan=0x%x.\n", rjt->er_reason, rjt->er_explan);
>>> +		/*
>>> +		 * The following response(s) mean that we need to reissue the
>>> +		 * request on another exchange.  We need to do this without
>>> +		 * informing the upper layers lest it cause an application
>>> +		 * error.
>>> +		 */
>>> +		if ((rjt->er_reason == ELS_RJT_LOGIC ||
>>> +		    rjt->er_reason == ELS_RJT_UNAB) &&
>>> +		    rjt->er_explan == ELS_EXPL_OXID_RXID) {
>>> +			QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_ELS,
>>> +			    "Handle CMD LOST case.\n");
>>> +			qedf_requeue_io_req(orig_io_req);
>>> +		}
>>
>> Care to explain this?
>> I found requeing within the driver without notifying the upper layers
>> deeply troubling.
>> I've came across this (or a similar issue) during implementing
>> FCoE over virtio; there it turned out to be an issue with the target not
>> handling frames in the correct order.
>> So it looks you are attempting to solve the problem of a REC being sent
>> for a command which is not know at the target.
>> Meaning it's either lost in the fabric, hasn't been seen by the target
>> (yet), or having already been processed by the target.
>>
>> The above code would only solve the second problem; however, that would
>> indicate a command ordering issue as the REC would have arrived at the
>> target _before_ the originating command.
>> So doing a resend would only help for _that_ specific case, not for the
>> others.
>> And it's not quite clear why resending with a different OXID would help
>> here; typically it's the _RXID_ which isn't found ...
>
> The problem I've observed is that If I return an error to the SCSI layer
> it propogates up to the st driver and the test application would fail
> since st would fail the I/O.  As the comment explains we do this on
> another exchange internally in the driver so as not to cause the tape
> backup application/test to fail.
>
> bnx2fc behaves like this IIRC.
>
Ah, yes.
I've had this discussion with James Smart, too.
When a command abort fails with 'Invalid OXID/RXID' we should be 
retrying the command with a different set of IDs, as the original
need to quarantined.
(Which, incidentaily, you don't do, right? You cannot re-use that 
particular exchange until you either got a response for it or you reset 
the exchange manager. If you don't you run into the risk of getting an 
invalid completion if for some reason the command _does_ return 
eventually...)

Even though I'd rather like to have it resolved in libfc or even the 
SCSI midlayer itself, it's not something which should hold off the 
entire submission.

>>
>> [ .. ]
>>> +static void qedf_fcoe_process_vlan_resp(struct qedf_ctx *qedf,
>>> +	struct sk_buff *skb)
>>> +{
>>> +	struct fip_header *fiph;
>>> +	struct fip_desc *desc;
>>> +	u16 vid = 0;
>>> +	ssize_t rlen;
>>> +	size_t dlen;
>>> +
>>> +	fiph = (struct fip_header *)(((void *)skb->data) + 2 * ETH_ALEN + 2);
>>> +
>>> +	rlen = ntohs(fiph->fip_dl_len) * 4;
>>> +	desc = (struct fip_desc *)(fiph + 1);
>>> +	while (rlen > 0) {
>>> +		dlen = desc->fip_dlen * FIP_BPW;
>>> +		switch (desc->fip_dtype) {
>>> +		case FIP_DT_VLAN:
>>> +			vid = ntohs(((struct fip_vlan_desc *)desc)->fd_vlan);
>>> +			break;
>>> +		}
>>> +		desc = (struct fip_desc *)((char *)desc + dlen);
>>> +		rlen -= dlen;
>>> +	}
>>> +
>>> +	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_DISC, "VLAN response, "
>>> +		   "vid=0x%x.\n", vid);
>>> +
>>> +	if (vid > 0 && qedf->vlan_id != vid) {
>>> +		qedf_set_vlan_id(qedf, vid);
>>> +
>>> +		/* Inform waiter that it's ok to call fcoe_ctlr_link up() */
>>> +		complete(&qedf->fipvlan_compl);
>>> +	}
>>> +}
>>> +
>> As mentioned, this will fail for HP VirtualConnect, which return a vid
>> of '0', indicating that the FCoE link should run on the interface itself.
>> And this is actually a valid response, I would think that you should be
>> calling complete() for any response ...
>
> In testing I would get responses that would seem to contain VID 0 but then
> subsequent responses contain the valid VID.  If I try to go with a VID of
> 0 FIP discovery would fail.  It could be possible that there's a bug in
> the code that parses the FIP VLAN response but I'll like to leave this in
> for now and it can be removed once I figure where the issue may be.
>
Okay.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@...e.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Powered by blists - more mailing lists