[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2203e514-5ca8-28e2-af22-537899407cf9@suse.de>
Date: Wed, 1 Feb 2017 08:59:36 +0100
From: Hannes Reinecke <hare@...e.de>
To: Chad Dupuis <chad.dupuis@...ium.com>
Cc: martin.petersen@...cle.com, linux-scsi@...r.kernel.org,
fcoe-devel@...n-fcoe.org, netdev@...r.kernel.org,
yuval.mintz@...ium.com, QLogic-Storage-Upstream@...ium.com
Subject: Re: [PATCH V2 2/2] qedf: Add QLogic FastLinQ offload FCoE driver
framework.
On 01/31/2017 06:08 PM, Chad Dupuis wrote:
>
> On Mon, 30 Jan 2017, 10:34am -0000, Hannes Reinecke wrote:
>
>> On 01/25/2017 09:33 PM, Dupuis, Chad wrote:
>>> From: "Dupuis, Chad" <chad.dupuis@...ium.com>
>>>
[ .. ]
>>> + if (opcode == ELS_LS_RJT) {
>>> + rjt = fc_frame_payload_get(fp, sizeof(*rjt));
>>> + QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_ELS,
>>> + "Received LS_RJT for REC: er_reason=0x%x, "
>>> + "er_explan=0x%x.\n", rjt->er_reason, rjt->er_explan);
>>> + /*
>>> + * The following response(s) mean that we need to reissue the
>>> + * request on another exchange. We need to do this without
>>> + * informing the upper layers lest it cause an application
>>> + * error.
>>> + */
>>> + if ((rjt->er_reason == ELS_RJT_LOGIC ||
>>> + rjt->er_reason == ELS_RJT_UNAB) &&
>>> + rjt->er_explan == ELS_EXPL_OXID_RXID) {
>>> + QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_ELS,
>>> + "Handle CMD LOST case.\n");
>>> + qedf_requeue_io_req(orig_io_req);
>>> + }
>>
>> Care to explain this?
>> I found requeing within the driver without notifying the upper layers
>> deeply troubling.
>> I've came across this (or a similar issue) during implementing
>> FCoE over virtio; there it turned out to be an issue with the target not
>> handling frames in the correct order.
>> So it looks you are attempting to solve the problem of a REC being sent
>> for a command which is not know at the target.
>> Meaning it's either lost in the fabric, hasn't been seen by the target
>> (yet), or having already been processed by the target.
>>
>> The above code would only solve the second problem; however, that would
>> indicate a command ordering issue as the REC would have arrived at the
>> target _before_ the originating command.
>> So doing a resend would only help for _that_ specific case, not for the
>> others.
>> And it's not quite clear why resending with a different OXID would help
>> here; typically it's the _RXID_ which isn't found ...
>
> The problem I've observed is that If I return an error to the SCSI layer
> it propogates up to the st driver and the test application would fail
> since st would fail the I/O. As the comment explains we do this on
> another exchange internally in the driver so as not to cause the tape
> backup application/test to fail.
>
> bnx2fc behaves like this IIRC.
>
Ah, yes.
I've had this discussion with James Smart, too.
When a command abort fails with 'Invalid OXID/RXID' we should be
retrying the command with a different set of IDs, as the original
need to quarantined.
(Which, incidentaily, you don't do, right? You cannot re-use that
particular exchange until you either got a response for it or you reset
the exchange manager. If you don't you run into the risk of getting an
invalid completion if for some reason the command _does_ return
eventually...)
Even though I'd rather like to have it resolved in libfc or even the
SCSI midlayer itself, it's not something which should hold off the
entire submission.
>>
>> [ .. ]
>>> +static void qedf_fcoe_process_vlan_resp(struct qedf_ctx *qedf,
>>> + struct sk_buff *skb)
>>> +{
>>> + struct fip_header *fiph;
>>> + struct fip_desc *desc;
>>> + u16 vid = 0;
>>> + ssize_t rlen;
>>> + size_t dlen;
>>> +
>>> + fiph = (struct fip_header *)(((void *)skb->data) + 2 * ETH_ALEN + 2);
>>> +
>>> + rlen = ntohs(fiph->fip_dl_len) * 4;
>>> + desc = (struct fip_desc *)(fiph + 1);
>>> + while (rlen > 0) {
>>> + dlen = desc->fip_dlen * FIP_BPW;
>>> + switch (desc->fip_dtype) {
>>> + case FIP_DT_VLAN:
>>> + vid = ntohs(((struct fip_vlan_desc *)desc)->fd_vlan);
>>> + break;
>>> + }
>>> + desc = (struct fip_desc *)((char *)desc + dlen);
>>> + rlen -= dlen;
>>> + }
>>> +
>>> + QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_DISC, "VLAN response, "
>>> + "vid=0x%x.\n", vid);
>>> +
>>> + if (vid > 0 && qedf->vlan_id != vid) {
>>> + qedf_set_vlan_id(qedf, vid);
>>> +
>>> + /* Inform waiter that it's ok to call fcoe_ctlr_link up() */
>>> + complete(&qedf->fipvlan_compl);
>>> + }
>>> +}
>>> +
>> As mentioned, this will fail for HP VirtualConnect, which return a vid
>> of '0', indicating that the FCoE link should run on the interface itself.
>> And this is actually a valid response, I would think that you should be
>> calling complete() for any response ...
>
> In testing I would get responses that would seem to contain VID 0 but then
> subsequent responses contain the valid VID. If I try to go with a VID of
> 0 FIP discovery would fail. It could be possible that there's a bug in
> the code that parses the FIP VLAN response but I'll like to leave this in
> for now and it can be removed once I figure where the issue may be.
>
Okay.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@...e.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
Powered by blists - more mailing lists