lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Wed, 1 Feb 2017 08:59:36 +0100 From: Hannes Reinecke <hare@...e.de> To: Chad Dupuis <chad.dupuis@...ium.com> Cc: martin.petersen@...cle.com, linux-scsi@...r.kernel.org, fcoe-devel@...n-fcoe.org, netdev@...r.kernel.org, yuval.mintz@...ium.com, QLogic-Storage-Upstream@...ium.com Subject: Re: [PATCH V2 2/2] qedf: Add QLogic FastLinQ offload FCoE driver framework. On 01/31/2017 06:08 PM, Chad Dupuis wrote: > > On Mon, 30 Jan 2017, 10:34am -0000, Hannes Reinecke wrote: > >> On 01/25/2017 09:33 PM, Dupuis, Chad wrote: >>> From: "Dupuis, Chad" <chad.dupuis@...ium.com> >>> [ .. ] >>> + if (opcode == ELS_LS_RJT) { >>> + rjt = fc_frame_payload_get(fp, sizeof(*rjt)); >>> + QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_ELS, >>> + "Received LS_RJT for REC: er_reason=0x%x, " >>> + "er_explan=0x%x.\n", rjt->er_reason, rjt->er_explan); >>> + /* >>> + * The following response(s) mean that we need to reissue the >>> + * request on another exchange. We need to do this without >>> + * informing the upper layers lest it cause an application >>> + * error. >>> + */ >>> + if ((rjt->er_reason == ELS_RJT_LOGIC || >>> + rjt->er_reason == ELS_RJT_UNAB) && >>> + rjt->er_explan == ELS_EXPL_OXID_RXID) { >>> + QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_ELS, >>> + "Handle CMD LOST case.\n"); >>> + qedf_requeue_io_req(orig_io_req); >>> + } >> >> Care to explain this? >> I found requeing within the driver without notifying the upper layers >> deeply troubling. >> I've came across this (or a similar issue) during implementing >> FCoE over virtio; there it turned out to be an issue with the target not >> handling frames in the correct order. >> So it looks you are attempting to solve the problem of a REC being sent >> for a command which is not know at the target. >> Meaning it's either lost in the fabric, hasn't been seen by the target >> (yet), or having already been processed by the target. >> >> The above code would only solve the second problem; however, that would >> indicate a command ordering issue as the REC would have arrived at the >> target _before_ the originating command. >> So doing a resend would only help for _that_ specific case, not for the >> others. >> And it's not quite clear why resending with a different OXID would help >> here; typically it's the _RXID_ which isn't found ... > > The problem I've observed is that If I return an error to the SCSI layer > it propogates up to the st driver and the test application would fail > since st would fail the I/O. As the comment explains we do this on > another exchange internally in the driver so as not to cause the tape > backup application/test to fail. > > bnx2fc behaves like this IIRC. > Ah, yes. I've had this discussion with James Smart, too. When a command abort fails with 'Invalid OXID/RXID' we should be retrying the command with a different set of IDs, as the original need to quarantined. (Which, incidentaily, you don't do, right? You cannot re-use that particular exchange until you either got a response for it or you reset the exchange manager. If you don't you run into the risk of getting an invalid completion if for some reason the command _does_ return eventually...) Even though I'd rather like to have it resolved in libfc or even the SCSI midlayer itself, it's not something which should hold off the entire submission. >> >> [ .. ] >>> +static void qedf_fcoe_process_vlan_resp(struct qedf_ctx *qedf, >>> + struct sk_buff *skb) >>> +{ >>> + struct fip_header *fiph; >>> + struct fip_desc *desc; >>> + u16 vid = 0; >>> + ssize_t rlen; >>> + size_t dlen; >>> + >>> + fiph = (struct fip_header *)(((void *)skb->data) + 2 * ETH_ALEN + 2); >>> + >>> + rlen = ntohs(fiph->fip_dl_len) * 4; >>> + desc = (struct fip_desc *)(fiph + 1); >>> + while (rlen > 0) { >>> + dlen = desc->fip_dlen * FIP_BPW; >>> + switch (desc->fip_dtype) { >>> + case FIP_DT_VLAN: >>> + vid = ntohs(((struct fip_vlan_desc *)desc)->fd_vlan); >>> + break; >>> + } >>> + desc = (struct fip_desc *)((char *)desc + dlen); >>> + rlen -= dlen; >>> + } >>> + >>> + QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_DISC, "VLAN response, " >>> + "vid=0x%x.\n", vid); >>> + >>> + if (vid > 0 && qedf->vlan_id != vid) { >>> + qedf_set_vlan_id(qedf, vid); >>> + >>> + /* Inform waiter that it's ok to call fcoe_ctlr_link up() */ >>> + complete(&qedf->fipvlan_compl); >>> + } >>> +} >>> + >> As mentioned, this will fail for HP VirtualConnect, which return a vid >> of '0', indicating that the FCoE link should run on the interface itself. >> And this is actually a valid response, I would think that you should be >> calling complete() for any response ... > > In testing I would get responses that would seem to contain VID 0 but then > subsequent responses contain the valid VID. If I try to go with a VID of > 0 FIP discovery would fail. It could be possible that there's a bug in > the code that parses the FIP VLAN response but I'll like to leave this in > for now and it can be removed once I figure where the issue may be. > Okay. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@...e.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
Powered by blists - more mailing lists