[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1189027622.19638.42.camel@dhcp-10-13-106-205.broadcom.com>
Date: Wed, 05 Sep 2007 14:27:02 -0700
From: "Anil Veerabhadrappa" <anilgv@...adcom.com>
To: "Mike Christie" <mchristi@...hat.com>
cc: "Michael Chan" <mchan@...adcom.com>, davem@...emloft.net,
netdev@...r.kernel.org, open-iscsi@...glegroups.com,
talm@...adcom.com, lusinsky@...adcom.com, uri@...adcom.com,
"SCSI Mailing List" <linux-scsi@...r.kernel.org>
Subject: Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.
On Wed, 2007-09-05 at 13:34 -0500, Mike Christie wrote:
> Michael Chan wrote:
> > +* This file defines HSI constants for the iSCSI flows
> > +*/
> > +
> > +/* iSCSI request op codes */
> > +#define ISCSI_OPCODE_NOP_OUT (0 | 0x40)
> > +#define ISCSI_OPCODE_SCSI_CMD (1)
> > +#define ISCSI_OPCODE_TMF_REQUEST (2 | 0x40)
> > +#define ISCSI_OPCODE_LOGIN_REQUEST (3 | 0x40)
> > +#define ISCSI_OPCODE_TEXT_REQUEST (4 | 0x40)
> > +#define ISCSI_OPCODE_DATA_OUT (5)
> > +#define ISCSI_OPCODE_LOGOUT_REQUEST (6 | 0x00)
> > +#define ISCSI_OPCODE_CLEANUP_REQUEST (7)
>
>
> What is ISCSI_OPCODE_CLEANUP_REQUEST?
This is a local message between iSCSI driver and the firmware to cleanup
iSCSI task resources held by chip. This operation is required to reclaim
SCSI command related resources after the TMF response is received for a
task and before it is completed to SCSI-ML
>
>
> > +
> > +/* iSCSI response/messages op codes */
> > +#define ISCSI_OPCODE_NOP_IN (0x20)
> > +#define ISCSI_OPCODE_SCSI_RESPONSE (0x21)
> > +#define ISCSI_OPCODE_TMF_RESPONSE (0x22)
> > +#define ISCSI_OPCODE_LOGIN_RESPONSE (0x23)
> > +#define ISCSI_OPCODE_TEXT_RESPONSE (0x24)
> > +#define ISCSI_OPCODE_DATA_IN (0x25)
> > +#define ISCSI_OPCODE_LOGOUT_RESPONSE (0x26)
> > +#define ISCSI_OPCODE_CLEANUP_RESPONSE (0x27)
> > +#define ISCSI_OPCODE_R2T (0x31)
> > +#define ISCSI_OPCODE_ASYNC_MSG (0x32)
> > +#define ISCSI_OPCODE_REJECT (0x3f)
> > +#define ISCSI_OPCODE_NOPOUT_LOCAL_COMPLETION (0)
>
>
> What does the LOCAL_COMPLETION on the nopout mean?
This is the completion notification for NOPOUT pdus which does not
involve response from the target. Following two NOPOUTs are classified
under this -
1. Initiator's proactive nopout with ITT=0xFFFFFFFF
2. Initiator's response to target nopin with TTT != 0xFFFFFFFF
>
>
> > +
> > +/* iSCSI stages */
> > +#define ISCSI_STAGE_SECURITY_NEGOTIATION (0)
> > +#define ISCSI_STAGE_LOGIN_OPERATIONAL_NEGOTIATION (1)
> > +#define ISCSI_STAGE_FULL_FEATURE_PHASE (3)
> > +/* Logout response codes */
> > +#define ISCSI_LOGOUT_RESPONSE_CONNECTION_CLOSED (0)
> > +#define ISCSI_LOGOUT_RESPONSE_CID_NOT_FOUND (1)
> > +#define ISCSI_LOGOUT_RESPONSE_CLEANUP_FAILED (3)
> > +
> > +/* iSCSI task types */
> > +#define ISCSI_TASK_TYPE_READ (0)
> > +#define ISCSI_TASK_TYPE_WRITE (1)
> > +#define ISCSI_TASK_TYPE_MPATH (2)
>
>
>
>
> All of these iscsi code shoulds be in iscsi_proto.h or should be added
> there.
This is a very tricky proposal as this header file is automatically
generated by a well defined process and is shared between various driver
supporting multiple platform/OS and the firmware. If it is not of a big
issue I would like to keep it the way it is.
>
>
> > +
> > +#endif
> > diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c
> > new file mode 100644
> > index 0000000..b74a93c
> > --- /dev/null
> > +++ b/drivers/scsi/bnx2i/bnx2i_hwi.c
> > @@ -0,0 +1,1977 @@
> > +
> > + if (tmf_cqe->response == ISCSI_TMF_RSP_COMPLETE) {
> > + if (aborted_cmd->scsi_cmd) {
> > + aborted_cmd->scsi_cmd->result = DID_ABORT << 16;
> > + aborted_cmd->scsi_cmd->resid =
>
> should use resid wrappers throughout driver.
>
> > + aborted_cmd->scsi_cmd->request_bufflen;
>
> Should use bufflen wrappers throught driver.
Will look into this.
> > + cmd->cmd_state = ISCSI_CMD_STATE_INITIATED;
> > + sc->SCp.ptr = (char *) cmd;
> > +
> > + if (cmd->req.itt != ITT_INVALID_SIGNATURE) {
> > + bnx2i_send_iscsi_scsicmd(conn, cmd);
> > + list_add_tail(&cmd->link, &sess->active_cmds);
> > + sess->num_active_cmds++;
> > + }
> > + return 0;
> > +
> > +iscsi_win_closed:
> > +cmd_not_accepted:
> > + return SCSI_MLQUEUE_HOST_BUSY;
> > +
> > +dev_not_found:
> > + sc->sense_buffer[0] = 0x70;
> > + sc->sense_buffer[2] = NOT_READY;
> > + sc->sense_buffer[7] = 0x6;
> > + sc->sense_buffer[12] = 0x08;
> > + sc->sense_buffer[13] = 0x00;
>
>
> do not fake sense and do not face sense and set a host byte of
> DID_NO_CONNECT. DID_NO_CONNECT is fine.
ok, thanks.
>
>
> > + sc->result = (DID_NO_CONNECT << 16);
> > + sc->resid = sc->request_bufflen;
> > + sc->scsi_done(sc);
> > + return 0;
> > +}
> > +
> > +
> > +
> > +/*
> > + * TMF request timeout handler
> > + */
> > +static void bnx2i_iscsi_tmf_timer(unsigned long data)
> > +{
> > + struct bnx2i_cmd *cmd = (struct bnx2i_cmd *) data;
> > +
> > + printk(KERN_ALERT "TMF timer: abort failed, cmd 0x%p\n", cmd);
> > + cmd->cmd_state = ISCSI_CMD_STATE_TMF_TIMEOUT;
> > + cmd->conn->sess->recovery_state = ISCSI_SESS_RECOVERY_OPEN_ISCSI;
> > + iscsi_conn_error(cmd->conn->cls_conn, ISCSI_ERR_CONN_FAILED);
> > +}
> > +
> > +
> > +/*
> > + * initiate command abort process by requesting CNIC to send
> > + * an iSCSI TMF request to target
> > + */
> > +static int bnx2i_initiate_abort_cmd(struct scsi_cmnd *sc)
> > +{
> > + struct bnx2i_cmd *cmd = (struct bnx2i_cmd *) sc->SCp.ptr;
> > + struct bnx2i_cmd *tmf_cmd;
> > + struct Scsi_Host *shost = cmd->scsi_cmd->device->host;
> > + struct bnx2i_conn *conn = cmd->conn;
> > + struct bnx2i_sess *sess;
> > + struct bnx2i_hba *hba;
> > +
> > + shost = cmd->scsi_cmd->device->host;
> > + sess = iscsi_hostdata(shost->hostdata);
> > + BUG_ON(shost != sess->host);
> > +
> > + if (sess && (is_sess_active(sess)))
> > + hba = sess->hba;
> > + else
> > + return FAILED;
> > +
> > + if (test_bit(ADAPTER_STATE_LINK_DOWN, &hba->adapter_state))
> > + return FAILED;
> > +
> > + bnx2i_setup_ictx_dump(hba, conn);
> > +
> > + if (cmd->scsi_cmd != sc)
> > + /* command already completed to scsi mid-layer */
> > + goto cmd_not_active;
> > +
> > + tmf_cmd = bnx2i_alloc_cmd(sess);
> > + if (tmf_cmd == NULL)
> > + goto lack_of_resc;
> > +
> > + tmf_cmd->conn = conn = sess->lead_conn;
> > + tmf_cmd->scsi_cmd = NULL;
> > + tmf_cmd->iscsi_opcode = ISCSI_OPCODE_TMF_REQUEST;
> > + tmf_cmd->req.cmd_sn = sess->cmdsn;
> > + tmf_cmd->tmf_ref_itt = cmd->req.itt;
> > + tmf_cmd->tmf_ref_cmd = cmd;
> > + tmf_cmd->tmf_ref_sc = cmd->scsi_cmd;
> > + cmd->cmd_state = ISCSI_CMD_STATE_ABORT_PEND;
> > + tmf_cmd->cmd_state = ISCSI_CMD_STATE_INITIATED;
> > +
> > +#define BNX2I_TMF_TIMEOUT 10 * HZ
> > + sess->abort_timer.expires = BNX2I_TMF_TIMEOUT + jiffies;
> > + sess->abort_timer.function = bnx2i_iscsi_tmf_timer;
> > + sess->abort_timer.data = (unsigned long)tmf_cmd;
> > + add_timer(&sess->abort_timer);
> > +
> > + bnx2i_send_iscsi_tmf(conn, tmf_cmd);
> > +
> > + /* update iSCSI context for this conn, wait for CNIC to complete */
> > + wait_event_interruptible(sess->er_wait, (!conn->ep ||
> > + (tmf_cmd->cmd_state ==
> > + ISCSI_CMD_STATE_FAILED) ||
> > + (tmf_cmd->cmd_state ==
> > + ISCSI_CMD_STATE_COMPLETED)));
> > +
> > + if (signal_pending(current))
> > + flush_signals(current);
> > +
> > + del_timer_sync(&sess->abort_timer);
> > +
> > + if (tmf_cmd->cmd_state == ISCSI_CMD_STATE_TMF_TIMEOUT) {
> > + printk(KERN_ALERT "abort: abort failed, cmd 0x%p\n", tmf_cmd);
> > + /* TMF timed out, return error status and let SCSI-ML do
> > + * session recovery.
> > + */
> > + list_del_init(&tmf_cmd->link);
> > + bnx2i_free_cmd(sess, tmf_cmd);
> > + return SUCCESS;
> > + }
> > +
> > + list_del_init(&tmf_cmd->link);
> > + bnx2i_free_cmd(sess, tmf_cmd);
> > +
> > + if ((cmd->scsi_cmd->result & 0xFF0000) == (DID_ABORT << 16)) {
> > + cmd->cmd_state = ISCSI_CMD_STATE_CLEANUP_PEND;
> > + bnx2i_send_cmd_cleanup_req(hba, cmd);
> > + wait_event_interruptible_timeout(sess->er_wait,
> > + (cmd->cmd_state ==
> > + ISCSI_CMD_STATE_CLEANUP_CMPL),
> > + msecs_to_jiffies(
> > + ISCSI_CMD_CLEANUP_TIMEOUT));
> > +
> > + if (signal_pending(current))
> > + flush_signals(current);
> > + } else
> > + cmd->scsi_cmd->result = (DID_ABORT << 16);
> > +
> > + cmd->conn->sess->num_active_cmds--;
> > + list_del_init(&cmd->link);
> > + bnx2i_return_failed_command(sess, cmd, DID_ABORT);
> > + bnx2i_free_cmd(sess, cmd);
> > +
> > +cmd_not_active:
> > + return SUCCESS;
> > +
> > +lack_of_resc:
> > + return FAILED;
> > +}
> > +
> > +
> > +/*
> > + * SCSI abort request handler.
> > + */
> > +int bnx2i_abort(struct scsi_cmnd *sc)
> > +{
> > + int reason;
> > + struct bnx2i_cmd *cmd = (struct bnx2i_cmd *) sc->SCp.ptr;
> > +
> > + if (unlikely(!cmd)) {
> > + /* command already completed to scsi mid-layer */
> > + printk(KERN_INFO "bnx2i_abort: sc 0x%p, not active\n", sc);
> > + return SUCCESS;
> > + }
> > +
> > + reason = bnx2i_initiate_abort_cmd(sc);
> > + return reason;
> > +}
> > +
> > +
> > +
> > +/*
> > + * hardware reset
> > + */
> > +int bnx2i_reset(struct scsi_cmnd *sc)
> > +{
> > + return 0;
> > +}
>
>
> So what is up with this one? It seems like if there is a way to reset
> hardware then you would want it as the scsi eh host reset callout
> instead of dropping the session. We could add some transport level
> recovery callouts for the iscsi specifics.
We may not be able to support HBA cold reset as bnx2 driver is the
primary owner of chip reset and initialization. This is the drawback of
sharing network interface with the NIC driver. If there is a need for
administrator to reset the iSCSI port same can be achieved by running
'ifdown eth#' and 'ifup eth#'.
Current driver even allows ethernet interface reset when there are
active iSCSI connection, all active iscsi sessions will be reinstated
when the network link comes back live
>
>
> > +
> > +
> > +void bnx2i_return_failed_command(struct bnx2i_sess *sess,
> > + struct bnx2i_cmd *cmd, int err_code)
> > +{
> > + struct scsi_cmnd *sc = cmd->scsi_cmd;
> > + sc->result = err_code << 16;
> > + sc->resid = cmd->scsi_cmd->request_bufflen;
> > + cmd->scsi_cmd = NULL;
> > + sess->num_active_cmds--;
> > + sc->SCp.ptr = NULL;
> > + sc->scsi_done(sc);
> > +}
> > +
> > +
> > +
> > +/*
> > + * SCSI host reset handler - iSCSI session recovery
> > + */
> > +int bnx2i_host_reset(struct scsi_cmnd *sc)
> > +{
> > + struct Scsi_Host *shost;
> > + struct bnx2i_sess *sess;
> > + int rc = 0;
> > +
> > + shost = sc->device->host;
> > + sess = iscsi_hostdata(shost->hostdata);
> > + printk(KERN_INFO "bnx2i: attempting to reset host, #%d\n",
> > + sess->host->host_no);
> > +
> > + BUG_ON(shost != sess->host);
> > + rc = bnx2i_do_iscsi_sess_recovery(sess, DID_RESET);
> > +
> > + return rc;
> > +}
> > +
> > +
> > +
> > +/**********************************************************************
> > + * open-iscsi interface
> > + **********************************************************************/
> > +
> > +
> > +#define get_bnx2_device(_hba, _devc) do { \
> > + if ((_hba->pci_did == PCI_DEVICE_ID_NX2_5706) || \
> > + (_hba->pci_did == PCI_DEVICE_ID_NX2_5706S)) { \
> > + _devc = '6'; \
> > + } else if ((_hba->pci_did == PCI_DEVICE_ID_NX2_5708) || \
> > + (_hba->pci_did == PCI_DEVICE_ID_NX2_5708S)) { \
> > + _devc = '8'; \
> > + } else if ((_hba->pci_did == PCI_DEVICE_ID_NX2_5709) || \
> > + (_hba->pci_did == PCI_DEVICE_ID_NX2_5709S)) { \
> > + _devc = '9'; \
> > + } \
> > + } while (0)
> > +
> > +/* from open-iscsi project */
> > +/*
> > + * iSCSI Session's hostdata organization:
> > + *
> > + * *------------------* <== hostdata_session(host->hostdata)
> > + * | ptr to class sess|
> > + * |------------------| <== iscsi_hostdata(host->hostdata)
> > + * | iscsi_session |
> > + * *------------------*
> > + */
>
>
>
> I think it is going to be nicer to not allocate the host with the
> session for bnx2i like was done for qla4xxx. If you are going to
> allocate the host with the session then it seems like you should be able
> to intergrate into libiscsi like was done with iscsi_tcp and ib_iser.
> And even if you do not allocate a host per session it seems like you
> should be able to better integreate with libiscsi like was done with
> iscsi_tcp and ib_iser. Right now, it is sort of a mix of qla4xxx's model
> and iscsi_tcp/ib_iser's model which is not going to work well in the
> long run.
Either way will work for us, will investigate more on the code changes
required in bnx2i
> You can change libiscsi so that you do not run in the host's workq and
> so data structs are better optimized for you and whatever else is
> needed. I mean feel free to change that code so it better suits your
> model. Your code is probably closer to ib_iser since that module does
> not have to handle operations like responding to r2ts and has to make
> its own connections.
Right.
>
> There seem to be some bugs from when you might have modeled bnx2i's
> callouts after a older iscsi_tcp or ib_iser. Those bugs got fixed for
> both modules for free when libiscsi got fixed.
>
>
> > +
> > +#define hostdata_privsize(_sz) (sizeof(unsigned long) + _sz + \
> > + _sz % sizeof(unsigned long))
> > +
> > +#define hostdata_session(_hostdata) (iscsi_ptr(*(unsigned long *)_hostdata))
> > +
> > +#define session_to_cls(_sess) hostdata_session(_sess->host->hostdata)
> > +
> > +
> > +
> > +
> > +/*
> > + * Function: bnx2i_register_xport
> > + * Description: this routine will allocate memory for SCSI host template,
> > + * iSCSI template and registers one instance of NX2 device with
> > + * iSCSI Transport Kernel module.
> > + */
> > +int bnx2i_register_xport(struct bnx2i_hba *hba)
> > +{
> > + void *mem_ptr;
> > + char dev_id = '8';
> > +
> > + get_bnx2_device(hba, dev_id);
> > +
> > + mem_ptr = kmalloc(sizeof(struct scsi_host_template), GFP_KERNEL);
> > + hba->scsi_template = mem_ptr;
> > + if (hba->scsi_template == NULL) {
> > + printk(KERN_ALERT "bnx2i: failed to alloc memory for sht\n");
> > + return -ENOMEM;
> > + }
> > +
> > + mem_ptr = kmalloc(sizeof(struct iscsi_transport), GFP_KERNEL);
> > + hba->iscsi_transport = mem_ptr;
> > + if (hba->iscsi_transport == NULL) {
> > + printk(KERN_ALERT "mem error for iscsi_transport template\n");
> > + goto iscsi_xport_err;
> > + }
> > +
> > + mem_ptr = kmalloc(BRCM_ISCSI_XPORT_NAME_SIZE_MAX, GFP_KERNEL);
> > + if (mem_ptr == NULL) {
> > + printk(KERN_ALERT "failed to alloc memory for xport name\n");
> > + goto scsi_name_mem_err;
> > + }
> > +
> > + memcpy(hba->scsi_template, (const void *) &bnx2i_host_template,
> > + sizeof(struct scsi_host_template));
> > + hba->scsi_template->name = mem_ptr;
> > + memcpy((void *) hba->scsi_template->name,
> > + (const void *) bnx2i_host_template.name,
> > + strlen(bnx2i_host_template.name) + 1);
> > +
> > + mem_ptr = kmalloc(BRCM_ISCSI_XPORT_NAME_SIZE_MAX, GFP_KERNEL);
> > + if (mem_ptr == NULL) {
> > + printk(KERN_ALERT "failed to alloc proc name mem\n");
> > + goto scsi_proc_name_mem_err;
> > + }
> > + hba->scsi_template->proc_name = mem_ptr;
> > +
> > + memcpy((void *) hba->iscsi_transport,
> > + (const void *) &bnx2i_iscsi_transport,
> > + sizeof(struct iscsi_transport));
> > +
> > + hba->iscsi_transport->host_template = hba->scsi_template;
> > +
> > + mem_ptr = kmalloc(BRCM_ISCSI_XPORT_NAME_SIZE_MAX, GFP_KERNEL);
> > + if (mem_ptr == NULL) {
> > + printk(KERN_ALERT "mem alloc error, iscsi xport name\n");
> > + goto xport_name_mem_err;
> > + }
> > + hba->iscsi_transport->name = mem_ptr;
> > + sprintf(mem_ptr, "%s%c-%.2x%.2x%.2x", BRCM_ISCSI_XPORT_NAME_PREFIX,
> > + dev_id, (u8)hba->pci_dev->bus->number,
> > + hba->pci_devno, (u8)hba->pci_func);
> > +
> > + memcpy((void *) hba->scsi_template->proc_name, (const void *) mem_ptr,
> > + strlen(mem_ptr) + 1);
> > +
> > + hba->shost_template = iscsi_register_transport(hba->iscsi_transport);
> > + if (!hba->shost_template) {
> > + printk(KERN_ALERT "bnx2i: xport reg failed, hba 0x%p\n", hba);
> > + goto failed_registration;
> > + }
>
>
> The transport should be allocated like qla4xxx or iscsi_tcp/ib_iser.
> Userspace then sets everything up based on usersettings like with
> qla4xxx and iscsi_tcp and ib_iser.
>
> I started that patch that I sent to you guys to modify
> scsi_transport_iscsi abd libiscsi for you guys, but have not finished it
> yet. I am trying to finish merging Olafs iscsi_tcp patches. Feel free to
> change iscsid, scsi_transport_iscsi and libiscsi for me.
>
>
> > +}
> > +
> > +int bnx2i_sysfs_setup(void)
> > +{
> > + int ret;
> > + ret = class_register(&bnx2i_class);
> > +
> > + bnx2i_register_port_class_dev(&port_class_dev);
> > + return ret;
> > +}
> > +
> > +void bnx2i_sysfs_cleanup(void)
> > +{
> > + class_device_unregister(&port_class_dev);
> > + class_unregister(&bnx2i_class);
> > +}
>
> The sysfs bits related to the hba should be use one of the scsi sysfs
> facilities or if they are related to iscsi bits and are generic then
> through the iscsi hba
bnx2i needs 2 sysfs entries -
1. QP size info - this is used to size per connection shared data
structures to issue work requests to chip (login, scsi cmd, tmf, nopin)
and get completions from the chip (scsi completions, async messages,
etc'). This is a iSCSI HBA attribute
2. port mapper - we can be more flexible on classifying this as either
iSCSI HBA attribute or bnx2i driver global attribute
Can hooks be added to iSCSI transport class to include these?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists