[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120304134802.2ed6fbd6@stein>
Date: Sun, 4 Mar 2012 13:48:02 +0100
From: Stefan Richter <stefanr@...6.in-berlin.de>
To: Chris Boot <bootc@...tc.net>
Cc: linux1394-devel@...ts.sourceforge.net, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] firewire-sbp2: Initialise sbp2_orb->rcode for
management ORBs
On Feb 18 Chris Boot wrote:
> When sending ORBs the struct sbp2_orb->rcode field should be initialised
> to -1 otherwise complete_transaction() assumes the request is successful
> (RCODE_COMPLETE is 0). When sending managament ORBs, such as LOGIN or
> LOGOUT, this was not done and so the initiator would wait for the
> request to time out before trying again.
>
> Without this, LOGINs are only retried when the management ORB times out,
> rather than the initiator noticing an error occurred and retrying soon
> after. For targets that advertise more than one LUN per unit, and can
> only accept one management request at a time, this means LUNs are only
> logged in one per timeout period.
>
> Signed-off-by: Chris Boot <bootc@...tc.net>
> Cc: Stefan Richter <stefanr@...6.in-berlin.de>
> ---
> drivers/firewire/sbp2.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
> index b12c6ba..7776c18 100644
> --- a/drivers/firewire/sbp2.c
> +++ b/drivers/firewire/sbp2.c
> @@ -572,6 +572,9 @@ static int sbp2_send_management_orb(struct sbp2_logical_unit *lu, int node_id,
> if (dma_mapping_error(device->card->device, orb->response_bus))
> goto fail_mapping_response;
>
> + /* Initialize rcode to something not RCODE_COMPLETE. */
> + orb->base.rcode = -1;
> +
> orb->request.response.high = 0;
> orb->request.response.low = cpu_to_be32(orb->response_bus);
>
I left this hanging in my inbox for too long, sorry...
While I agree that the current initialization of orb->base.rcode with 0 is
wrong, I don't think your change alone is sufficient:
Consider the case that a login request to LU 0 causes the target to pull
out the hardware behind that LU out of a powered-down state --- which may
take a very long time --- and login requests to LU 1 would be aborted by
the target with resp_conflict_error on any Management_Agent write
request. Of course a reasonably clever target would accept login before
full power-up, but you never now.
We retry login 5 times in 0.2 seconds intervals, and this 1 s in total may
not be enough.
--------
And a few notes to myself, to be done on top of the above:
a) Turn the magic value -1 into a defined constant. Use that constant
in the two ORB initializers and in the SBP-2 status write handler.
b) The reconnect failure handling seems a bit simplistic. I changed it
from Kristian's original retry loop to a shortcut to re-login in commit
ce896d95cc7886ae05859c5b409a7b2f3b606ec1. While re-login instead of
reconnect during management-agent-busy situations works too, retrying
the reconnect at least during reconnect-hold period may be more robust in
such situations.
So the generation check should be replaced by checks for 1394 transaction
completion with RCODE_CONFLICT_ERROR or RCODE_BUSY, and maybe for some
other types of 1394 transaction failure or SBP-2 transaction failure.
c) Perhaps retry logout in sbp2_remove() if we clearly detect a
management-agent-busy situation.
d) Reduce log spam of the sort of "management write failed" or "failed to
reconnect" in situations where we know that we deal with such failures
correctly.
--
Stefan Richter
-=====-===-- --== --=--
http://arcgraph.de/sr/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists