linux-kernel - Re: [PATCH] firmware: arm_scmi: fix timeout value for send_message


Message-ID: <20200610082315.GB2689@bogus>
Date:   Wed, 10 Jun 2020 09:23:15 +0100
From:   Sudeep Holla <sudeep.holla@....com>
To:     jassisinghbrar@...il.com
Cc:     linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        viresh.kumar@...aro.org, robh@...nel.org, frowand.list@...il.com,
        bjorn.andersson@...aro.org, vincent.guittot@...aro.org,
        arnd@...db.de, Jassi Brar <jaswinder.singh@...aro.org>,
        Sudeep Holla <sudeep.holla@....com>
Subject: Re: [PATCH] firmware: arm_scmi: fix timeout value for send_message

On Sun, Jun 07, 2020 at 02:30:23PM -0500, jassisinghbrar@...il.com wrote:
> From: Jassi Brar <jaswinder.singh@...aro.org>
>
> Currently scmi_do_xfer() submits a message to mailbox api and waits
> for an apparently very short time. This works if there are not many
> messages in the queue already. However, if many clients share a
> channel and/or each client submits many messages in a row, the

The recommendation in such scenarios is to use multiple channels.

> timeout value becomes too short and returns error even if the mailbox
> is working fine according to the load. The timeout occurs when the
> message is still in the api/queue awaiting its turn to ride the bus.
>
>  Fix this by increasing the timeout value enough (500ms?) so that it
> fails only if there is an actual problem in the transmission (like a
> lockup or crash).
>
> [If we want to capture a situation when the remote didn't
> respond within expected latency, then the timeout should not
> start here, but from tx_prepare callback ... just before the
> message physically gets on the channel]
>

The bottleneck may not be in the remote. It may be the mailbox serialising
the requests even when it could handle them in parallel.

> Signed-off-by: Jassi Brar <jaswinder.singh@...aro.org>
> ---
>  drivers/firmware/arm_scmi/driver.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
> index dbec767222e9..46ddafe7ffc0 100644
> --- a/drivers/firmware/arm_scmi/driver.c
> +++ b/drivers/firmware/arm_scmi/driver.c
> @@ -303,7 +303,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
>  	}
>
>  	if (xfer->hdr.poll_completion) {
> -		ktime_t stop = ktime_add_ns(ktime_get(), SCMI_MAX_POLL_TO_NS);
> +		ktime_t stop = ktime_add_ns(ktime_get(), 500 * 1000 * NSEC_PER_USEC);
>

This is an unacceptable delay for schedutil fast_switch. So no for this one.
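
To put a number on that, here is a rough user-space sketch of the arithmetic
behind the proposed poll deadline. NSEC_PER_USEC has the same value as the
kernel constant. The 100us figure for the existing SCMI_MAX_POLL_TO_NS is an
assumption on my part (it is not stated in this thread), and the helper names
are made up for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000LL     /* same value as the kernel constant */
#define NSEC_PER_MSEC 1000000LL

/* ASSUMPTION: existing poll budget of 100us; not stated in the thread. */
#define ASSUMED_MAX_POLL_TO_NS (100 * NSEC_PER_USEC)

/* Expression taken verbatim from the proposed hunk. */
#define PROPOSED_POLL_TO_NS (500 * 1000 * NSEC_PER_USEC)

/* Convert a non-negative nanosecond count to whole milliseconds. */
static int64_t ns_to_ms(int64_t ns)
{
	assert(ns >= 0);
	return ns / NSEC_PER_MSEC;
}

/* How many times longer the proposed busy-wait budget is (hypothetical helper). */
static int64_t poll_budget_ratio(void)
{
	return PROPOSED_POLL_TO_NS / ASSUMED_MAX_POLL_TO_NS;
}
```

Under that assumption the polling path could spin for up to 500ms, several
thousand times the existing budget, and this is the path used when the caller
cannot sleep.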

>  		spin_until_cond(scmi_xfer_done_no_timeout(cinfo, xfer, stop));
>
> @@ -313,7 +313,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
>  			ret = -ETIMEDOUT;
>  	} else {
>  		/* And we wait for the response. */
> -		timeout = msecs_to_jiffies(info->desc->max_rx_timeout_ms);
> +		timeout = msecs_to_jiffies(500);

In general, this hides issues in the remote. We are trying to move towards
at most 1ms per request and, with MBOX_QUEUE at 20, I see 20ms as more than
big enough. We have it set to 30ms now. 500ms is way too large and not
required IMO.
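
For reference, a small user-space sketch of the sizing argument above: a
message queued behind a full mailbox queue waits at most depth * per-request
time. The enum and function names are hypothetical; the values (20 entries,
1ms target, 30ms current, 500ms proposed) are the ones quoted in this mail.
It also assumes the remote actually meets the 1ms target, which is a goal
and not a measured figure.

```c
#include <assert.h>

/* Hypothetical names; values are the ones quoted in the discussion. */
enum {
	QUEUE_DEPTH         = 20,  /* MBOX_QUEUE as quoted above */
	TARGET_REQ_MS       = 1,   /* target upper bound per request */
	CURRENT_TIMEOUT_MS  = 30,
	PROPOSED_TIMEOUT_MS = 500,
};

/* Worst-case wait, in ms, for a message behind a full queue. */
static unsigned int worst_case_wait_ms(unsigned int depth,
				       unsigned int per_req_ms)
{
	assert(depth > 0);
	return depth * per_req_ms;
}

/* Slack, in ms, between the worst-case wait and the timeout. */
static int timeout_slack_ms(unsigned int timeout_ms, unsigned int depth,
			    unsigned int per_req_ms)
{
	return (int)timeout_ms - (int)worst_case_wait_ms(depth, per_req_ms);
}
```

With these numbers the worst case is 20ms, so the current 30ms leaves 10ms of
slack, while 500ms would leave 480ms during which a misbehaving remote goes
unnoticed.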

--
Regards,
Sudeep