[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <418b8bbb-84d0-7293-42cd-8c552e4357eb@yadro.com>
Date: Wed, 5 Oct 2022 15:11:12 +0300
From: Evgeny Shatokhin <e.shatokhin@...ro.com>
To: Jassi Brar <jassisinghbrar@...il.com>
CC: <linux-kernel@...r.kernel.org>, Ilya Kuznetsov <ilya@...ro.com>,
<linux@...ro.com>
Subject: Re: [PATCH 2/2] mailbox: Error out early if the mbox driver has
failed to submit the message
Hi,
On 18.09.2022 19:32, Evgeny Shatokhin wrote:
> Thank you for a quick reply!
>
> On 16.09.2022 20:04, Jassi Brar wrote:
>> On Thu, Sep 15, 2022 at 11:50 AM Evgenii Shatokhin
>> <e.shatokhin@...ro.com> wrote:
>>>
>>> mbox_send_message() places the pointer to the message to the queue
>>> (add_to_rbuf) then calls msg_submit(chan) to submit the first of the
>>> queued messaged to the mailbox. Some of mailbox drivers can return
>>> errors from their .send_data() callbacks, e.g., if the message is
>>> invalid or there is something wrong with the mailbox device.
>>>
>> The message can't be invalid because the client code is written for a
>> particular provider.
>
> As of mainline kernel v6.0-rc5, there are mailbox controller drivers
> which check if the messages are valid in their .send_data() callbacks.
> Example:
>
> drivers/mailbox/rockchip-mailbox.c, rockchip_mbox_send_data():
> if (msg->rx_size > mb->buf_size) {
> dev_err(mb->mbox.dev, "Transmit size over buf size(%d)\n",
> mb->buf_size);
> return -EINVAL;
> }
>
> Other examples are zynqmp_ipi_send_data() from
> drivers/mailbox/zynqmp-ipi-mailbox.c, ti_msgmgr_send_data() from
> drivers/mailbox/ti-msgmgr.c, etc.
>
> If this is incorrect and the controller drivers should not do such
> things, I'd suggest to clearly state it in the docs, because it is far
> from obvious from Documentation/driver-api/mailbox.rst at the moment.
>
> That is, one could state that checking if the messages to be transmitted
> are valid is a responsibility of the callers of mailbox API rather than
> of the controller driver.
>
> I could prepare such patch for the docs. Objections?
>
>>
>> Though it is possible for the mailbox controller to break down for
>> some reason. In that case, the blocking api will keep retrying until
>> successful.
>
> As far as I can see from the code, the behaviour seems to be different.
>
> mbox_send_message() calls msg_submit() to send the message the first
> time. If that fails, hrtimer is not armed, so there will be no attempts
> to send the message again till tx_out ms pass:
>
> err = chan->mbox->ops->send_data(chan, data);
> if (!err) {
> chan->active_req = data;
> chan->msg_count--;
> }
> exit:
> spin_unlock_irqrestore(&chan->lock, flags);
>
> if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> /* kick start the timer immediately to avoid delays */
> spin_lock_irqsave(&chan->mbox->poll_hrt_lock, flags);
> hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> spin_unlock_irqrestore(&chan->mbox->poll_hrt_lock, flags);
> }
>
> This is from msg_submit(). Thus, the hrtimer will not fire, tx_tick()
> will not be called until tx_out ms have passed, and no attempts to send
> the message again will be made here.
>
> In addition, complete(&chan->tx_complete) will not be called, so,
> mbox_send_message() will have to needlessly wait whole tx_out ms.
> Only after that, it will call tx_tick(chan, -ETIME), which will, in
> turn, call msg_submit() to try to send the message again. If the mbox
> has not recovered, sending will fail again.
>
> Then, mbox_send_message() will exit with -ETIME. The pointer to the
> message will remain in chan->msg_data[], but the framework will not
> attempt to send it again until the client calls mbox_send_message() for
> another, possibly unrelated message.
>
> In this case, to sum up, mbox_send_message():
> * needlessly waits for tx_out ms;
> * only tries to send the message twice rather than makes retries until
> successful;
> * does not inform the client about the actual error happened, just
> returns -ETIME;
> * keeps the pointer to the message in chan->msg_data[], which is too
> easy to overlook on the client side. Too easy for the client to, say,
> reuse the structure and cause trouble.
>
> What I suggest is to leave it to the client (or some other
> provider-specific code using the client) what to do with the failures.
>
> If the error is reported by the controller driver, don't wait in
> mbox_send_message(), just pass the error to the client and exit. If the
> client decides to ignore the error - OK, its problem. Or - it may kick
> the mbox device somehow in a provider-specific way to make it work, or -
> reset the channel, or - do anything else to make things work again.
>
> The behaviour of mbox_send_message() would then become more consistent:
> either it has sent the message successfully or it failed and returned an
> error, without side-effects (like the pointer to that message kept in
> the internal buffer).
>
> I do not think this change would break the existing controller drivers
> and client drivers.
>
> What do you think?
>
>> But ideally the client, upon getting -ETIME, should free()
>> and request() the channel reset it (because controller drivers usually
>> don't contain the logic to automatically reset upon some error).
>>
>> thanks.
Any updates on this?
Looking forward to your comments.
Regards,
Evgenii
Powered by blists - more mailing lists