linux-kernel - Re: [PATCH 2/2] mailbox: Error out early if the mbox driver has failed to submit the message

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8a378429-ab75-0ebc-a852-24c9d3bb9298@yadro.com>
Date:   Sun, 18 Sep 2022 19:32:51 +0300
From:   Evgeny Shatokhin <e.shatokhin@...ro.com>
To:     Jassi Brar <jassisinghbrar@...il.com>
CC:     <linux-kernel@...r.kernel.org>, Ilya Kuznetsov <ilya@...ro.com>,
        <linux@...ro.com>
Subject: Re: [PATCH 2/2] mailbox: Error out early if the mbox driver has
 failed to submit the message

Thank you for a quick reply!

On 16.09.2022 20:04, Jassi Brar wrote:
> On Thu, Sep 15, 2022 at 11:50 AM Evgenii Shatokhin
> <e.shatokhin@...ro.com> wrote:
>>
>> mbox_send_message() places the pointer to the message to the queue
>> (add_to_rbuf) then calls msg_submit(chan) to submit the first of the
>> queued messaged to the mailbox. Some of mailbox drivers can return
>> errors from their .send_data() callbacks, e.g., if the message is
>> invalid or there is something wrong with the mailbox device.
>>
> The message can't be invalid because the client code is written for a
> particular provider.

As of mainline kernel v6.0-rc5, there are mailbox controller drivers 
which check if the messages are valid in their .send_data() callbacks. 
Example:

drivers/mailbox/rockchip-mailbox.c, rockchip_mbox_send_data():
	if (msg->rx_size > mb->buf_size) {
		dev_err(mb->mbox.dev, "Transmit size over buf size(%d)\n",
			mb->buf_size);
		return -EINVAL;
	}

Other examples are zynqmp_ipi_send_data() from 
drivers/mailbox/zynqmp-ipi-mailbox.c, ti_msgmgr_send_data() from 
drivers/mailbox/ti-msgmgr.c, etc.

If this is incorrect and the controller drivers should not do such 
things, I'd suggest to clearly state it in the docs, because it is far 
from obvious from Documentation/driver-api/mailbox.rst at the moment.

That is, one could state that checking if the messages to be transmitted 
are valid is a responsibility of the callers of mailbox API rather than 
of the controller driver.

I could prepare such patch for the docs. Objections?

> 
> Though it is possible for the mailbox controller to break down for
> some reason. In that case, the blocking api will keep retrying until
> successful. 

As far as I can see from the code, the behaviour seems to be different.

mbox_send_message() calls msg_submit() to send the message the first 
time. If that fails, hrtimer is not armed, so there will be no attempts 
to send the message again till tx_out ms pass:

	err = chan->mbox->ops->send_data(chan, data);
	if (!err) {
		chan->active_req = data;
		chan->msg_count--;
	}
exit:
	spin_unlock_irqrestore(&chan->lock, flags);

	if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
		/* kick start the timer immediately to avoid delays */
		spin_lock_irqsave(&chan->mbox->poll_hrt_lock, flags);
		hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
		spin_unlock_irqrestore(&chan->mbox->poll_hrt_lock, flags);
	}

This is from msg_submit(). Thus, the hrtimer will not fire, tx_tick() 
will not be called until tx_out ms have passed, and no attempts to send 
the message again will be made here.

In addition, complete(&chan->tx_complete) will not be called, so, 
mbox_send_message() will have to needlessly wait whole tx_out ms.
Only after that, it will call tx_tick(chan, -ETIME), which will, in 
turn, call msg_submit() to try to send the message again. If the mbox 
has not recovered, sending will fail again.

Then, mbox_send_message() will exit with -ETIME. The pointer to the 
message will remain in chan->msg_data[], but the framework will not 
attempt to send it again until the client calls mbox_send_message() for 
another, possibly unrelated message.

In this case, to sum up, mbox_send_message():
* needlessly waits for tx_out ms;
* only tries to send the message twice rather than makes retries until 
successful;
* does not inform the client about the actual error happened, just 
returns -ETIME;
* keeps the pointer to the message in chan->msg_data[], which is too 
easy to overlook on the client side. Too easy for the client to, say, 
reuse the structure and cause trouble.

What I suggest is to leave it to the client (or some other 
provider-specific code using the client) what to do with the failures.

If the error is reported by the controller driver, don't wait in 
mbox_send_message(), just pass the error to the client and exit. If the 
client decides to ignore the error - OK, its problem. Or - it may kick 
the mbox device somehow in a provider-specific way to make it work, or - 
reset the channel, or - do anything else to make things work again.

The behaviour of mbox_send_message() would then become more consistent: 
either it has sent the message successfully or it failed and returned an 
error, without side-effects (like the pointer to that message kept in 
the internal buffer).

I do not think this change would break the existing controller drivers 
and client drivers.

What do you think?

>But ideally the client, upon getting -ETIME, should free()
> and request() the channel reset it (because controller drivers usually
> don't contain the logic to automatically reset upon some error).
> 
> thanks.

Regards,
Evgenii