Message-ID: <CABb+yY1BNsdMq7CNOBDk3sn7uvpL4=-fT7eOcbuL-+Yjz+iqHw@mail.gmail.com>
Date: Tue, 19 Apr 2022 09:10:51 -0500
From: Jassi Brar <jassisinghbrar@...il.com>
To: Bjorn Ardo <bjorn.ardo@...s.com>
Cc: kernel <kernel@...s.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mailbox: forward the hrtimer if not queued and under a lock
On Tue, Apr 19, 2022 at 7:15 AM Bjorn Ardo <bjorn.ardo@...s.com> wrote:
>
> Hi,
>
>
> I can confirm that this is an actual issue found on our system, not just
> a theoretical case.
>
>
> If I add the following trace-code to the original code:
>
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 3e7d4b20ab34..8e9e82e5f4b1 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -57,6 +57,7 @@ static void msg_submit(struct mbox_chan *chan)
> void *data;
> int err = -EBUSY;
>
> + trace_printk("Entering msg_submit\n");
> spin_lock_irqsave(&chan->lock, flags);
>
> if (!chan->msg_count || chan->active_req)
> @@ -85,9 +86,14 @@ static void msg_submit(struct mbox_chan *chan)
> /* kick start the timer immediately to avoid delays */
> if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> /* but only if not already active */
> - if (!hrtimer_active(&chan->mbox->poll_hrt))
> + if (!hrtimer_active(&chan->mbox->poll_hrt)) {
> + trace_printk("Starting the hr timer from submit\n");
> hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> + } else {
> + trace_printk("Not starting the hr timer from submit since it is active\n");
> + }
> }
> + trace_printk("Leaving msg_submit\n");
> }
>
> static void tx_tick(struct mbox_chan *chan, int r)
> @@ -121,6 +127,7 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
> bool txdone, resched = false;
> int i;
>
> + trace_printk("Entering txdone_hrtimer\n");
> for (i = 0; i < mbox->num_chans; i++) {
> struct mbox_chan *chan = &mbox->chans[i];
>
> @@ -134,8 +141,10 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
>
> if (resched) {
> hrtimer_forward_now(hrtimer,
> ms_to_ktime(mbox->txpoll_period));
> + trace_printk("Leaving txdone_hrtimer with restart\n");
> return HRTIMER_RESTART;
> }
> + trace_printk("Leaving txdone_hrtimer without restart\n");
> return HRTIMER_NORESTART;
> }
>
> Then I get the following trace output (I have cropped a small portion
> around where the error appears):
>
>
> vhost-475-480 [000] d..1. 217.440325: msg_submit: Entering msg_submit
> vhost-475-480 [000] d..1. 217.440332: msg_submit: Starting the hr timer from submit
> vhost-475-480 [000] d..1. 217.440336: msg_submit: Leaving msg_submit
> vhost-475-480 [000] d.h1. 217.440342: txdone_hrtimer: Entering txdone_hrtimer
> vhost-475-480 [000] d.h1. 217.440349: txdone_hrtimer: Leaving txdone_hrtimer without restart
> vhost-475-480 [000] d..1. 217.440597: msg_submit: Entering msg_submit
> vhost-475-480 [000] d..1. 217.440599: msg_submit: Starting the hr timer from submit
> vhost-475-480 [000] d..1. 217.440602: msg_submit: Leaving msg_submit
> <idle>-0 [001] ..s1. 217.440604: msg_submit: Entering msg_submit
> vhost-475-480 [000] d.h1. 217.440606: txdone_hrtimer: Entering txdone_hrtimer
> vhost-475-480 [000] d.h1. 217.440608: txdone_hrtimer: Leaving txdone_hrtimer without restart
> <idle>-0 [001] ..s1. 217.440609: msg_submit: Not starting the hr timer from submit since it is active
> <idle>-0 [001] ..s1. 217.440610: msg_submit: Leaving msg_submit
>
>
> If I break down the log above, we first have one case that works as
> intended: a message is written, a timer is started, and the
> message has been read by the time the timer fires:
>
> vhost-475-480 [000] d..1. 217.440325: msg_submit: Entering msg_submit
> vhost-475-480 [000] d..1. 217.440332: msg_submit: Starting the hr timer from submit
> vhost-475-480 [000] d..1. 217.440336: msg_submit: Leaving msg_submit
> vhost-475-480 [000] d.h1. 217.440342: txdone_hrtimer: Entering txdone_hrtimer
> vhost-475-480 [000] d.h1. 217.440349: txdone_hrtimer: Leaving txdone_hrtimer without restart
>
>
> After this we write a new message and a new timer is started:
>
> vhost-475-480 [000] d..1. 217.440597: msg_submit: Entering msg_submit
> vhost-475-480 [000] d..1. 217.440599: msg_submit: Starting the hr timer from submit
> vhost-475-480 [000] d..1. 217.440602: msg_submit: Leaving msg_submit
>
>
> However, here comes the race: a new message is being written at the
> same time as the hr-timer is handling the first reply (on different CPUs):
>
> <idle>-0 [001] ..s1. 217.440604: msg_submit: Entering msg_submit
> vhost-475-480 [000] d.h1. 217.440606: txdone_hrtimer: Entering txdone_hrtimer
>
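For reference, the interleaving reported in the trace above can be replayed
in a deliberately simplified, single-channel userspace model. This is only a
sketch: struct model_timer, struct model_chan and the model_* functions are
hypothetical stand-ins, not kernel types or APIs. The one property it borrows
from the report is that the timer still counts as "active" while its callback
is running, so a submit that lands in that window skips starting it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the poll timer; NOT the kernel's struct hrtimer. */
struct model_timer {
	bool queued;	/* armed and waiting to fire */
	bool running;	/* callback currently executing */
};

/* Hypothetical model of one mailbox channel. */
struct model_chan {
	bool active_req;	/* a message is in flight and needs polling */
};

/* The timer counts as active while queued OR while its callback runs. */
static bool model_timer_active(const struct model_timer *t)
{
	return t->queued || t->running;
}

/*
 * Replay the reported ordering step by step. Returns true if the channel
 * ends up with a message in flight and no timer left to poll for it.
 */
static bool model_replay_race(void)
{
	struct model_timer t = { .queued = true, .running = false };
	struct model_chan ch = { .active_req = false };
	bool resched;

	/* CPU0: the timer fires and its callback starts. */
	t.queued = false;
	t.running = true;

	/* CPU0: the callback scans the channel; nothing is in flight yet. */
	resched = ch.active_req;

	/* CPU1: a submit puts a new message in flight. */
	ch.active_req = true;

	/* CPU1: start the timer only if it is not active; it is, so skip. */
	if (!model_timer_active(&t))
		t.queued = true;

	/* CPU0: the callback returns and re-arms only if resched was set. */
	t.running = false;
	if (resched)
		t.queued = true;

	return ch.active_req && !model_timer_active(&t);
}
```

Calling model_replay_race() from a small main() returns true for this
ordering: the message is in flight but nothing is left to poll for its
completion.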
I don't have access to your client driver, but if it submits another
message from rx_callback(), that is the problem.
Please have a look at how other platforms do it, for example
imx_dsp_rproc_rx_tx_callback().
/**
* mbox_chan_received_data - A way for controller driver to push data
* received from remote to the upper layer.
* @chan: Pointer to the mailbox channel on which RX happened.
* @mssg: Client specific message typecasted as void *
*
* After startup and before shutdown any data received on the chan
* is passed on to the API via atomic mbox_chan_received_data().
* The controller should ACK the RX only after this call returns.
*/
void mbox_chan_received_data(struct mbox_chan *chan, void *mssg)
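A minimal userspace sketch of the pattern, assuming the client only records
in its RX callback that a reply is needed and submits it later from a worker.
As far as I recall, imx_dsp_rproc_rx_tx_callback() does the equivalent with
queue_work(). Every name below (struct demo_client, demo_*) is hypothetical
and only stands in for the real client and for mbox_send_message(); this is
not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a mailbox client; not a kernel type. */
struct demo_client {
	bool in_rx_callback;	/* true while the RX callback is executing */
	bool reply_pending;	/* set by the RX callback, consumed by the worker */
	int sent_total;		/* messages submitted overall */
	int sent_from_rx;	/* messages submitted from inside the RX callback */
};

/* Stand-in for mbox_send_message(); it only records the calling context. */
static void demo_send_message(struct demo_client *c)
{
	c->sent_total++;
	if (c->in_rx_callback)
		c->sent_from_rx++;
}

/* RX callback: note that a reply is needed, but do not submit from here. */
static void demo_rx_callback(struct demo_client *c)
{
	c->in_rx_callback = true;
	c->reply_pending = true;	/* a driver would queue_work() here */
	c->in_rx_callback = false;
}

/* Worker: runs later, outside the RX callback, and does the actual submit. */
static void demo_worker(struct demo_client *c)
{
	if (!c->reply_pending)
		return;
	c->reply_pending = false;
	demo_send_message(c);
}

/* Drive one receive followed by one worker run and return the final state. */
static struct demo_client demo_run(void)
{
	struct demo_client c = { 0 };

	demo_rx_callback(&c);
	demo_worker(&c);
	assert(c.sent_from_rx == 0);	/* nothing was sent from RX context */
	return c;
}
```

After demo_run(), exactly one message has been submitted and none of the
submits happened inside the RX callback, which is the property to check in
the real client.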
If your client does not submit from rx_callback(), please share your
client code as well.
Thanks.