linux-kernel - Re: [PATCH] mailbox: forward the hrtimer if not queued and under a lock

Message-ID: <CABb+yY1BNsdMq7CNOBDk3sn7uvpL4=-fT7eOcbuL-+Yjz+iqHw@mail.gmail.com>
Date:   Tue, 19 Apr 2022 09:10:51 -0500
From:   Jassi Brar <jassisinghbrar@...il.com>
To:     Bjorn Ardo <bjorn.ardo@...s.com>
Cc:     kernel <kernel@...s.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mailbox: forward the hrtimer if not queued and under a lock

On Tue, Apr 19, 2022 at 7:15 AM Bjorn Ardo <bjorn.ardo@...s.com> wrote:
>
> Hi,
>
>
> I can confirm that this is an actual issue found on our system, not just
> a theoretical case.
>
>
> If I add the following trace code to the original code:
>
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 3e7d4b20ab34..8e9e82e5f4b1 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -57,6 +57,7 @@ static void msg_submit(struct mbox_chan *chan)
>          void *data;
>          int err = -EBUSY;
>
> +       trace_printk("Entering msg_submit\n");
>          spin_lock_irqsave(&chan->lock, flags);
>
>          if (!chan->msg_count || chan->active_req)
> @@ -85,9 +86,14 @@ static void msg_submit(struct mbox_chan *chan)
>          /* kick start the timer immediately to avoid delays */
>          if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
>                  /* but only if not already active */
> -               if (!hrtimer_active(&chan->mbox->poll_hrt))
> +               if (!hrtimer_active(&chan->mbox->poll_hrt)) {
> +                       trace_printk("Starting the hr timer from submit\n");
>                          hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> +               } else {
> +                       trace_printk("Not starting the hr timer from submit since it is active\n");
> +               }
>          }
> +       trace_printk("Leaving msg_submit\n");
>   }
>
>   static void tx_tick(struct mbox_chan *chan, int r)
> @@ -121,6 +127,7 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
>          bool txdone, resched = false;
>          int i;
>
> +       trace_printk("Entering txdone_hrtimer\n");
>          for (i = 0; i < mbox->num_chans; i++) {
>                  struct mbox_chan *chan = &mbox->chans[i];
>
> @@ -134,8 +141,10 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
>
>          if (resched) {
>                  hrtimer_forward_now(hrtimer, ms_to_ktime(mbox->txpoll_period));
> +               trace_printk("Leaving txdone_hrtimer with restart\n");
>                  return HRTIMER_RESTART;
>          }
> +       trace_printk("Leaving txdone_hrtimer without restart\n");
>          return HRTIMER_NORESTART;
>   }
>
> Then I get the following trace output (I have cropped a small portion
> around where the error appears):
>
>
>         vhost-475-480     [000] d..1.   217.440325: msg_submit: Entering msg_submit
>         vhost-475-480     [000] d..1.   217.440332: msg_submit: Starting the hr timer from submit
>         vhost-475-480     [000] d..1.   217.440336: msg_submit: Leaving msg_submit
>         vhost-475-480     [000] d.h1.   217.440342: txdone_hrtimer: Entering txdone_hrtimer
>         vhost-475-480     [000] d.h1.   217.440349: txdone_hrtimer: Leaving txdone_hrtimer without restart
>         vhost-475-480     [000] d..1.   217.440597: msg_submit: Entering msg_submit
>         vhost-475-480     [000] d..1.   217.440599: msg_submit: Starting the hr timer from submit
>         vhost-475-480     [000] d..1.   217.440602: msg_submit: Leaving msg_submit
>            <idle>-0       [001] ..s1.   217.440604: msg_submit: Entering msg_submit
>         vhost-475-480     [000] d.h1.   217.440606: txdone_hrtimer: Entering txdone_hrtimer
>         vhost-475-480     [000] d.h1.   217.440608: txdone_hrtimer: Leaving txdone_hrtimer without restart
>            <idle>-0       [001] ..s1.   217.440609: msg_submit: Not starting the hr timer from submit since it is active
>            <idle>-0       [001] ..s1.   217.440610: msg_submit: Leaving msg_submit
>
>
> If I break down the log above, we first have one case that works as
> intended: a message is written, a timer is started, and the message
> has been read by the time the timer fires:
>
>         vhost-475-480     [000] d..1.   217.440325: msg_submit: Entering msg_submit
>         vhost-475-480     [000] d..1.   217.440332: msg_submit: Starting the hr timer from submit
>         vhost-475-480     [000] d..1.   217.440336: msg_submit: Leaving msg_submit
>         vhost-475-480     [000] d.h1.   217.440342: txdone_hrtimer: Entering txdone_hrtimer
>         vhost-475-480     [000] d.h1.   217.440349: txdone_hrtimer: Leaving txdone_hrtimer without restart
>
>
> After this, we write a new message and a new timer is started:
>
>         vhost-475-480     [000] d..1.   217.440597: msg_submit: Entering msg_submit
>         vhost-475-480     [000] d..1.   217.440599: msg_submit: Starting the hr timer from submit
>         vhost-475-480     [000] d..1.   217.440602: msg_submit: Leaving msg_submit
>
>
> However, here comes the race. A new message is now being written at the
> same time as the hrtimer is handling the first reply (on different CPUs):
>
>            <idle>-0       [001] ..s1.   217.440604: msg_submit: Entering msg_submit
>         vhost-475-480     [000] d.h1.   217.440606: txdone_hrtimer: Entering txdone_hrtimer
>
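The interleaving in the trace can be replayed deterministically. The following userspace C sketch is a simplified model, not kernel code. All struct and function names are hypothetical, and each function stands for one step on one CPU. `model_hrtimer_active()` mirrors the fact that hrtimer_active() also reports true while the timer's callback is still running. The `use_queued_check` variant tests only whether the timer is queued; it is an assumption inferred from the patch subject. The locking part of the patch is not modelled.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of the mailbox poll timer; NOT kernel code. */
struct poll_timer_model {
	bool queued;     /* timer armed for a future expiry */
	bool cb_running; /* txdone_hrtimer() is executing on another CPU */
	bool resched;    /* restart decision taken by the callback's scan */
	bool msg_active; /* a sent message still needs txdone polling */
};

/* Like hrtimer_active(): true if queued OR the callback is running. */
static bool model_hrtimer_active(const struct poll_timer_model *m)
{
	return m->queued || m->cb_running;
}

/* CPU0: timer expires, is dequeued, and the callback scans the channels. */
static void model_cb_enter_and_scan(struct poll_timer_model *m)
{
	m->queued = false;
	m->cb_running = true;
	m->msg_active = false;      /* hardware reports the previous message done */
	m->resched = m->msg_active; /* nothing left in flight -> no restart */
}

/* CPU1: msg_submit() sends a message, then decides whether to start the timer. */
static void model_submit(struct poll_timer_model *m, bool use_queued_check)
{
	m->msg_active = true;
	bool skip = use_queued_check ? m->queued : model_hrtimer_active(m);
	if (!skip)
		m->queued = true; /* hrtimer_start() re-arms even a running timer */
}

/* CPU0: callback returns; HRTIMER_NORESTART does not dequeue a re-armed timer. */
static void model_cb_return(struct poll_timer_model *m)
{
	if (m->resched)
		m->queued = true;
	m->cb_running = false;
}

/* Stalled: a message needs polling, but no timer will ever fire for it. */
static bool model_stalled(const struct poll_timer_model *m)
{
	return m->msg_active && !m->queued && !m->cb_running;
}

/* Replays the racy interleaving from the trace; returns true if the channel stalls. */
static bool replay_trace_interleaving(bool use_queued_check)
{
	/* State after the second submit: timer queued, message in flight. */
	struct poll_timer_model m = { .queued = true, .msg_active = true };

	model_cb_enter_and_scan(&m);        /* 217.440606 on CPU0 */
	model_submit(&m, use_queued_check); /* 217.440604..440610 on CPU1 */
	model_cb_return(&m);                /* 217.440608: "without restart" */
	return model_stalled(&m);
}
```

With the hrtimer_active()-style check, `replay_trace_interleaving(false)` reports a stall. The callback has already decided not to restart, and msg_submit() skips hrtimer_start() because the callback is still running. With the queued-only check, hrtimer_start() re-arms the timer. As we understand the hrtimer core, a re-armed timer stays queued after the callback returns HRTIMER_NORESTART.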
I don't have access to your client driver, but if it submits another
message from its rx_callback(), that is the problem.

Please have a look at how other platforms handle this, for example
imx_dsp_rproc_rx_tx_callback().

/**
 * mbox_chan_received_data - A way for controller driver to push data
 *              received from remote to the upper layer.
 * @chan: Pointer to the mailbox channel on which RX happened.
 * @mssg: Client specific message typecasted as void *
 *
 * After startup and before shutdown any data received on the chan
 * is passed on to the API via atomic mbox_chan_received_data().
 * The controller should ACK the RX only after this call returns.
 */
void mbox_chan_received_data(struct mbox_chan *chan, void *mssg)
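The suggestion amounts to keeping the rx callback short and sending the next message from another context. The userspace C sketch below illustrates that pattern and is hedged accordingly. The struct and functions are hypothetical. In a real client, the deferral would presumably be a work item scheduled with queue_work(), and the worker would call mbox_send_message(). The sketch does not claim to reproduce what imx_dsp_rproc_rx_tx_callback() itself does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a client driver's private data. */
struct model_client {
	bool in_rx_callback;     /* true only while the rx callback runs */
	bool tx_work_pending;    /* kernel analogue: a queued work_struct */
	int sends_from_callback; /* sends issued from (atomic) rx context */
	int sends_from_worker;   /* sends issued later from the worker */
};

/* Kernel analogue: mbox_send_message(); here it only records the context. */
static void model_send(struct model_client *c)
{
	if (c->in_rx_callback)
		c->sends_from_callback++;
	else
		c->sends_from_worker++;
}

/* Pattern to avoid: sending the next message straight from the rx callback. */
static void model_rx_callback_direct(struct model_client *c, void *msg)
{
	(void)msg;
	c->in_rx_callback = true;
	model_send(c);
	c->in_rx_callback = false;
}

/* Deferred pattern: consume the reply, only schedule the next transmission. */
static void model_rx_callback(struct model_client *c, void *msg)
{
	(void)msg; /* a real client would parse the reply here */
	c->in_rx_callback = true;
	c->tx_work_pending = true; /* kernel analogue: queue_work() */
	c->in_rx_callback = false;
}

/* Worker: runs after the callback has returned and sends the next message. */
static void model_tx_worker(struct model_client *c)
{
	if (!c->tx_work_pending)
		return;
	c->tx_work_pending = false;
	model_send(c);
}
```

In the deferred variant, the send happens only after mbox_chan_received_data() has returned to the controller. This matches the kernel-doc note that the controller ACKs the RX only after this call returns.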

If that is not the case, please share your client code as well.

thanks.