linux-kernel - Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Yyx3IAcMX309QEjB@e120937-lin>
Date:   Thu, 22 Sep 2022 15:54:24 +0100
From:   Cristian Marussi <cristian.marussi@....com>
To:     Shivnandan Kumar <quic_kshivnan@...cinc.com>
Cc:     sudeep.holla@....com, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, quic_rgottimu@...cinc.com,
        quic_avajid@...cinc.com
Subject: Re: Query regarding "firmware: arm_scmi: Free mailbox channels if
 probe fails"

On Thu, Sep 22, 2022 at 10:31:47AM +0530, Shivnandan Kumar wrote:
> Hi Christian,
> 

Hi Shivnandan,

> 
> Do you have any update or suggestion regarding thread https://lore.kernel.org/lkml/20211105094310.GI6526@e120937-lin/T/#m07993053f6f238864acad4e9bad9f08d85aeb019.
> 
> We are still getting this issue and wanted to check if  there is any fix
> that I can try.
> 

Sorry this issue fell to the bottom of my list in these past months...
... but it stil on TODO :D

So today I tried to get my head around this issue again (i.e. mainly
re-reading the above thread to remind me what was the status and wth I
had written... :P)

In summary the racy thing seemed to be caused by the a delayed late
SCMI Base reply happily served on one core by scmi_rx_callback operating
on some well-defined SCMI channel, while on another core we are effectively
shutting down the system and destroying such channels: now this should
be clearly NOT be possible and it is what we have to synchronize.

Looking at the transport layer that you use, mailbox, I see that while
setup/free helpers are synchronized on an internal chan->lock, the RX
path inside the mailbox core is not, so I tried this:

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 4229b9b5da98..bb6173c0ad54 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
  */
 void mbox_chan_received_data(struct mbox_chan *chan, void *mssg)
 {
+       unsigned long flags;
+
+       spin_lock_irqsave(&chan->lock, flags);
        /* No buffering the received data */
        if (chan->cl->rx_callback)
                chan->cl->rx_callback(chan->cl, mssg);
+       spin_unlock_irqrestore(&chan->lock, flags);
 }
 EXPORT_SYMBOL_GPL(mbox_chan_received_data);

... can you try on your setup ? I dont have a way to easily reproduce
your race as of now...

NOTE THAT, I am not still convinced that the above fix, if it works, will
constitute the final solution to this issue, I could maybe move this same
kind of sync up into the SCMI transport layer to avoid to impact all other
users of the above mailbox interface (since, as of today, nobody has
reported any issue like ours due to the missing spinlock..)..but it
could be helpful to test the above to verify that this is really where
the root issue is.

Thanks,
Cristian