Message-ID: <5236533.GXAFRqVoOG@workhorse>
Date: Tue, 16 Sep 2025 11:03:57 +0200
From: Nicolas Frattaroli <nicolas.frattaroli@...labora.com>
To: Chia-I Wu <olvaffe@...il.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
Boris Brezillon <boris.brezillon@...labora.com>,
Steven Price <steven.price@....com>, Liviu Dudau <liviu.dudau@....com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Rob Herring <robh@...nel.org>, Krzysztof Kozlowski <krzk+dt@...nel.org>,
Conor Dooley <conor+dt@...nel.org>,
Matthias Brugger <matthias.bgg@...il.com>,
MyungJoo Ham <myungjoo.ham@...sung.com>,
Kyungmin Park <kyungmin.park@...sung.com>,
Chanwoo Choi <cw00.choi@...sung.com>, Jassi Brar <jassisinghbrar@...il.com>,
Kees Cook <kees@...nel.org>, "Gustavo A. R. Silva" <gustavoars@...nel.org>,
Chen-Yu Tsai <wenst@...omium.org>, kernel@...labora.com,
dri-devel@...ts.freedesktop.org, devicetree@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org, linux-pm@...r.kernel.org,
linux-hardening@...r.kernel.org
Subject: Re: [PATCH v2 05/10] mailbox: add MediaTek GPUEB IPI mailbox
On Tuesday, 16 September 2025 06:55:30 Central European Summer Time Chia-I Wu wrote:
> On Mon, Sep 15, 2025 at 6:34 AM Nicolas Frattaroli
> <nicolas.frattaroli@...labora.com> wrote:
> >
> > On Saturday, 13 September 2025 00:11:10 Central European Summer Time Chia-I Wu wrote:
> > > On Fri, Sep 12, 2025 at 11:38 AM Nicolas Frattaroli
> > > <nicolas.frattaroli@...labora.com> wrote:
> > > <snipped>
> > > > +static irqreturn_t mtk_gpueb_mbox_thread(int irq, void *data)
> > > > +{
> > > > + struct mtk_gpueb_mbox_chan *ch = data;
> > > > + int status;
> > > > +
> > > > + status = atomic_cmpxchg(&ch->rx_status,
> > > > + MBOX_FULL | MBOX_CLOGGED, MBOX_FULL);
> > > > + if (status == (MBOX_FULL | MBOX_CLOGGED)) {
> > > > + mtk_gpueb_mbox_read_rx(ch);
> > > > + writel(BIT(ch->num), ch->ebm->mbox_ctl + MBOX_CTL_IRQ_CLR);
> > > > + mbox_chan_received_data(&ch->ebm->mbox.chans[ch->num],
> > > > + ch->rx_buf);
> > > Given what other drivers do, and how mtk_mfg consumes the data, we should
> > >
> > > char buf[MAX_OF_RX_LEN]; // MAX_OF_RX_LEN is 32; we can also
> > > allocate it during probe
> > > mtk_gpueb_mbox_read_rx(ch);
> > > mbox_chan_received_data(..., buf);
> > >
> > > mtk_mfg makes a copy eventually anyway.
> >
> > We don't right now, at least not until after the callback returns.
> > So we need to have the copy in the mtk_mfg callback, not after the
> > completion. That's fine and I do want to do this as this is what
> > the mailbox framework seems to expect clients to do.
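(Not part of the patch, but to make the contract concrete: a minimal userspace sketch of the client-side copy in the rx_callback. `MAX_RX_LEN`, `struct my_client` and `my_rx_callback` are made-up illustrative names; the point is just that the data pointer is only valid for the duration of the callback, so the client copies it out before returning.)

```c
#include <assert.h>
#include <string.h>

#define MAX_RX_LEN 32  /* hypothetical maximum message size */

struct my_client {
	char msg[MAX_RX_LEN];  /* client-owned copy of the last message */
};

/*
 * The mailbox framework only guarantees the data pointer for the
 * duration of the rx_callback, so copy the message out before
 * returning rather than stashing the pointer.
 */
static void my_rx_callback(struct my_client *cl, const void *data)
{
	memcpy(cl->msg, data, MAX_RX_LEN);
}
```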
> >
> > > We don't need to maintain any
> > > extra copy.
> > >
> > > Then we might not need rx_status.
> >
> > We can probably get rid of it if we keep the per-channel
> > interrupt handler. Otherwise, we may still need clogged,
> > as we don't want to process interrupts on channels we have
> > no user for.
> >
> > >
> > > > + atomic_set(&ch->rx_status, 0);
> > > > + return IRQ_HANDLED;
> > > > + }
> > > > +
> > > > + return IRQ_NONE;
> > > > +}
> > > > +
> > > > +static int mtk_gpueb_mbox_send_data(struct mbox_chan *chan, void *data)
> > > > +{
> > > > + struct mtk_gpueb_mbox_chan *ch = chan->con_priv;
> > > > + int i;
> > > > + u32 *values = data;
> > > > +
> > > > + if (atomic_read(&ch->rx_status))
> > > > + return -EBUSY;
> > > > +
> > > > + /*
> > > > + * We don't want any fancy nonsense, just write the 32-bit values in
> > > > + * order. memcpy_toio/__iowrite32_copy don't work here, because fancy.
> > > > + */
> > > > + for (i = 0; i < ch->c->tx_len; i += 4)
> > > > + writel(values[i / 4], ch->ebm->mbox_mmio + ch->c->tx_offset + i);
> > > > +
> > > > + writel(BIT(ch->num), ch->ebm->mbox_ctl + MBOX_CTL_IRQ_SET);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int mtk_gpueb_mbox_startup(struct mbox_chan *chan)
> > > > +{
> > > > + struct mtk_gpueb_mbox_chan *ch = chan->con_priv;
> > > > + int ret;
> > > > +
> > > > + atomic_set(&ch->rx_status, 0);
> > > > +
> > > > + ret = clk_enable(ch->ebm->clk);
> > > > + if (ret) {
> > > > + dev_err(ch->ebm->dev, "Failed to enable EB clock: %pe\n",
> > > > + ERR_PTR(ret));
> > > > + goto err_clog;
> > > > + }
> > > > +
> > > > + writel(BIT(ch->num), ch->ebm->mbox_ctl + MBOX_CTL_IRQ_CLR);
> > > > +
> > > > + ret = devm_request_threaded_irq(ch->ebm->dev, ch->ebm->irq, mtk_gpueb_mbox_isr,
> > > > + mtk_gpueb_mbox_thread, IRQF_SHARED | IRQF_ONESHOT,
> > > > + ch->full_name, ch);
> > > I don't think this warrants a per-channel irq thread.
> > >
> > > mbox_chan_received_data is atomic. I think we can start simple with
> > > just a devm_request_irq for all channels. mtk_gpueb_mbox_isr can
> > >
> > > read bits from MBOX_CTL_RX_STS
> > > for each bit set:
> > > read data from rx
> > > mbox_chan_received_data
> > > write bits to MBOX_CTL_IRQ_CLR
> > >
> >
> > I don't like this approach. It brings us back to having to process
> > multiple channels per ISR, keep track of when the interrupt should
> > be enabled and disabled based on how many channels are in use, and
> > also is not in line with what e.g. omap-mailbox.c does.
> >
> > Remember that `mbox_chan_received_data` synchronously calls the
> > mailbox client's rx_callback. In mediatek_mfg's case, this is
> > fairly small, though with the request to not make the rx buffer
> > persist beyond the rx_callback it will gain an additional memory
> > copy. But we can't guarantee that someone isn't going to put a
> > slow operation in the path. Sure, it's going to be atomic, but
> > waiting for a spinlock is atomic and not something an ISR would
> > enjoy. I don't think mailbox clients would expect that if they
> > take their time they'll stall the interrupt handler for every
> > other channel.
> >
> > So we'd keep the interrupt disabled for all channels until the
> > client that received a message has processed it.
> >
> > I can see myself getting rid of the handler and just having the
> > thread function as the bottom half, but I'd really like to keep
> > the one-IRQ-request-per-channel thing I've got going now as it
> > made the code a lot easier to reason about. However, doing this
> > would mean the interrupt is re-enabled after the generic upper
> > half, when all the business logic that needs to not run
> > concurrently for an individual channel is in the bottom half.
> >
> > As far as I can tell, this would then mean we'd have to add
> > some concurrency exclusion mechanism to the bottom half.
> >
> > Moving all the logic into the upper half handler function
> > would make that handler somewhat longer, and I don't know
> > if IRQF_ONESHOT masks the interrupt for all users of that
> > IRQ number or just for those with that dev_id. If it's per
> > dev_id, then I'm fine with moving stuff up there. But from
> > my reading of the core IRQ handling code, that does not
> > appear to be the case; one channel getting a reply would
> > mask *all* channels of the mailbox until the upper half is
> > completed, and if the upper half calls into a driver
> > callback synchronously, that may take a hot minute.
> >
> > Put differently: Is there a problem with one thread per used
> > channel, or are we going off vibes here? The way it currently
> > works uses the shared interrupt to mark just that one channel
> > as busy with rx_status before letting the IRQ for all channels
> > be unmasked again, which seems ideal to me.
> No, one thread per used channel can work. I can't say I like it, but I
> also don't know the hw as well as you do.
>
Your knowledge is probably not far behind mine on this hardware :(
I'll keep the per-channel thread for v3 for now, so it's clearer
how this will look. It'll also give us both an opportunity to run
the code, take some measurements to see whether it causes any
problems, and experiment with your proposed solution.
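For reference, the per-channel claim that rx_status implements in the
current patch can be modeled in plain userspace C with <stdatomic.h>.
This is an illustrative model, not the kernel code: the constants
mirror the patch, but `claim_and_consume` is a stand-in for the body
of mtk_gpueb_mbox_thread().

```c
#include <assert.h>
#include <stdatomic.h>

#define MBOX_FULL    0x1  /* upper half saw data pending */
#define MBOX_CLOGGED 0x2  /* channel has a user; bottom half may consume */

/*
 * Bottom half: claim the channel with a compare-and-exchange so only
 * one path processes the message, then clear the status once the data
 * has been handed to the client.
 */
static int claim_and_consume(atomic_int *rx_status)
{
	int expected = MBOX_FULL | MBOX_CLOGGED;

	if (!atomic_compare_exchange_strong(rx_status, &expected, MBOX_FULL))
		return 0;                 /* not ours: IRQ_NONE */

	/* ... read the rx data, clear the hw irq, call the client ... */
	atomic_store(rx_status, 0);
	return 1;                         /* IRQ_HANDLED */
}
```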
What mainly worries me is that if we go back to one IRQ for all
channels, we have to do our own "how many channels are enabled?"
accounting to know when to disable the IRQ again, because the
enable_irq()/disable_irq() accounting works the opposite way:
you can disable as many times as you want, but your enables
can't exceed disables + 1.
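To make that asymmetry concrete, here's a toy userspace model of the
nesting behaviour (a sketch of the depth-counting semantics, not the
actual genirq implementation):

```c
#include <assert.h>

/* depth == 0 means the line is unmasked */
struct irq_depth {
	unsigned int depth;
};

static void toy_disable_irq(struct irq_depth *d)
{
	d->depth++;                       /* disables nest freely */
}

/*
 * Returns 1 when the line actually unmasks, 0 while still nested,
 * -1 on an unbalanced enable (where the real kernel would WARN).
 */
static int toy_enable_irq(struct irq_depth *d)
{
	if (d->depth == 0)
		return -1;
	return --d->depth == 0 ? 1 : 0;
}
```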
Kind regards,
Nicolas Frattaroli