lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABb+yY0tT-R_KyhbWL-u8vDO9xxpSUmOBKwbZcBT=Z5rV4RxLw@mail.gmail.com>
Date: Mon, 5 Jan 2026 19:56:44 -0600
From: Jassi Brar <jassisinghbrar@...il.com>
To: Doug Anderson <dianders@...omium.org>
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mailbox: Document the behavior of mbox_send_message() w/
 NULL mssg

On Mon, Jan 5, 2026 at 1:00 PM Doug Anderson <dianders@...omium.org> wrote:
>
> Hi,
>
> On Sat, Dec 20, 2025 at 3:26 PM Jassi Brar <jassisinghbrar@...il.com> wrote:
> >
> > On Thu, Dec 18, 2025 at 3:40 PM Douglas Anderson <dianders@...omium.org> wrote:
> > >
> > > The way the mailbox core behaves when you pass a NULL `mssg` parameter
> > > to mbox_send_message() seems a little questionable. Specifically, the
> > > mailbox core stores the currently active message directly in its
> > > `active_req` field. In at least two places it decides that if this
> > > field is `NULL` then there is no active request. That means if `mssg`
> > > is NULL it will always think there is no active request. The two
> > > places where it does this are:
> > >
> > > 1. If a client calls mbox_send_message(), if `active_req` is NULL then
> > >    it will call the mailbox controller to send the new message even if
> > >    the mailbox controller hasn't yet called mbox_chan_txdone() on the
> > >    previous (NULL) message.
> > > 2. The mailbox core will never call the client's `tx_done()` callback
> > >    with a NULL message because `tx_tick()` returns early whenever the
> > >    message is NULL.
> > >
> > > The above could be seen as bugs and perhaps could be fixed. However,
> > > doing a `git grep mbox_send_message.*NULL` shows 14 hits in mainline
> > > today and people may be relying on the current behavior. It is,
> > > perhaps, better to accept the current behavior.
> > >
> > > The current behavior can actually serve the purpose of providing a
> > > simple way to assert an edge-triggered interrupt to the remote
> > > processor on the other side of the mailbox. Specifically:
> > >
> > > 1. Like a normal edge-triggered interrupt, if multiple edges arrive
> > >    before the interrupt is Acked they are coalesced.
> > > 2. Like a normal edge-triggered interrupt, as long as the receiver
> > >    (the remote processor in this case) "Ack"s the interrupt _before_
> > >    checking for work and the sender (the mailbox client in this case)
> > >    posts the interrupt _after_ adding new work then we can always be
> > >    certain that new work will be noticed. This assumes that the
> > >    mailbox clienut and remote processor have some out-of-band way to
> > >    communicate work and the mailbox is just being used as an
> > >    interrupt.
> > >
> > > Document the current behavior so that people can rely on it and know
> > > that it will keep working the same way.
> > >
> > > NOTE: if a given mailbox client mixes and matches some NULL and some
> > > non-NULL messages, things could get loopy without additional code
> > > changes and rules. Without code changes, if we transfer a non-NULL
> > > message then we'd stop coalescing future NULL messages until the queue
> > > clears. Also: if we were transferring a NULL message and a non-NULL
> > > came in, we'd send it right away but potentially report `tx_done()`
> > > too early. For now, document mixing and matching NULL and non-NULL
> > > messages as undefined.
> > >
> > So we are talking about  'doorbell' type controllers that don't transfer any
> > data but only raise an irq to the other end.
>
> Essentially. I'm still a bit new to the mailbox world, but at least in
> the mailbox controller I'm more familiar with (and planning to
> upstream soon) it's not actually a different type of controller,
> though. The controller has shared memory to store a message and then
> has a doorbell feature. If the remote processor doesn't need any data
> but just needs an interrupt then we can have the mailbox controller
> just ring the doorbell without any associated message.
>
>
> > Ideally the client would pass the
> > doorbell info via mssg in runtime, but there may be a driver that does not need
> > that info from the client.
>
> Right. Since 'mssg' is defined as arbitrary data it does seem
> legitimate for clients to pass NULL for it. It's just unfortunate that
> the core behaves differently when `mssg` is NULL...
>
>
> > I believe some doorbell clients do call
> >  mbox_send_message(chan, 0) , the arbitrary mssg 0 is unused by the driver
> > and everything works fine.
>
> Right (well, "NULL" instead of "0"). It's actually quite common. Using
> `git grep -A2 mbox_send_message.*NULL`, I see:
>
> * drivers/acpi/acpi_pcc.c
> * drivers/firmware/arm_scmi/transports/mailbox.c
> * drivers/firmware/imx/imx-dsp.c
> * drivers/firmware/tegra/bpmp-tegra186.c
> * drivers/irqchip/irq-qcom-mpm.c
> * drivers/remoteproc/xlnx_r5_remoteproc.c
> * drivers/rpmsg/qcom_glink_rpm.c
> * drivers/rpmsg/qcom_glink_smem.c
> * drivers/rpmsg/qcom_smd.c
> * drivers/soc/qcom/qcom_aoss.c
> * drivers/soc/qcom/smp2p.c
> * drivers/soc/qcom/smsm.c
> * drivers/soc/ti/wkup_m3_ipc.c
> * drivers/soc/xilinx/zynqmp_power.c
>
> ...or did you mean something different when you said drivers are
> passing "0" as an argument?
>
Actually I meant to say some 'constant' value, but yes 0 and NULL
don't make a difference.
The core needs to track the in-flight request in the 'active_req'
pointer which is also checked
to be non-null as a marker of busyness. That _can be_ a problem when
the client sends NULL
as the message, depending upon the submission pattern and speed.
Clients that call with NULL are likely simple one-message-at-a-time
users, but I haven't checked.
If it is a doorbell controller, the mssg may still carry the pointer
to data in shmem that the driver
can copy from tx_prepare() callback.


> >  I agree that is not ideal - we should differentiate between nothing and
> > 'null' data. Is that the issue you want addressed?
>
> I guess I'm just trying to clarify what the behavior _should_ be when
> `mssg` is NULL. At first I saw the current behavior as a bug. ...but
> then I saw the number of people that were using NULL `mssg` and I
> decided that fixing the current behavior might cause problems.
>
> Maybe it's good just to clarify the difference in behavior so we're
> all on the same page.
>
> Let's imagine that a mailbox client quickly queues up 3 messages:
>   static const u8 bogus_data[] = { 0 };
>   mbox_send_message(chan, (void*) bogus_data);
>   mbox_send_message(chan, (void*) bogus_data);
>   mbox_send_message(chan, (void*) bogus_data);
>
> What will that do? Assuming that the Apps Processor (AP) is much
> faster than the remote processor, it will start sending the first
> message right away and the other two will be queued. When the first
> message is sent, the second message will be sent. When the second
> message is sent, the third message will be sent. The remote processor
> is guaranteed to see 3 messages.
>
> Writing out a big diagram, I believe this will happen:
>
> AP                                  remote
> ----------------------------------- -------------------
> mbox_send_message(&bogus_data)
>   add_to_rbuf()
>   msg_submit()
>     ops->send_data()
>       ...
>       drv_trigger_doorbell()
> mbox_send_message(&bogus_data)
>   add_to_rbuf()
>   msg_submit()
>     NOP because active_req
> mbox_send_message(&bogus_data)
>   add_to_rbuf()
>   msg_submit()
>     NOP because active_req
>                                     handle_doorbell()
>                                       ack_doorbell()
> drv_handle_doorbell_ack()
>   mbox_chan_txdone()
>     tx_tick()
>       msg_submit()
>         ops->send_data()
>           ...
>           drv_trigger_doorbell()
>                                     handle_doorbell()
>                                       ack_doorbell()
> drv_handle_doorbell_ack()
>   mbox_chan_txdone()
>     tx_tick()
>       msg_submit()
>         ops->send_data()
>           ...
>           drv_trigger_doorbell()
>                                     handle_doorbell()
>                                       ack_doorbell()
> drv_handle_doorbell_ack()
>   mbox_chan_txdone()
>     tx_tick()
>
>
> Now, let's do the same but with NULL:
>   mbox_send_message(chan, NULL);
>   mbox_send_message(chan, NULL);
>   mbox_send_message(chan, NULL);
>
> Again, assuming that the AP is much faster and that the doorbell is a
> normal edge-triggered interrupt, today this will happen:
>
> AP                                  remote
> ----------------------------------- -------------------
> mbox_send_message(NULL)
>   add_to_rbuf()
>   msg_submit()
>     ops->send_data()
>       ...
>       drv_trigger_doorbell()
> mbox_send_message(NULL)
>   add_to_rbuf()
>   msg_submit()
>     ops->send_data()
>       ...
>       drv_trigger_doorbell()
> mbox_send_message(NULL)
>   add_to_rbuf()
>   msg_submit()
>     ops->send_data()
>       ...
>       drv_trigger_doorbell()
>                                     handle_doorbell()
>                                       ack_doorbell()
> drv_handle_doorbell_ack()
>   mbox_chan_txdone()
>     tx_tick()
>
>
> If the remote processor is much faster than the AP or the AP gets
> interrupted between the `mbox_send_message(NULL)` calls then obviously
> things could look a bit different, but hopefully the above makes it
> clear enough?
>
> As I said, at first I viewed the difference between the "NULL" and
> "bogus_data" cases as a bug and thought maybe I should fix it, but
> there truly could be cases where someone might want the existing
> behavior.
>
> Presumably anyone who is passing NULL for `mssg` has some other OOB
> (out of band) way to pass data to the remote processor and the
> doorbell is just telling the remote proc to look at the new OOB data.
> With that, let's look at a (made up) example where the AP is trying to
> let the remote processor know whenever the "event_counter" changed.
> I'd imagine something like this:
>
> AP                                  remote
> ----------------------------------- -------------------
> shared_oob_event_counter++;
> mbox_send_message(NULL)
> shared_oob_event_counter++;
> mbox_send_message(NULL)
> shared_oob_event_counter++;
> mbox_send_message(NULL)
>                                     handle_doorbell()
>                                       read_and_handle_event_counter()
>                                       ack_doorbell()
>
> In other words the remote processor sees the doorbell, handles all the
> changes to the event counter, and then acks. If we instead fixed the
> NULL case to actually queue up additional messages then the remote
> processor would get 3 doorbells and the "event_counter" wouldn't
> change at all for the last two...
>
> I don't know for sure, but I'm imagining that at least some of the
> existing 14 drivers that are passing NULL for `mssg` may be relying on
> the existing behavior... From some quick spot checking:
>
> smp2p_update_bits() - Updates (out of band) shared memory and uses
> mbox_send_message(NULL) to kick the remote processor. That sounds very
> similar to my made up "event_counter" example.
>
> smsm_update_bits() seems similar
>
> Many of the other examples I looked at looked like they weren't
> relying on the existing behavior, but I didn't look at every one and
> it's harder to trace all the usage of some of the drivers...
>
>
> Anyway, I'm not 100% set on any one solution, I just see that the NULL
> case is (unexpectedly) different and either want it documented or to
> fix it so it's not different (ideally without angry people with
> pitchforks coming after me because I changed behavior). :-)
>
Yes, I too think we should leave the existing api alone to be safe.
For new pure-doorbell clients, how about something like
   #define MBOX_NODATA   ERR_PTR(-ENOMEM)
   mailbox_ring_doorbell(chan)
   {
       mailbox_send_message(chan, MBOX_NODATA)
   }
because the underlying controller driver anyway ignores the submitted
'mssg' pointer.

And of course add the clarification comment in this patch.

Thanks

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ