Message-ID: <fac9a5fb-7a39-4c12-9dca-d2338b6dad8c@linaro.org>
Date: Fri, 4 Jul 2025 13:30:50 +0100
From: Tudor Ambarus <tudor.ambarus@...aro.org>
To: Jassi Brar <jassisinghbrar@...il.com>
Cc: peter.griffin@...aro.org, andre.draszik@...aro.org,
willmcvicker@...gle.com, cristian.marussi@....com, sudeep.holla@....com,
kernel-team@...roid.com, arm-scmi@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mailbox: stop the release and reacquire of the chan lock
Hi, Jassi,
Sorry for the delay, I was out for a while.
On 6/23/25 12:41 AM, Jassi Brar wrote:
> On Fri, Jun 6, 2025 at 8:41 AM Tudor Ambarus <tudor.ambarus@...aro.org> wrote:
>>
>> There are two cases where the chan lock is released and reacquired
>> where it shouldn't really be:
>>
>> 1/ released at the end of add_to_rbuf() and reacquired at the beginning
>> of msg_submit(). After the lock is released at the end of add_to_rbuf(),
>> if the mailbox core is under heavy load, the mailbox software queue may
>> fill up without any of the threads getting the chance to drain the
>> software queue.
>> T#0 acquires chan lock, fills rbuf, releases the lock, then
>> T#1 acquires chan lock, fills rbuf, releases the lock, then
>> ...
>> T#MBOX_TX_QUEUE_LEN returns -ENOBUFS;
>> We shall drain the software queue as fast as we can, while still holding
>> the channel lock.
>>
> I don't see any issue to fix to begin with.
> T#0 does drain the queue by moving on to submit the message after
> adding it to the rbuf.
The problem is that the code releases the chan->lock after adding the
message to rbuf and then reacquires it on submit. A thread can be
preempted after add_to_rbuf(), without getting the chance to get to
msg_submit().
Let's assume that
T#0 adds to rbuf and gets preempted by T#1
T#1 adds to rbuf and gets preempted by T#2
...
T#n-1 adds to rbuf and gets preempted by T#n
The mailbox software queue fills up without any thread ever reaching
msg_submit(), and the next caller gets -ENOBUFS.
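
The interleaving above can be sketched as a deterministic, single-threaded
model. This is only an illustration, not the mailbox core code: the names
model_add_to_rbuf(), model_msg_submit(), model_tx_done(), run_split() and
run_held() are hypothetical, QUEUE_LEN is an assumed stand-in for
MBOX_TX_QUEUE_LEN, and "preemption" is modelled simply by never running the
submit step.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define QUEUE_LEN 20 /* assumed stand-in for MBOX_TX_QUEUE_LEN */

/* Hypothetical, simplified view of a channel's software queue state. */
struct chan_model {
	int msg_count;   /* messages waiting in the software queue */
	bool active_req; /* a message is in flight on the controller */
};

/* Models add_to_rbuf(): queue one message, or fail if the queue is full. */
static int model_add_to_rbuf(struct chan_model *c)
{
	assert(c->msg_count >= 0 && c->msg_count <= QUEUE_LEN);
	if (c->msg_count == QUEUE_LEN)
		return -ENOBUFS;
	c->msg_count++;
	return 0;
}

/* Models msg_submit(): hand one queued message to an idle controller. */
static void model_msg_submit(struct chan_model *c)
{
	if (c->msg_count == 0 || c->active_req)
		return;
	c->msg_count--;
	c->active_req = true;
}

/* Models tx completion: the controller becomes idle again. */
static void model_tx_done(struct chan_model *c)
{
	c->active_req = false;
}

/*
 * Split locking: every sender is preempted in the window between the add
 * and the submit, so the submit step never runs.
 * Returns how many senders saw -ENOBUFS.
 */
static int run_split(int nsenders)
{
	struct chan_model c = { 0, false };
	int failed = 0;

	for (int i = 0; i < nsenders; i++)
		if (model_add_to_rbuf(&c) == -ENOBUFS)
			failed++;
	return failed;
}

/*
 * Lock held across add + submit: no sender can be interleaved between the
 * two steps. Assumption: the controller completes each tx before the next
 * sender arrives.
 */
static int run_held(int nsenders)
{
	struct chan_model c = { 0, false };
	int failed = 0;

	for (int i = 0; i < nsenders; i++) {
		if (model_add_to_rbuf(&c) == -ENOBUFS) {
			failed++;
			continue;
		}
		model_msg_submit(&c);
		model_tx_done(&c);
	}
	return failed;
}
```

In this model, run_split(QUEUE_LEN + 1) reports one failed sender, while
run_held(QUEUE_LEN + 1) reports none. The second result depends on the
prompt-completion assumption: if the controller stays busy, the queue can
still fill even with the lock held, since senders only queue behind
chan->active_req.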
Thanks,
ta
> And until the tx is done, T#1 would still be only adding to the rbuf
> because of chan->active_req.
>
>> 2/ tx_tick() releases the lock after setting chan->active_req = NULL.
>> This gives again the possibility for the software queue to fill up, as
>> described in case 1/.
>>
> This again is not an issue. The user(s) should account for the fact
> that the message bus may be busy and there can be only limited buffers
> in the queue.
>
> Thanks