Message-ID: <4c347cb4-bb82-4209-82fc-59088cc19bc2@arm.com>
Date: Mon, 1 Dec 2025 19:57:31 +0000
From: Robin Murphy <robin.murphy@....com>
To: Jacob Pan <jacob.pan@...ux.microsoft.com>, Will Deacon <will@...nel.org>
Cc: linux-kernel@...r.kernel.org,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
Joerg Roedel <joro@...tes.org>, Mostafa Saleh <smostafa@...gle.com>,
Jason Gunthorpe <jgg@...dia.com>, Nicolin Chen <nicolinc@...dia.com>,
Zhang Yu <zhangyu1@...ux.microsoft.com>,
Jean Philippe-Brucker <jean-philippe@...aro.org>,
Alexander Grest <Alexander.Grest@...rosoft.com>
Subject: Re: [PATCH v4 1/2] iommu/arm-smmu-v3: Fix CMDQ timeout warning
On 2025-11-30 11:06 pm, Jacob Pan wrote:
> Hi Will,
>
> On Tue, 25 Nov 2025 17:19:16 +0000
> Will Deacon <will@...nel.org> wrote:
>
>> On Fri, Nov 14, 2025 at 09:17:17AM -0800, Jacob Pan wrote:
>>> While polling for space for n commands in the cmdq, the current code
>>> instead only checks whether the queue is full. If the queue is nearly
>>> full but has fewer than n entries free, the CMDQ timeout warning is
>>> never triggered even when the polling has exceeded the timeout limit.
>>>
>>> The existing arm_smmu_cmdq_poll_until_not_full() doesn't fit the only
>>> caller arm_smmu_cmdq_issue_cmdlist() efficiently or ideally:
>>> - It starts a new timer on every single call, which fails to limit the
>>> total wait to the preset ARM_SMMU_POLL_TIMEOUT_US per issue.
>>> - It has a redundant internal queue_full() check, which doesn't detect
>>> whether there is enough space for n commands.
>>>
>>> This patch polls for the exact amount of space required instead of
>>> just not-full, and emits the timeout warning accordingly.
>>>
>>> Fixes: 587e6c10a7ce ("iommu/arm-smmu-v3: Reduce contention during command-queue insertion")
>>> Co-developed-by: Yu Zhang <zhangyu1@...ux.microsoft.com>
>>> Signed-off-by: Yu Zhang <zhangyu1@...ux.microsoft.com>
>>> Signed-off-by: Jacob Pan <jacob.pan@...ux.microsoft.com>
>>
>> I'm assuming you're seeing problems with an emulated command queue?
>> Any chance you could make that bigger?
>>
> This is not related to queue size, but rather a logic issue that occurs
> any time the queue is nearly full.
It is absolutely related to queue size, because there are only two
reasons why this warning should ever be seen:
1: The SMMU is in some unexpected error state and has stopped consuming
commands altogether.
2: The queue is far too small to do its job of buffering commands for
the number of present CPUs.
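(For context, the driver just uses whatever size the implementation
advertises, capped at the architectural maximum - from memory the probe
path does roughly the following, though the exact code differs between
kernel versions:

	/* Rough sketch from memory of arm_smmu_device_hw_probe():
	 * IDR1.CMDQS advertises log2 of the number of CMDQ entries,
	 * which the driver caps at CMDQ_MAX_SZ_SHIFT.
	 */
	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR1);
	smmu->cmdq.q.llq.max_n_shift = min_t(u32, CMDQ_MAX_SZ_SHIFT,
					     FIELD_GET(IDR1_CMDQS, reg));

so if your emulated SMMU is reporting a small CMDQS, that's the knob
Will is suggesting you turn up.)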
>>> @@ -804,12 +794,13 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>>>  	local_irq_save(flags);
>>>  	llq.val = READ_ONCE(cmdq->q.llq.val);
>>>  	do {
>>> +		struct arm_smmu_queue_poll qp;
>>>  		u64 old;
>>>
>>> +		queue_poll_init(smmu, &qp);
>>>  		while (!queue_has_space(&llq, n + sync)) {
>>>  			local_irq_restore(flags);
>>> -			if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
>>> -				dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
>>> +			arm_smmu_cmdq_poll(smmu, cmdq, &llq, &qp);
>>>
>>
>> Isn't this broken for wfe-based polling? The SMMU only generates the
>> wake-up event when the queue becomes non-full.
> I don't see this as a problem, since any interrupt such as the scheduler
> tick can be a break event for WFE, no?
No, we cannot assume that any other WFE wakeup event is *guaranteed*.
It's certainly possible to get stuck in WFE on a NOHZ_FULL kernel with
the arch timer event stream disabled - I recall proving that back when I
hoped I might not have to bother upstreaming a workaround for MMU-600
erratum #1076982.
Yes, a large part of the reason we enable the event stream by default is
to help mitigate errata and software bugs which lead to missed events,
but that still doesn't mean we should consciously abuse it. I guess
something like the diff below (completely untested) would be a bit
closer to correct (basically, allow WFE when the queue is completely
full, but suppress it if we're then still waiting for more space in a
non-full queue).
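To spell out why: the underlying poll helper is basically the below
(paraphrased from memory, so take it as a sketch rather than the exact
code) - once qp->wfe is set, each iteration just sits in wfe() until
some wakeup event arrives, and if that event never comes we don't even
get back to the timeout check:

	static int queue_poll(struct arm_smmu_queue_poll *qp)
	{
		if (ktime_compare(ktime_get(), qp->timeout) > 0)
			return -ETIMEDOUT;

		if (qp->wfe) {
			/* Nothing guarantees another event from the SMMU
			 * unless we were genuinely waiting on not-full.
			 */
			wfe();
		} else if (qp->spin_cnt < ARM_SMMU_POLL_SPIN_COUNT) {
			cpu_relax();
			qp->spin_cnt++;
		} else {
			udelay(qp->delay);
			qp->delay *= 2;
			qp->spin_cnt = 0;
		}

		return 0;
	}

Hence clearing qp.wfe once the queue is no longer completely full, so we
fall back to the spin/udelay path for the remainder of the wait.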
Thanks,
Robin.
----->8-----
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a0c6d87f85a1..206c6c6860dd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -123,7 +123,7 @@ static void parse_driver_options(struct arm_smmu_device *smmu)
 }
 
 /* Low-level queue manipulation functions */
-static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n)
+static int queue_space(struct arm_smmu_ll_queue *q)
 {
 	u32 space, prod, cons;
 
@@ -135,7 +135,7 @@ static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n)
 	else
 		space = cons - prod;
 
-	return space >= n;
+	return space;
 }
 
 static bool queue_empty(struct arm_smmu_ll_queue *q)
@@ -796,9 +796,11 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 	do {
 		struct arm_smmu_queue_poll qp;
 		u64 old;
+		int space;
 
 		queue_poll_init(smmu, &qp);
-		while (!queue_has_space(&llq, n + sync)) {
+		while (space = queue_space(&llq), space < n + sync) {
+			qp.wfe &= !space;
 			local_irq_restore(flags);
 			arm_smmu_cmdq_poll(smmu, cmdq, &llq, &qp);
 			local_irq_save(flags);