linux-kernel - Re: [PATCH] iommu/arm-smmu-v3: Fix event queue overflow acknowledgment


Message-ID: <20230308140204.83249-1-krckatom@amazon.de>
Date:   Wed, 8 Mar 2023 14:02:04 +0000
From:   Tomas Krcka <krckatom@...zon.de>
To:     <robin.murphy@....com>
CC:     <baolu.lu@...ux.intel.com>, <iommu@...ts.linux.dev>,
        <joro@...tes.org>, <krckatom@...zon.de>,
        <linux-arm-kernel@...ts.infradead.org>,
        <linux-kernel@...r.kernel.org>,
        <shameerali.kolothum.thodi@...wei.com>, <will@...nel.org>
Subject: Re: [PATCH] iommu/arm-smmu-v3: Fix event queue overflow acknowledgment

>> When an overflow occurs in the event queue, the SMMU toggles overflow
>> flag OVFLG in the PROD register.
>> The evtq thread is supposed to acknowledge the overflow flag by toggling
>> flag OVACKFLG in the CONS register, otherwise the overflow condition is
>> still active (OVFLG != OVACKFLG).
>>
>> Currently the acknowledge register is toggled after clearing the event
>> queue but is never propagated to the hardware. It would be done next
>> time when executing evtq thread.
>>
>> The SMMU still adds elements to the queue when the overflow condition is
>> active but any subsequent overflow information after clearing the event
>> queue will be lost.
>>
>> This change keeps the SMMU in sync as it's expected by design.
>
> If I've understood correctly, the upshot of this is that if the queue
> has overflowed once, become empty, then somehow goes from empty to full
> before we manage to consume a single event, we won't print the "events
> lost" message a second time.
>
> Have you seen this happen in practice? TBH if the event queue ever
> overflows even once it's indicative that the system is hosed anyway, so
> it's not clear to me that there's any great loss of value in sometimes
> failing to repeat a warning for a chronic ongoing operational failure.
>

Yes, I have seen this in practice. And it's not just about losing the subsequent warning.
As it is done now, the CONS register value stays inconsistent between the SMMU and the kernel
until a new event arrives. The kernel doesn't tell the SMMU that it knows about the overflow
and is consuming events as fast as it can.
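
To illustrate the inconsistency, here is a minimal user-space sketch of the
OVFLG/OVACKFLG handshake. It is a model only, not the driver code: the struct,
the field names and both helper functions are hypothetical, and the flag is
assumed to sit in bit 31 of both registers. It shows that toggling the
acknowledge flag only in the kernel's copy of CONS leaves the overflow
condition active from the SMMU's point of view until that copy is written out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the overflow flag occupies bit 31 of both PROD and CONS. */
#define OVF_FLAG (UINT32_C(1) << 31)

/* Hypothetical model of the event queue state (not the kernel's structs). */
struct evtq_model {
	uint32_t hw_prod; /* PROD register, written by the SMMU (holds OVFLG) */
	uint32_t hw_cons; /* CONS register as the SMMU sees it (holds OVACKFLG) */
	uint32_t sw_cons; /* kernel's in-memory copy of CONS */
};

/* The overflow condition is active while OVFLG != OVACKFLG. */
static bool overflow_active(const struct evtq_model *q)
{
	return (q->hw_prod & OVF_FLAG) != (q->hw_cons & OVF_FLAG);
}

/* Copy OVFLG into the kernel's copy of CONS only; nothing reaches the SMMU. */
static void ack_shadow_only(struct evtq_model *q)
{
	q->sw_cons = (q->sw_cons & ~OVF_FLAG) | (q->hw_prod & OVF_FLAG);
}

/* Same update, followed by a write of the copy to the modelled register. */
static void ack_and_sync(struct evtq_model *q)
{
	ack_shadow_only(q);
	q->hw_cons = q->sw_cons;
}
```

After the SMMU toggles OVFLG, overflow_active() stays true following
ack_shadow_only() and only becomes false following ack_and_sync(), which is
the gap the patch is meant to close.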

> It could be argued that we have a subtle inconsistency between
> arm_smmu_evtq_thread() and arm_smmu_priq_thread() here, but the fact is
> that the Event queue and PRI queue *do* have different overflow
> behaviours, so it could equally be argued that inconsistency in the code
> helps reflect that. FWIW I can't say I have a strong preference either way.

As for the argument that the code can reflect the difference: in that case
the comment 'Sync our overflow flag, as we believe we're up to speed' is
already misleading.

Thanks.
BR,
Tomas