Message-ID: <db26b1e3-bd7a-44ee-b458-6cb0fedf6662@linux.intel.com>
Date: Thu, 15 Jan 2026 11:26:50 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: "Tian, Kevin" <kevin.tian@...el.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
Jason Gunthorpe <jgg@...dia.com>
Cc: Dmytro Maluka <dmaluka@...omium.org>,
Samiullah Khawaja <skhawaja@...gle.com>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/3] iommu/vt-d: Use 128-bit atomic updates for context
entries
On 1/14/26 15:54, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@...ux.intel.com>
>> Sent: Tuesday, January 13, 2026 11:01 AM
>>
>> On Intel IOMMU, device context entries are accessed by hardware in
>> 128-bit chunks. Currently, the driver updates these entries by
>> programming the 'lo' and 'hi' 64-bit fields individually.
>>
>> This creates a potential race condition where the IOMMU hardware may
>> fetch
>> a context entry while the CPU has only completed one of the two 64-bit
>> writes. This "torn" entry — consisting of half-old and half-new data —
>> could lead to unpredictable hardware behavior, especially when
>> transitioning the 'Present' bit or changing translation types.
>
> this is not accurate. A context entry is 128 bits only. A scalable-mode
> context entry is 256 bits, but only the lower 128 bits are defined, so
> hardware always fetches the context entry atomically. If software then
> ensures the right order of updates (clear Present first, then the other
> bits), the hardware won't look at a partial entry after seeing Present=0.
>
> But, as Dmytro reported, there is currently no barrier in place, so the
> two 64-bit updates to the context entry might be reordered and hardware
> could fetch an entry with the old lower half (Present=1) and the new
> higher half.
>
> A 128-bit atomic operation avoids this ordering concern.
You're right. I will update the commit message to be more precise. Since
the hardware fetches the 128-bit context entry atomically, the issue is
essentially a software ordering problem.
We considered three approaches to solve this:
- Memory barriers (to enforce that the Present bit is cleared before the
  other fields are updated)
- WRITE_ONCE() (to prevent compiler reordering of the two 64-bit stores)
- 128-bit atomic updates
This patch uses the atomic update approach.
>
>> @@ -1170,19 +1170,19 @@ static int domain_context_mapping_one(struct
>> dmar_domain *domain,
>> goto out_unlock;
>>
>> copied_context_tear_down(iommu, context, bus, devfn);
>> - context_clear_entry(context);
>> - context_set_domain_id(context, did);
>> + context_set_domain_id(&new, did);
>
> I wonder whether it's necessary to use an atomic in the attach path,
> from a fix point of view.
>
> The assumption is that the context should have been cleared already
> before calling this function (and following ones). Does it make more
> sense to check the present bit, warning if set, then fail the operation?
> We could refactor them to do atomic updates, but then it's a cleanup
> instead of being part of a fix.
Yes. For the attach path, this is a cleanup rather than a fix.
>
> Then this may be split into three patches:
>
> - change context_clear_entry() to be atomic, to fix the teardown path
> - add a Present bit check in the other functions in this patch, to
>   scrutinize the attach path
> - change those functions to be atomic, as a cleanup
Perhaps this also paves the way for enabling hitless replace in the
attach_dev path?
> Does it make sense?
Yes, it does.
Thanks,
baolu