Message-ID: <f230df34-2959-47f9-9131-ac233d85027e@gmail.com>
Date: Sat, 21 Jun 2025 00:24:37 +0300
From: Nadav Amit <nadav.amit@...il.com>
To: Rik van Riel <riel@...riel.com>, linux-kernel@...r.kernel.org
Cc: kernel-team@...a.com, dave.hansen@...ux.intel.com, luto@...nel.org,
peterz@...radead.org, bp@...en8.de, x86@...nel.org, seanjc@...gle.com,
tglx@...utronix.de, mingo@...nel.org, Yu-cheng Yu <yu-cheng.yu@...el.com>
Subject: Re: [RFC PATCH v4 5/8] x86/mm: Introduce Remote Action Request
On 20/06/2025 4:10, Rik van Riel wrote:
> On Fri, 2025-06-20 at 02:01 +0300, Nadav Amit wrote:
>>> +
>>> +static void set_payload(struct rar_payload *p, u16 pcid, unsigned long start,
>>> +			 long pages)
>>> +{
>>> + p->must_be_zero_1 = 0;
>>> + p->must_be_zero_2 = 0;
>>> + p->must_be_zero_3 = 0;
>>> + p->page_size = RAR_INVLPG_PAGE_SIZE_4K;
>>
>> I think you can propagate the stride to this point instead of using a
>> fixed 4KB stride.
>
> Agreed. That's another future optimization, for
> once this code all works right.
It might not be just an optimization if it leads to a performance
regression relative to the current software-based flush, which does take
the stride into account. IOW: flushing a 2MB huge page would end up doing
a full TLB flush with RAR, in contrast to selectively flushing the
specific entry with the software-based shootdown. Correct me if I'm wrong.
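Something along these lines is roughly what I had in mind (completely
untested sketch; RAR_INVLPG_PAGE_SIZE_2M is an assumed encoding name, and
I am assuming the callers can pass down the stride_shift that the existing
flush code already carries):

	static void set_payload(struct rar_payload *p, u16 pcid,
				unsigned long start, long pages,
				unsigned int stride_shift)
	{
		...
		/* Pick the RAR page-size encoding from the flush stride. */
		if (stride_shift == PMD_SHIFT)
			p->page_size = RAR_INVLPG_PAGE_SIZE_2M; /* assumed */
		else
			p->page_size = RAR_INVLPG_PAGE_SIZE_4K;
		...
	}

But as you said, that can wait until the basic flow works correctly.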
>
> Currently I am in a situation where the receiving
> CPU clears the action vector from RAR_PENDING to
> RAR_SUCCESS, but the TLB does not appear to always
> be correctly flushed.
Interesting. I really do not know anything about it, but you may want to
look at the performance counters to see whether the flush is at least
reported to take place.
>
>>
>>> + p->type = RAR_TYPE_INVPCID;
>>> + p->pcid = pcid;
>>> + p->linear_address = start;
>>> +
>>> + if (pcid) {
>>> + /* RAR invalidation of the mapping of a specific process. */
>>> + if (pages < RAR_INVLPG_MAX_PAGES) {
>>> + p->num_pages = pages;
>>> + p->subtype = RAR_INVPCID_ADDR;
>>> + } else {
>>> + p->subtype = RAR_INVPCID_PCID;
>>
>> I wonder whether it would be safer to set something to p->num_pages.
>> (then we can do it unconditionally)
>
> We have a limited number of bits available for
> p->num_pages. I'm not sure we want to try
> writing a larger number than what fits in those
> bits.
I was just looking for a way to simplify the code and at the same time
avoid an "undefined" state. History has shown that all kinds of unintended
consequences happen when some state is left uninitialized (even if the
spec does not require it to be initialized).
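Concretely, I meant something like the following (untested; clamping with
min_t() is just one way to keep the value inside the num_pages field
width):

	/* Always give num_pages a defined value, even when it is ignored. */
	p->num_pages = min_t(long, pages, RAR_INVLPG_MAX_PAGES - 1);

	if (pcid) {
		p->subtype = pages < RAR_INVLPG_MAX_PAGES ? RAR_INVPCID_ADDR
							  : RAR_INVPCID_PCID;
	} else {
		/* RAR_INVPCID_ADDR excludes global translations. */
		p->subtype = RAR_INVPCID_ALL;
	}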
>
>>
>>> + }
>>> + } else {
>>> + /*
>>> + * Unfortunately RAR_INVPCID_ADDR excludes global translations.
>>> + * Always do a full flush for kernel invalidations.
>>> + */
>>> + p->subtype = RAR_INVPCID_ALL;
>>> + }
>>> +
>>> + /* Ensure all writes are visible before the action entry is set. */
>>> + smp_wmb();
>>
>> Maybe you can drop the smp_wmb() here and instead change the
>> WRITE_ONCE() in set_action_entry() to smp_store_release() ? It should
>> have the same effect and I think would be cleaner and convey your
>> intent
>> better.
>>
> We need protection against two different things here.
>
> 1) Receiving CPUs must see all the writes done by
> the originating CPU before we send the RAR IPI.
>
> 2) Receiving CPUs must see all the writes done by
> set_payload() before the write done by
> set_action_entry(), in case another CPU sends
> the RAR IPI before we do.
>
> That other RAR IPI could even be sent between
> when we write the payload, and when we write
> the action entry. The receiving CPU could take
> long enough processing other RAR payloads that
> it can see our action entry after we write it.
>
> Does removing the smp_wmb() still leave everything
> safe in that scenario?
Admittedly, I do not understand your concern. IIUC, until
set_action_entry() is called and RAR_PENDING is set, IPIs are irrelevant
and the RAR entry will not be processed. Once it is set, you want all of
the prior writes to be visible. Semantically, that is exactly what
smp_store_release() means.

Technically, smp_wmb() + WRITE_ONCE() is equivalent to smp_store_release()
on x86: smp_wmb() is actually just a compiler barrier, since x86 follows
the TSO memory model, and smp_store_release() is a compiler barrier +
WRITE_ONCE(). So I think it is all safe and at the same time clearer.
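To make it concrete, what I am suggesting is roughly the following
(untested; the signature of set_action_entry() and the action-vector
indexing are my guesses, not the actual code):

	static void set_action_entry(u8 *action, unsigned int idx)
	{
		/*
		 * Publish the entry: all of the writes done by
		 * set_payload() are ordered before the store that marks
		 * the entry pending, so the separate smp_wmb() in
		 * set_payload() can go away.
		 */
		smp_store_release(&action[idx], RAR_PENDING);
	}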