linux-kernel - Re: [PATCH v4 00/25] RSEQ node id and virtual cpu id extensions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <55ade976-efac-3a89-f5e4-9008b7030388@efficios.com>
Date:   Mon, 17 Oct 2022 12:05:34 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Florian Weimer <fweimer@...hat.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>,
        "H . Peter Anvin" <hpa@...or.com>, Paul Turner <pjt@...gle.com>,
        linux-api@...r.kernel.org, Christian Brauner <brauner@...nel.org>,
        David.Laight@...LAB.COM, carlos@...hat.com,
        Peter Oskolkov <posk@...k.io>,
        Alexander Mikhalitsyn <alexander@...alicyn.com>
Subject: Re: [PATCH v4 00/25] RSEQ node id and virtual cpu id extensions

On 2022-10-10 09:04, Florian Weimer wrote:
> * Mathieu Desnoyers:
> 
>> Extend the rseq ABI to expose a NUMA node ID and a vm_vcpu_id field.
>>
>> The NUMA node ID field allows implementing a faster getcpu(2) in libc.
>>
>> The virtual cpu id allows ideal scaling (down or up) of user-space
>> per-cpu data structures. The virtual cpu ids allocated within a memory
>> space are tracked by the scheduler, which takes into account the number
>> of concurrently running threads, thus implicitly considering the number
>> of threads, the cpu affinity, the cpusets applying to those threads, and
>> the number of logical cores on the system.
> 
> Do you have some code that shows how the userspace application handshake
> is supposed to work with the existing three __rseq_* symbols?  Maybe I'm
> missing something.

see https://lore.kernel.org/lkml/20220922105941.237830-5-mathieu.desnoyers@efficios.com/

+static
+unsigned int get_rseq_feature_size(void)
+{
+	unsigned long auxv_rseq_feature_size, auxv_rseq_align;
+
+	auxv_rseq_align = getauxval(AT_RSEQ_ALIGN);
+	assert(!auxv_rseq_align || auxv_rseq_align <= RSEQ_THREAD_AREA_ALLOC_SIZE);
+
+	auxv_rseq_feature_size = getauxval(AT_RSEQ_FEATURE_SIZE);
+	assert(!auxv_rseq_feature_size || auxv_rseq_feature_size <= RSEQ_THREAD_AREA_ALLOC_SIZE);
+	if (auxv_rseq_feature_size)
+		return auxv_rseq_feature_size;
+	else
+		return ORIG_RSEQ_FEATURE_SIZE;
+}

then in rseq_init():

+	rseq_feature_size = get_rseq_feature_size();
+	if (rseq_feature_size == ORIG_RSEQ_FEATURE_SIZE)
+		rseq_size = ORIG_RSEQ_ALLOC_SIZE;
+	else
+		rseq_size = RSEQ_THREAD_AREA_ALLOC_SIZE;

Then using it for e.g. node_id:

https://lore.kernel.org/lkml/20220922105941.237830-6-mathieu.desnoyers@efficios.com/

+#ifndef rseq_sizeof_field
+#define rseq_sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))
+#endif
+
+#ifndef rseq_offsetofend
+#define rseq_offsetofend(TYPE, MEMBER) \
+	(offsetof(TYPE, MEMBER)	+ rseq_sizeof_field(TYPE, MEMBER))
+#endif

+static inline bool rseq_node_id_available(void)
+{
+	return (int) rseq_feature_size >= rseq_offsetofend(struct rseq_abi, node_id);
+}
+
+/*
+ * Current NUMA node number.
+ */
+static inline uint32_t rseq_current_node_id(void)
+{
+	assert(rseq_node_id_available());
+	return RSEQ_ACCESS_ONCE(rseq_get_abi()->node_id);
+}

> 
>  From an application perspective, it would be best to add 8 more shared
> bytes in use, to push the new feature size over 32.  This would be
> clearly visible in __rseq_size, helping applications a lot.

[ I guess you meant 12 bytes ]

The fool-proof approach here would be to skip the 12 bytes of padding
currently at the end of struct rseq.

Maybe this is something we should do in order to make sure the userspace
check is regular for all fields.

> 
> Alternatively, we could sacrifice a bit to indicate that the this round
> of extensions is present.  But we'll need another bit to indicate that
> the last remaining 4 bytes are in use, for consistency.  Or come up with
> something to put their today.  The TID seems like an obvious choice.

Whatever we add into those bits would need to be "special" and use
something like a flag check to validate whether the field is populated
or not. Perhaps keeping things simpler and skipping those 12 bytes
entirely is preferable.

> 
> If we want to the 8 more bytes route, TID and PID should be
> uncontroversal?  The PID cache is clearly something that userspace
> likes, not just as a defeat device for the old BYTE benchmark.

I agree that having the PID and TID there might be relevant, but I
would rather prefer to have all fields use a check that is regular
from the point of view of userspace. This minimizes the risk of user
errors.

Thoughts ?

Thanks,

Mathieu


> 
> Thanks,
> Florian
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com