Date:   Wed, 2 Feb 2022 08:48:51 -0500 (EST)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        paulmck <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
        "H. Peter Anvin" <hpa@...or.com>, Paul Turner <pjt@...gle.com>,
        linux-api <linux-api@...r.kernel.org>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Florian Weimer <fw@...eb.enyo.de>,
        David Laight <David.Laight@...LAB.COM>,
        carlos <carlos@...hat.com>, Peter Oskolkov <posk@...k.io>
Subject: Re: [RFC PATCH 1/3] Introduce per thread group current virtual cpu id

----- On Feb 2, 2022, at 6:23 AM, Peter Zijlstra peterz@...radead.org wrote:

> On Tue, Feb 01, 2022 at 02:25:38PM -0500, Mathieu Desnoyers wrote:
> 
>> +static inline void tg_vcpu_get(struct task_struct *t)
>> +{
>> +	struct cpumask *cpumask = &t->signal->vcpu_mask;
>> +	unsigned int vcpu;
>> +
>> +	if (t->flags & PF_KTHREAD)
>> +		return;
>> +	/* Atomically reserve lowest available vcpu number. */
>> +	do {
>> +		vcpu = cpumask_first_zero(cpumask);
>> +		WARN_ON_ONCE(vcpu >= nr_cpu_ids);
>> +	} while (cpumask_test_and_set_cpu(vcpu, cpumask));
>> +	t->tg_vcpu = vcpu;
>> +}
>> +
>> +static inline void tg_vcpu_put(struct task_struct *t)
>> +{
>> +	if (t->flags & PF_KTHREAD)
>> +		return;
>> +	cpumask_clear_cpu(t->tg_vcpu, &t->signal->vcpu_mask);
>> +	t->tg_vcpu = 0;
>> +}
> 
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 2e4ae00e52d1..2690e80977b1 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4795,6 +4795,8 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
>>  	sched_info_switch(rq, prev, next);
>>  	perf_event_task_sched_out(prev, next);
>>  	rseq_preempt(prev);
>> +	tg_vcpu_put(prev);
>> +	tg_vcpu_get(next);
> 
> 
> URGGHHH!!! that's *2* atomics extra on the context switch path. Worse,
> that's on a line that's trivially contended with a few threads.

There is one obvious optimization that just begs to be done here: when
switching between threads belonging to the same process, we can simply
hand the vcpu id of the prev thread over to next, without requiring
any atomic operation. See the sketch below.
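
Roughly, a minimal sketch of that fast path (the switch_tg_vcpu()
helper name is hypothetical, and this assumes the tg_vcpu_get()/
tg_vcpu_put() helpers from the patch quoted above):

static inline void switch_tg_vcpu(struct task_struct *prev,
				  struct task_struct *next)
{
	if (!(prev->flags & PF_KTHREAD) && !(next->flags & PF_KTHREAD) &&
	    prev->signal == next->signal) {
		/*
		 * Same thread group: hand over prev's vcpu id to next.
		 * The id stays reserved in the shared vcpu_mask, so no
		 * atomic clear/reserve cycle is needed.
		 */
		next->tg_vcpu = prev->tg_vcpu;
		return;
	}
	/* Cross-process (or kthread) switch: keep the atomic slow path. */
	tg_vcpu_put(prev);
	tg_vcpu_get(next);
}

prepare_task_switch() would then call switch_tg_vcpu(prev, next)
instead of the back-to-back tg_vcpu_put()/tg_vcpu_get() pair.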

This only leaves the overhead of the added atomics when scheduling
between threads which belong to different processes. Does it matter as
much? If so, then we should really scratch our heads a little more
to come up with improvements.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
