lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <94c23850-265c-5d71-f0a3-e02b5dbc050e@efficios.com>
Date:   Thu, 28 Sep 2023 16:43:58 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>,
        "H . Peter Anvin" <hpa@...or.com>, Paul Turner <pjt@...gle.com>,
        linux-api@...r.kernel.org, Christian Brauner <brauner@...nel.org>,
        Florian Weimer <fw@...eb.enyo.de>, David.Laight@...LAB.COM,
        carlos@...hat.com, Peter Oskolkov <posk@...k.io>,
        Alexander Mikhalitsyn <alexander@...alicyn.com>,
        Chris Kennelly <ckennelly@...gle.com>,
        Ingo Molnar <mingo@...hat.com>,
        Darren Hart <dvhart@...radead.org>,
        Davidlohr Bueso <dave@...olabs.net>,
        André Almeida <andrealmeid@...lia.com>,
        libc-alpha@...rceware.org, Steven Rostedt <rostedt@...dmis.org>,
        Jonathan Corbet <corbet@....net>,
        Noah Goldstein <goldstein.w.n@...il.com>,
        Daniel Colascione <dancol@...gle.com>, longman@...hat.com,
        Florian Weimer <fweimer@...hat.com>
Subject: Re: [RFC PATCH v2 1/4] rseq: Add sched_state field to struct rseq

On 9/28/23 16:21, Thomas Gleixner wrote:
> On Mon, May 29 2023 at 15:14, Mathieu Desnoyers wrote:
>> +void __rseq_set_sched_state(struct task_struct *t, unsigned int state);
>> +
>> +static inline void rseq_set_sched_state(struct task_struct *t, unsigned int state)
>> +{
>> +	if (t->rseq_sched_state)
>> +		__rseq_set_sched_state(t, state);
> 
> This is invoked on every context switch and writes over that state
> unconditionally even in the case that the state was already
> cleared. There are enough situations where tasks are scheduled out
> several times while being in the kernel.

Right, if this becomes more than a PoC, I'll make sure to keep track of 
the current state within the task struct, and only update userspace on 
state transition.

> 
>>   /* rseq_preempt() requires preemption to be disabled. */
>>   static inline void rseq_preempt(struct task_struct *t)
>>   {
>>   	__set_bit(RSEQ_EVENT_PREEMPT_BIT, &t->rseq_event_mask);
>>   	rseq_set_notify_resume(t);
>> +	rseq_set_sched_state(t, 0);
> 
> This code is already stupid to begin with. __set_bit() is cheap, but
> rseq_set_notify_resume() is not as it has a conditional and a locked
> instruction

What alternative would you recommend to set a per-thread state that has 
the same effect as TIF_NOTIFY_RESUME ? Indeed all it really needs to 
synchronize with is the thread owning the flags, but AFAIU having this 
flag part of the TIF flags requires use of an atomic instruction to 
synchronize updates against concurrent threads.

If we move this thread flag into a separate field of struct thread_info, 
then we could turn this atomic set bit into a more lightweight store, 
but then we'd need to check against an extra field on return to userspace.

And if we want to remove the conditional branch on the scheduler 
fast-path, we could always load and test both the task struct's rseq 
pointer and the thread_info "preempted" state on return to userspace.

The tradeoff there would be to add extra loads and conditional branches 
on return to userspace to speed up the scheduler fast-path.

Is this what you have in mind or am I missing your point ?

> and now you add two more conditionals into the context
> switch path.

I'm open to suggestions on how to improve this if this goes beyond PoC 
stage and we observe measurable benefits on the userspace side.

Thanks,

Mathieu

> 
> Thanks,
> 
>          tglx

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ