[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <xhsmhttbqm1s2.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Fri, 29 Nov 2024 17:40:29 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
kvm@...r.kernel.org, linux-mm@...ck.org, bpf@...r.kernel.org,
x86@...nel.org, rcu@...r.kernel.org, linux-kselftest@...r.kernel.org,
Nicolas Saenz Julienne <nsaenzju@...hat.com>, Steven Rostedt
<rostedt@...dmis.org>, Masami Hiramatsu <mhiramat@...nel.org>, Jonathan
Corbet <corbet@....net>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar
<mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
<dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, Paolo
Bonzini <pbonzini@...hat.com>, Wanpeng Li <wanpengli@...cent.com>, Vitaly
Kuznetsov <vkuznets@...hat.com>, Andy Lutomirski <luto@...nel.org>, Peter
Zijlstra <peterz@...radead.org>, "Paul E. McKenney" <paulmck@...nel.org>,
Neeraj Upadhyay <quic_neeraju@...cinc.com>, Joel Fernandes
<joel@...lfernandes.org>, Josh Triplett <josh@...htriplett.org>, Boqun
Feng <boqun.feng@...il.com>, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com>, Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang1211@...il.com>, Andrew Morton
<akpm@...ux-foundation.org>, Uladzislau Rezki <urezki@...il.com>,
Christoph Hellwig <hch@...radead.org>, Lorenzo Stoakes
<lstoakes@...il.com>, Josh Poimboeuf <jpoimboe@...nel.org>, Jason Baron
<jbaron@...mai.com>, Kees Cook <keescook@...omium.org>, Sami Tolvanen
<samitolvanen@...gle.com>, Ard Biesheuvel <ardb@...nel.org>, Nicholas
Piggin <npiggin@...il.com>, Juerg Haefliger
<juerg.haefliger@...onical.com>, Nicolas Saenz Julienne
<nsaenz@...nel.org>, "Kirill A. Shutemov"
<kirill.shutemov@...ux.intel.com>, Nadav Amit <namit@...are.com>, Dan
Carpenter <error27@...il.com>, Chuang Wang <nashuiliang@...il.com>, Yang
Jihong <yangjihong1@...wei.com>, Petr Mladek <pmladek@...e.com>, "Jason A.
Donenfeld" <Jason@...c4.com>, Song Liu <song@...nel.org>, Julian Pidancet
<julian.pidancet@...cle.com>, Tom Lendacky <thomas.lendacky@....com>,
Dionna Glaze <dionnaglaze@...gle.com>, Thomas Weißschuh
<linux@...ssschuh.net>, Juri Lelli <juri.lelli@...hat.com>, Marcelo
Tosatti <mtosatti@...hat.com>, Yair Podemsky <ypodemsk@...hat.com>, Daniel
Wagner <dwagner@...e.de>, Petr Tesarik <ptesarik@...e.com>
Subject: Re: [RFC PATCH v3 11/15] context-tracking: Introduce work deferral
infrastructure
On 24/11/24 22:46, Frederic Weisbecker wrote:
> Le Fri, Nov 22, 2024 at 03:56:59PM +0100, Valentin Schneider a écrit :
>> On 20/11/24 18:30, Frederic Weisbecker wrote:
>> > Le Wed, Nov 20, 2024 at 06:10:43PM +0100, Valentin Schneider a écrit :
>> >> On 20/11/24 15:23, Frederic Weisbecker wrote:
>> >>
>> >> > Ah but there is CT_STATE_GUEST and I see the last patch also applies that to
>> >> > CT_STATE_IDLE.
>> >> >
>> >> > So that could be:
>> >> >
>> >> > bool ct_set_cpu_work(unsigned int cpu, unsigned int work)
>> >> > {
>> >> > struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
>> >> > unsigned int old;
>> >> > bool ret = false;
>> >> >
>> >> > preempt_disable();
>> >> >
>> >> > old = atomic_read(&ct->state);
>> >> >
>> >> > /* CT_STATE_IDLE can be added to last patch here */
>> >> > if (!(old & (CT_STATE_USER | CT_STATE_GUEST))) {
>> >> > old &= ~CT_STATE_MASK;
>> >> > old |= CT_STATE_USER;
>> >> > }
>> >>
>> >> Hmph, so that lets us leverage the cmpxchg for a !CT_STATE_KERNEL check,
>> >> but we get an extra loop if the target CPU exits kernelspace not to
>> >> userspace (e.g. vcpu or idle) in the meantime - not great, not terrible.
>> >
>> > The thing is, what you read with atomic_read() should be close to reality.
>> > If it already is != CT_STATE_KERNEL then you're good (minus racy changes).
>> > If it is CT_STATE_KERNEL then you still must do a failing cmpxchg() in any case,
>> > at least to make sure you didn't miss a context tracking change. So the best
>> > you can do is a bet.
>> >
>> >>
>> >> At the cost of one extra bit for the CT_STATE area, with CT_STATE_KERNEL=1
>> >> we could do:
>> >>
>> >> old = atomic_read(&ct->state);
>> >> old &= ~CT_STATE_KERNEL;
>> >
>> > And perhaps also old |= CT_STATE_IDLE (I'm seeing the last patch now),
>> > so you at least get a chance of making it right (only ~CT_STATE_KERNEL
>> > will always fail) and CPUs usually spend most of their time idle.
>> >
>>
>> I'm thinking with:
>>
>> CT_STATE_IDLE = 0,
>> CT_STATE_USER = 1,
>> CT_STATE_GUEST = 2,
>> CT_STATE_KERNEL = 4, /* Keep that as a standalone bit */
>
> Right!
>
>>
>> we can stick with old &= ~CT_STATE_KERNEL; and that'll let the cmpxchg
>> succeed for any of IDLE/USER/GUEST.
>
> Sure but if (old & CT_STATE_KERNEL), cmpxchg() will consistently fail.
> But you can make a bet that it has switched to CT_STATE_IDLE between
> the atomic_read() and the first atomic_cmpxchg(). This way you still have
> a tiny chance to succeed.
>
> That is:
>
> old = atomic_read(&ct->state);
> if (old & CT_STATE_KERNEl)
> old |= CT_STATE_IDLE;
> old &= ~CT_STATE_KERNEL;
>
>
> do {
> atomic_try_cmpxchg(...)
>
> Hmm?
But it could equally be CT_STATE_{USER, GUEST}, right? That is, if we have
all of this enabled them we assume the isolated CPUs spend the least amount
of time in the kernel, if they don't we get to blame the user.
Powered by blists - more mailing lists