lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 24 Jul 2023 17:55:44 +0100
From:   Valentin Schneider <vschneid@...hat.com>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org, kvm@...r.kernel.org, linux-mm@...ck.org,
        bpf@...r.kernel.org, x86@...nel.org, rcu@...r.kernel.org,
        linux-kselftest@...r.kernel.org,
        Nicolas Saenz Julienne <nsaenzju@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Jonathan Corbet <corbet@....net>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Josh Triplett <josh@...htriplett.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Zqiang <qiang.zhang1211@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Uladzislau Rezki <urezki@...il.com>,
        Christoph Hellwig <hch@...radead.org>,
        Lorenzo Stoakes <lstoakes@...il.com>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        Jason Baron <jbaron@...mai.com>,
        Kees Cook <keescook@...omium.org>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        Ard Biesheuvel <ardb@...nel.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Juerg Haefliger <juerg.haefliger@...onical.com>,
        Nicolas Saenz Julienne <nsaenz@...nel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Nadav Amit <namit@...are.com>,
        Dan Carpenter <error27@...il.com>,
        Chuang Wang <nashuiliang@...il.com>,
        Yang Jihong <yangjihong1@...wei.com>,
        Petr Mladek <pmladek@...e.com>,
        "Jason A. Donenfeld" <Jason@...c4.com>, Song Liu <song@...nel.org>,
        Julian Pidancet <julian.pidancet@...cle.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        Dionna Glaze <dionnaglaze@...gle.com>,
        Thomas Weißschuh <linux@...ssschuh.net>,
        Juri Lelli <juri.lelli@...hat.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Marcelo Tosatti <mtosatti@...hat.com>,
        Yair Podemsky <ypodemsk@...hat.com>
Subject: Re: [RFC PATCH v2 15/20] context-tracking: Introduce work deferral
 infrastructure

On 24/07/23 16:52, Frederic Weisbecker wrote:
> Le Thu, Jul 20, 2023 at 05:30:51PM +0100, Valentin Schneider a écrit :
>> +enum ctx_state {
>> +	/* Following are values */
>> +	CONTEXT_DISABLED	= -1,	/* returned by ct_state() if unknown */
>> +	CONTEXT_KERNEL		= 0,
>> +	CONTEXT_IDLE		= 1,
>> +	CONTEXT_USER		= 2,
>> +	CONTEXT_GUEST		= 3,
>> +	CONTEXT_MAX             = 4,
>> +};
>> +
>> +/*
>> + * We cram three different things within the same atomic variable:
>> + *
>> + *                CONTEXT_STATE_END                        RCU_DYNTICKS_END
>> + *                         |       CONTEXT_WORK_END                |
>> + *                         |               |                       |
>> + *                         v               v                       v
>> + *         [ context_state ][ context work ][ RCU dynticks counter ]
>> + *         ^                ^               ^
>> + *         |                |               |
>> + *         |        CONTEXT_WORK_START      |
>> + * CONTEXT_STATE_START              RCU_DYNTICKS_START
>
> Should the layout be displayed in reverse? Well at least I always picture
> bitmaps in reverse, that's probably due to the direction of the shift arrows.
> Not sure what is the usual way to picture it though...
>

Surprisingly, I managed to confuse myself with that comment :-)  I think I
am subconsciously more used to the reverse as well. I've flipped that and
put "MSB" / "LSB" at either end.

>> + */
>> +
>> +#define CT_STATE_SIZE (sizeof(((struct context_tracking *)0)->state) * BITS_PER_BYTE)
>> +
>> +#define CONTEXT_STATE_START 0
>> +#define CONTEXT_STATE_END   (bits_per(CONTEXT_MAX - 1) - 1)
>
> Since you have non overlapping *_START symbols, perhaps the *_END
> are superfluous?
>

They're only really there to tidy up the GENMASK() further down - it keeps
the range and index definitions in one hunk. I tried defining that directly
within the GENMASK() themselves but it got too ugly IMO.

>> +
>> +#define RCU_DYNTICKS_BITS  (IS_ENABLED(CONFIG_CONTEXT_TRACKING_WORK) ? 16 : 31)
>> +#define RCU_DYNTICKS_START (CT_STATE_SIZE - RCU_DYNTICKS_BITS)
>> +#define RCU_DYNTICKS_END   (CT_STATE_SIZE - 1)
>> +#define RCU_DYNTICKS_IDX   BIT(RCU_DYNTICKS_START)
>
> Might be the right time to standardize and fix our naming:
>
> CT_STATE_START,
> CT_STATE_KERNEL,
> CT_STATE_USER,
> ...
> CT_WORK_START,
> CT_WORK_*,
> ...
> CT_RCU_DYNTICKS_START,
> CT_RCU_DYNTICKS_IDX
>

Heh, I have actually already done this for v3, though I hadn't touched the
RCU_DYNTICKS* family. I'll fold that in.

>> +bool ct_set_cpu_work(unsigned int cpu, unsigned int work)
>> +{
>> +	struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
>> +	unsigned int old;
>> +	bool ret = false;
>> +
>> +	preempt_disable();
>> +
>> +	old = atomic_read(&ct->state);
>> +	/*
>> +	 * Try setting the work until either
>> +	 * - the target CPU no longer accepts any more deferred work
>> +	 * - the work has been set
>> +	 *
>> +	 * NOTE: CONTEXT_GUEST intersects with CONTEXT_USER and CONTEXT_IDLE
>> +	 * as they are regular integers rather than bits, but that doesn't
>> +	 * matter here: if any of the context state bit is set, the CPU isn't
>> +	 * in kernel context.
>> +	 */
>> +	while ((old & (CONTEXT_GUEST | CONTEXT_USER | CONTEXT_IDLE)) && !ret)
>
> That may still miss a recent entry to userspace due to the first plain read, ending
> with an undesired interrupt.
>
> You need at least one cmpxchg. Well, of course that stays racy by nature because
> between the cmpxchg() returning CONTEXT_KERNEL and the actual IPI raised and
> received, the remote CPU may have gone to userspace already. But still it limits
> a little the window.
>

I can make that a 'do {} while ()' instead to force at least one execution
of the cmpxchg().

This is only about reducing the race window, right? If we're executing this
just as the target CPU is about to enter userspace, we're going to be in
racy territory anyway. Regardless, I'm happy to do that change.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ