linux-kernel - Re: [RFC PATCH for 4.17 10/21] cpu_opv: Provide cpu

Message-ID: <1109208604.169.1522259681295.JavaMail.zimbra@efficios.com>
Date:   Wed, 28 Mar 2018 13:54:41 -0400 (EDT)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Andy Lutomirski <luto@...capital.net>,
        Dave Watson <davejwatson@...com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-api <linux-api@...r.kernel.org>,
        Paul Turner <pjt@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Russell King <linux@....linux.org.uk>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Andrew Hunter <ahh@...gle.com>,
        Andi Kleen <andi@...stfloor.org>, Chris Lameter <cl@...ux.com>,
        Ben Maurer <bmaurer@...com>, rostedt <rostedt@...dmis.org>,
        Josh Triplett <josh@...htriplett.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        Michael Kerrisk <mtk.manpages@...il.com>
Subject: Re: [RFC PATCH for 4.17 10/21] cpu_opv: Provide cpu_opv system call
 (v6)

----- On Mar 28, 2018, at 11:22 AM, Peter Zijlstra peterz@...radead.org wrote:

> On Tue, Mar 27, 2018 at 12:05:31PM -0400, Mathieu Desnoyers wrote:
> 
>> 1) Allow algorithms to perform per-cpu data migration without relying on
>>    sched_setaffinity()
>> 
>> The use-cases are migrating memory between per-cpu memory free-lists, or
>> stealing tasks from other per-cpu work queues: each requires access to
>> remote per-cpu data structures.
> 
> I think that one completely reduces to the per-cpu (spin)lock case,
> right? Because, as per the below, your logging case (8) can 'easily' be
> done without the cpu_opv monstrosity.
> 
> And if you can construct a per-cpu lock, that can be used to construct
> arbitrary logic.

For several operations, a per-cpu spinlock does not have the same
performance characteristics as lock-free alternatives. For instance, an rseq
compare-and-store is faster than an rseq spinlock for linked-list operations.
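The two approaches being compared can be sketched in plain C11. This is a minimal sketch under stated assumptions, not rseq code: `NR_CPUS` and every function name (`locked_push`, `locked_steal`, `lockfree_push`) are hypothetical, and an ordinary C11 compare-and-swap stands in for the rseq compare-and-store. The spinlock variant gives remote access ("stealing") with the same code path as local access, while the lock-free push avoids a lock acquire/release pair on every operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical CPU count, for illustration only */

struct node {
    struct node *next;
    int value;
};

/* Variant 1: one free-list per CPU, each guarded by its own spinlock. */
struct locked_list {
    atomic_int locked; /* 0 = free, 1 = held */
    struct node *head;
};

static struct locked_list locked_lists[NR_CPUS];

static void list_lock(struct locked_list *l)
{
    /* Spin until this thread is the one that flipped 0 -> 1. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
        ;
}

static void list_unlock(struct locked_list *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Same code whether `cpu` is the local CPU or a remote one. */
static void locked_push(int cpu, struct node *n)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct locked_list *l = &locked_lists[cpu];
    list_lock(l);
    n->next = l->head;
    l->head = n;
    list_unlock(l);
}

/* Remote access ("stealing"): pop from another CPU's list under its lock. */
static struct node *locked_steal(int victim_cpu)
{
    assert(victim_cpu >= 0 && victim_cpu < NR_CPUS);
    struct locked_list *l = &locked_lists[victim_cpu];
    list_lock(l);
    struct node *n = l->head;
    if (n)
        l->head = n->next;
    list_unlock(l);
    return n;
}

/* Variant 2: lock-free push. The C11 CAS is only a stand-in: a real rseq
 * compare-and-store is a restartable sequence that is atomic with respect
 * to preemption/migration on the local CPU, without a lock-prefixed op. */
static _Atomic(struct node *) lf_heads[NR_CPUS];

static void lockfree_push(int cpu, struct node *n)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct node *old = atomic_load_explicit(&lf_heads[cpu], memory_order_relaxed);
    do {
        n->next = old; /* re-link on each retry; `old` is refreshed on failure */
    } while (!atomic_compare_exchange_weak_explicit(&lf_heads[cpu], &old, n,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

Only the push side of the lock-free variant is shown; a lock-free pop needs extra care (ABA), which is part of why the remote-access case is the hard one in this discussion.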

> 
> And the difficult case for the per-cpu lock is the remote acquire; all
> the other cases are (relatively) trivial.
> 
> I've not really managed to get anything sensible to work; I've tried
> several variations of split lock, but you invariably end up with
> barriers in the fast (local) path, which sucks.
> 
> But I feel this should be solvable without cpu_opv. As in, I really hate
> that thing ;-)

I did not develop cpu_opv out of any love for that solution. After failing
for quite some time to find acceptable solutions to the remote access
problem, and to ensure progress when single-stepping with current debuggers
that don't know about the rseq_table section, I realized that it solved all
of my issues.
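As the quoted description of use-case 8 puts it, cpu_opv takes the target CPU number as an argument, moves the task to that CPU if needed, and performs a sequence of operations there with preemption disabled. The sketch below is a user-space model of that idea only, not the proposed system call's ABI: `model_cpu_opv`, `struct model_op` and the three `MODEL_OP_*` operation types are all hypothetical, and a per-CPU spin flag stands in for "running on that CPU with preemption disabled". It serializes model calls against each other; it does not model how rseq critical sections on the target CPU get aborted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 4 /* hypothetical CPU count */

/* Hypothetical operation vector; NOT the layout used by the cpu_opv patch. */
enum model_op_type { MODEL_OP_COMPARE_EQ, MODEL_OP_STORE, MODEL_OP_ADD };

struct model_op {
    enum model_op_type type;
    intptr_t *addr;
    intptr_t value; /* expected value (compare), new value (store), addend (add) */
};

/* Per-CPU flag standing in for "on that CPU with preemption disabled". */
static atomic_int cpu_busy[NR_CPUS];

/* Run `ops` in order as one unit on behalf of `cpu`.
 * Returns 0 if every op ran, 1 if a compare failed (remaining ops skipped),
 * -1 on bad arguments. Compares are expected to precede the updates. */
static int model_cpu_opv(const struct model_op *ops, int count, int cpu)
{
    if (cpu < 0 || cpu >= NR_CPUS || count < 0)
        return -1;
    while (atomic_exchange_explicit(&cpu_busy[cpu], 1, memory_order_acquire))
        ; /* wait until no other model call is "running" on that CPU */
    int ret = 0;
    for (int i = 0; i < count; i++) {
        const struct model_op *op = &ops[i];
        if (op->type == MODEL_OP_COMPARE_EQ) {
            if (*op->addr != op->value) {
                ret = 1;
                break;
            }
        } else if (op->type == MODEL_OP_STORE) {
            *op->addr = op->value;
        } else { /* MODEL_OP_ADD */
            *op->addr += op->value;
        }
    }
    atomic_store_explicit(&cpu_busy[cpu], 0, memory_order_release);
    return ret;
}
```

For example, a commit on CPU 2's buffer could be expressed as "compare a sequence number, then add the committed length", and it targets CPU 2 regardless of which CPU the caller currently runs on.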

> 
>> 8) Allow libraries with multi-part algorithms to work on same per-cpu
>>    data without affecting the allowed cpu mask
>> 
>> The lttng-ust tracer presents an interesting use-case for per-cpu
>> buffers: the algorithm needs to update a "reserve" counter, serialize
>> data into the buffer, and then update a "commit" counter _on the same
>> per-cpu buffer_. Using rseq for both reserve and commit can bring
>> significant performance benefits.
>> 
>> Clearly, if the rseq reserve fails, the algorithm can retry on a different
>> per-cpu buffer. The commit is harder: it must be performed on the same
>> per-cpu buffer as the reserve.
>> 
>> The cpu_opv system call solves that problem by taking as argument the
>> number of the CPU on which the operations need to be performed. It can
>> migrate the task to that CPU if needed, and perform the operations there
>> with preemption disabled.
>> 
>> Changing the allowed cpu mask for the current thread is not an
>> acceptable alternative for a tracing library, because the application
>> being traced does not expect that mask to be changed by libraries.
> 
> We talked about this use-case, and it can be solved without cpu_opv if
> you keep a dual commit counter, one local and one (atomic) remote.

Right.

> 
> We retain the cpu_id from the first rseq, and when the second part
> (unlikely) finds that it runs remotely, it does an atomic increment on the
> remote counter. The consumer of the counter will then have to sum both
> the local and remote counter parts.

Yes, I prototyped split counters for this specific case a while ago.
However, if we need cpu_opv as a fallback for other reasons (e.g. remote
accesses), then split counters are not needed, and the layout of user-space
data does not have to change to accommodate the extra per-cpu counter.
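The split-counter alternative discussed above can be sketched as follows. This is a minimal sketch, assuming a hypothetical `struct percpu_buf` and hypothetical helpers `buf_commit` and `buf_committed`; the caller passes in the CPU recorded at reserve time and the CPU it runs on now. In real code, the CPU check and the local store would be one rseq critical section. Here, the single-writer property that rseq would guarantee is simply assumed, not enforced. The `commit_remote` field is the extra per-cpu counter, that is, the change to the user-space data layout mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 4 /* hypothetical CPU count */

/* Hypothetical per-cpu buffer header with a split commit counter. */
struct percpu_buf {
    _Atomic uint64_t commit_local;  /* written only from the owning CPU */
    _Atomic uint64_t commit_remote; /* extra field: atomic adds from other CPUs */
};

static struct percpu_buf bufs[NR_CPUS];

/* Commit `len` bytes to the buffer of `reserve_cpu` (the cpu_id retained
 * from the reserve step); `current_cpu` is where the caller runs now. */
static void buf_commit(int reserve_cpu, int current_cpu, uint64_t len)
{
    assert(reserve_cpu >= 0 && reserve_cpu < NR_CPUS);
    struct percpu_buf *b = &bufs[reserve_cpu];
    if (current_cpu == reserve_cpu) {
        /* Fast path: single writer (rseq's job in real code), so a plain
         * load + store is enough; no lock-prefixed read-modify-write. */
        uint64_t v = atomic_load_explicit(&b->commit_local, memory_order_relaxed);
        atomic_store_explicit(&b->commit_local, v + len, memory_order_release);
    } else {
        /* Unlikely slow path: migrated after the reserve step. */
        atomic_fetch_add_explicit(&b->commit_remote, len, memory_order_release);
    }
}

/* Consumer side: the committed amount is the sum of both parts. */
static uint64_t buf_committed(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return atomic_load_explicit(&bufs[cpu].commit_local, memory_order_acquire) +
           atomic_load_explicit(&bufs[cpu].commit_remote, memory_order_acquire);
}
```

With cpu_opv available as a fallback, the slow path would instead perform the commit on `reserve_cpu` itself, and `commit_remote` could be dropped from the layout.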

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com