linux-kernel - Re: [RFC PATCH v8 1/9] Restartable sequences system call

Message-ID: <91715400.22162.1472483812389.JavaMail.zimbra@efficios.com>
Date:   Mon, 29 Aug 2016 15:16:52 +0000 (UTC)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Josh Triplett <josh@...htriplett.org>
Cc:     Ben Maurer <bmaurer@...com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Dave Watson <davejwatson@...com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Andy Lutomirski <luto@...capital.net>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-api <linux-api@...r.kernel.org>,
        Paul Turner <pjt@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Russell King <linux@....linux.org.uk>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Andrew Hunter <ahh@...gle.com>,
        Andi Kleen <andi@...stfloor.org>, Chris Lameter <cl@...ux.com>,
        rostedt <rostedt@...dmis.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        Michael Kerrisk <mtk.manpages@...il.com>
Subject: Re: [RFC PATCH v8 1/9] Restartable sequences system call

----- On Aug 27, 2016, at 12:22 AM, Josh Triplett josh@...htriplett.org wrote:

> On Thu, Aug 25, 2016 at 05:56:25PM +0000, Ben Maurer wrote:
>> rseq opens up a whole world of algorithms to userspace – algorithms
>> that are O(num CPUs) and where one can have an extremely fast fastpath
>> at the cost of a slower slow path. Many of these algorithms are in use
>> in the kernel today: per-cpu allocators, RCU, lightweight
>> reader-writer locks, etc. Even in cases where these APIs can be
>> implemented today, an rseq implementation is often superior in terms
>> of predictability and usability (e.g., per-thread counters consume
>> more memory and are more expensive to read than per-cpu counters).
>>
>> Isn’t the large number of uses of rseq-like algorithms in the kernel a
>> pretty substantial sign that there would be demand for similar
>> algorithms by user-space systems programmers?
> 
> Yes and no.  It provides a substantial sign that such algorithms could
> and should exist; however "someone should do this" doesn't demonstrate
> that someone *will*.  I do think we need a concrete example of a
> userspace user with benchmark numbers that demonstrate the value of this
> approach.
> 
> Mathieu, do you have a version of URCU that can use rseq to work per-CPU
> rather than per-thread?  URCU's data structures would work as a
> benchmark.

I currently don't have a per-cpu flavor of liburcu. All the flavors are
per-thread, because currently the alternative requires atomic operations
on the fast path. We could indeed re-implement something similar to SRCU
(although under the LGPLv2.1 license). Over the weekend, I looked at what
would be required, and it seems feasible, but in the short term my
customers expect me to focus on speeding up the LTTng-UST tracer's
per-cpu ring buffer by adapting it to rseq. For now, I will work on the
liburcu per-cpu flavor in my spare time.
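To illustrate why a per-cpu flavor without rseq needs atomic operations on
the fast path, here is a minimal C sketch of a per-cpu counter. It is not
liburcu or LTTng-UST code; every name (pcpu_counter_*, worker,
pcpu_counter_demo) is hypothetical, and it assumes glibc's sched_getcpu()
(a GNU extension) and a 64-byte cache line. Because a thread can be
preempted or migrated between sched_getcpu() and the update, two threads
may touch the same slot concurrently, so the increment must be an atomic
read-modify-write. With rseq, the kernel would instead abort the sequence
on preemption or migration, so a plain load/store on the current CPU's
slot could suffice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>      /* sched_getcpu() (GNU extension) */
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>     /* sysconf() */

/* One slot per possible CPU, padded to an assumed 64-byte cache line
 * so that updates from different CPUs do not share a line. */
struct pcpu_slot {
    _Atomic uint64_t count;
    char pad[56];
};

struct pcpu_counter {
    long nr_cpus;
    struct pcpu_slot *slots;
};

static int pcpu_counter_init(struct pcpu_counter *c)
{
    c->nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
    if (c->nr_cpus < 1)
        c->nr_cpus = 1;
    c->slots = calloc((size_t)c->nr_cpus, sizeof(*c->slots));
    return c->slots ? 0 : -1;
}

static void pcpu_counter_destroy(struct pcpu_counter *c)
{
    free(c->slots);
    c->slots = NULL;
}

/* Fast path without rseq: migration between sched_getcpu() and the
 * update means a plain "count++" could lose updates, so an atomic
 * read-modify-write is required. */
static void pcpu_counter_inc(struct pcpu_counter *c)
{
    int cpu = sched_getcpu();
    if (cpu < 0 || cpu >= c->nr_cpus)
        cpu = 0;            /* fall back to slot 0 on error */
    atomic_fetch_add_explicit(&c->slots[cpu].count, 1,
                              memory_order_relaxed);
}

/* Read side: O(number of CPUs), regardless of how many threads exist. */
static uint64_t pcpu_counter_read(struct pcpu_counter *c)
{
    uint64_t sum = 0;
    for (long i = 0; i < c->nr_cpus; i++)
        sum += atomic_load_explicit(&c->slots[i].count,
                                    memory_order_relaxed);
    return sum;
}

struct worker_arg {
    struct pcpu_counter *counter;
    long iters;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (long i = 0; i < a->iters; i++)
        pcpu_counter_inc(a->counter);
    return NULL;
}

/* Usage: nthreads threads each increment iters times; returns the total. */
static uint64_t pcpu_counter_demo(int nthreads, long iters)
{
    struct pcpu_counter c;
    struct worker_arg arg;
    pthread_t tids[64];
    uint64_t total;

    assert(nthreads > 0 && nthreads <= 64);
    if (pcpu_counter_init(&c) != 0)
        return 0;
    arg.counter = &c;
    arg.iters = iters;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            abort();
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    total = pcpu_counter_read(&c);
    pcpu_counter_destroy(&c);
    return total;
}
```

Note the trade-off Ben describes: the read sums one slot per CPU rather
than one per thread, so its cost and memory footprint do not grow with
the number of threads.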

I expect a per-cpu flavor of liburcu to improve the slow path in use cases
with many threads (smaller grace-period overhead), but not to improve the
fast path much, except perhaps by allowing faster memory reclaim in
update-heavy workloads, which could in turn make better use of the cache
even for reads.

> 
> Ben, Mathieu, Dave, do you have jemalloc benchmark numbers with and
> without rseq?  (As well as memory usage numbers for the reduced memory
> usage of per-CPU pools rather than per-thread pools?)

Before I started reimplementing rseq, the numbers presented by Facebook
at https://lkml.org/lkml/2015/10/22/588 were, in my opinion, good proof
that rseq is useful. I'm not sure whether their memoryidler API was used
back then.

I could take Dave's jemalloc branch, which was adapted to Paul Turner's
rseq, and port it to mine. Then we could use this allocator to compare the
memory use and speed of heavily multi-threaded applications.

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com