linux-kernel - Re: [RFC PATCH v9 for 4.15 01/14] Restartable sequences system call

Message-ID: <135399003.40850.1507930608890.JavaMail.zimbra@efficios.com>
Date:   Fri, 13 Oct 2017 21:36:48 +0000 (UTC)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Ben Maurer <bmaurer@...com>,
        David Goldblatt <davidgoldblatt@...com>,
        Qi Wang <qiwang@...com>, Boqun Feng <boqun.feng@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Paul Turner <pjt@...gle.com>, Andrew Hunter <ahh@...gle.com>,
        Andy Lutomirski <luto@...capital.net>,
        Dave Watson <davejwatson@...com>,
        Josh Triplett <josh@...htriplett.org>,
        Will Deacon <will.deacon@....com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andi Kleen <andi@...stfloor.org>, Chris Lameter <cl@...ux.com>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, rostedt <rostedt@...dmis.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Russell King <linux@....linux.org.uk>,
        Catalin Marinas <catalin.marinas@....com>,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        linux-api <linux-api@...r.kernel.org>,
        Carlos O'Donell <carlos@...hat.com>
Subject: Re: [RFC PATCH v9 for 4.15 01/14] Restartable sequences system call

----- On Oct 13, 2017, at 5:05 PM, Linus Torvalds torvalds@...ux-foundation.org wrote:

> On Fri, Oct 13, 2017 at 1:54 PM, Paul E. McKenney
> <paulmck@...ux.vnet.ibm.com> wrote:
>>>
>>> And if nobody can be bothered to write the user-level code and test
>>> this patch-series, then obviously it's not important enough for the
>>> kernel to merge it.
>>
>> My guess is that it will take some time, probably measured in months,
>> to carry out this level of integration and testing.
> 
> That would be an argument if this was a new patch series. "Wait a few months".
> 
> But that just isn't the case here.
> 
> The fact is, these patches have been floating around in one form or
> another not for a couple of months, but for years. There's an LWN
> article about it from 2015, and it wasn't new back then either (slides
> from 2013).
> 
> I wouldn't be surprised if there had been academic _papers_ written
> about the notion.
> 
> So if there *still* is no actual real code around this, then that
> just strengthens my point - no way should we merge something where
> people haven't actually bothered to write the user-mode component for
> years and years.
> 
> It really boils down to: "if nobody can be bothered to write the user
> mode parts after several years, why should it be merged into the
> kernel"?
> 
> I don't think that's too much to ask for.

I remember hearing that Google have been running their own version of
this patch on their servers. My understanding is that they did not
really care about things like single-stepping on server workloads,
because they never single-step. One issue there is that getting
rseq to work for specifically tuned systems (e.g. no cpu hotplug,
no single-stepping, and so on) is fairly straightforward. The tricky
part is to make it work in all situations, and I don't think Google
had incentive to complete those tricky bits, because they don't need
them.

Facebook seem to try to follow upstream more closely. My understanding
is that they can do specific prototypes to prove the value of the
approach, as they did with their jemalloc integration from last year.

I have myself integrated the LTTng-UST tracer to rseq as a prototype
branch, and created a Userspace RCU prototype branch that works
similarly to SRCU in the kernel, using per-cpu counters. Those
remain prototypes because I won't release an open source tool or
library based on non-mainline system call numbers; that just cannot
end well. If and when rseq gets in, my next step will be to implement
a multi-process userspace RCU flavor based on per-cpu counters (with
rseq). Doing this with atomic operations is not worth it, because it
just leads to really poor performance on the read side.
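
To make the read-side cost concrete, here is a minimal sketch of per-cpu
reader counters maintained with atomic operations, i.e. the approach
described above as not worth it. It is not the Userspace RCU prototype:
NR_SLOTS, struct percpu_count and the three function names are
hypothetical, and the grace-period logic that SRCU layers on top of such
counters is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

#define NR_SLOTS 64	/* hypothetical bound on CPU ids, for illustration */

/* One pair of counters per CPU, padded to limit false sharing. */
struct percpu_count {
	atomic_long enter;
	atomic_long leave;
} __attribute__((aligned(64)));

static struct percpu_count counters[NR_SLOTS];

/* Read-side entry: bump the counter of the CPU we appear to run on. */
int reader_enter(void)
{
	int cpu = sched_getcpu();
	int slot = (cpu < 0 ? 0 : cpu) % NR_SLOTS;

	/*
	 * The thread may migrate right after sched_getcpu(), so another
	 * thread can hit the same slot concurrently. Without rseq the
	 * increment therefore has to be atomic, which is the read-side
	 * cost discussed above.
	 */
	atomic_fetch_add(&counters[slot].enter, 1);
	return slot;
}

/* Read-side exit: account the exit in the slot returned by reader_enter(). */
void reader_leave(int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	atomic_fetch_add(&counters[slot].leave, 1);
}

/*
 * Sum of entries minus exits over all slots. The result is only exact
 * while no reader is concurrently entering or leaving.
 */
long readers_active(void)
{
	long sum = 0;

	for (int i = 0; i < NR_SLOTS; i++)
		sum += atomic_load(&counters[i].enter) -
		       atomic_load(&counters[i].leave);
	return sum;
}
```

With rseq, the expectation is that the increment in reader_enter() could
become a plain per-cpu store inside a restartable sequence instead of an
atomic read-modify-write.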

I also spoke to Carlos O'Donell from glibc about it, and he was very
excited about the possible use of rseq for malloc speedup/memory usage
improvement. But again, I don't see a project like glibc starting to
use a system call for which the number will have to be bumped every
now and then.

I would *not* want this merged before we gather significant user feedback.
The question is: how can we best gather that feedback?

Perhaps one approach could be to reserve system call numbers for
sys_rseq and sys_cpu_opv, but leave them unimplemented for now
(ENOSYS). This would lessen the amount of pain user-space would have
to go through to adapt to system call number changes, and we could
provide the implementation of those system calls in a -rseq tree, which
I'd be happy to maintain in order to gather feedback. If it ends up that
it's not the right approach after all, all we would have lost is two
unwired system call numbers per architecture.
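
From the user-space side, a reserved but unimplemented number could be
handled roughly as sketched below: probe once, and fall back to the
existing code path on ENOSYS. This is only an illustration. The system
call number is passed in by the caller because none has been assigned,
the all-zero arguments are an assumption rather than the sys_rseq or
sys_cpu_opv signature, and probe_syscall() and classify_probe() are
hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Outcome of probing a reserved, possibly unimplemented system call. */
enum probe_result {
	PROBE_IMPLEMENTED,	/* the kernel dispatched the call */
	PROBE_UNIMPLEMENTED,	/* reserved or unknown number: ENOSYS */
};

/*
 * Interpret the return value and errno of a probe call. Any error other
 * than ENOSYS (e.g. EINVAL or EFAULT) is taken to mean that a real
 * implementation rejected the dummy arguments. Sandboxes that filter
 * system calls may report other errors; that case is not handled here.
 */
enum probe_result classify_probe(long ret, int err)
{
	if (ret == -1 && err == ENOSYS)
		return PROBE_UNIMPLEMENTED;
	return PROBE_IMPLEMENTED;
}

/*
 * Probe system call number nr with all-zero arguments (an assumption,
 * not the real argument list of any proposed system call).
 */
enum probe_result probe_syscall(long nr)
{
	long ret;

	assert(nr >= 0);
	errno = 0;
	ret = syscall(nr, 0L, 0L, 0L, 0L);
	return classify_probe(ret, errno);
}
```

A library would call probe_syscall() once at startup and keep using its
atomic-based path when the result is PROBE_UNIMPLEMENTED, so the same
binary runs on mainline kernels and on a -rseq tree.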

Thoughts?

Thanks,

Mathieu


> 
>                 Linus

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com