linux-kernel - Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector


Message-ID: <2130360737.21248.1511471602798.JavaMail.zimbra@efficios.com>
Date:   Thu, 23 Nov 2017 21:13:22 +0000 (UTC)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Andi Kleen <andi@...stfloor.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Andy Lutomirski <luto@...capital.net>,
        Dave Watson <davejwatson@...com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-api <linux-api@...r.kernel.org>,
        Paul Turner <pjt@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Russell King <linux@....linux.org.uk>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Andrew Hunter <ahh@...gle.com>,
        Chris Lameter <cl@...ux.com>, Ben Maurer <bmaurer@...com>,
        rostedt <rostedt@...dmis.org>,
        Josh Triplett <josh@...htriplett.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        Michael Kerrisk <mtk.manpages@...il.com>
Subject: Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op
 vector

----- On Nov 22, 2017, at 2:32 PM, Peter Zijlstra peterz@...radead.org wrote:

> On Tue, Nov 21, 2017 at 10:05:08PM +0000, Mathieu Desnoyers wrote:
>> Other than that, I have not received any concrete alternative proposal to
>> properly handle single-stepping.
> 
> That's not entirely true; amluto did have an alternative in Prague: do
> full machine level instruction emulation till the end of the rseq when
> it gets 'preempted too often'.

Yes, that's right. Andy did propose that alternative at KS, and it is also
interpreter-based.

> 
> Yes, implementing that will be an absolute royal pain. But it does
> remove the whole duplicate/dual program asm/bytecode thing and avoids
> the syscall entirely.

Agreed that this would be a royal pain, and one we would have to repeat
for each architecture.

By the way, I came up with an interesting library API that would remove
the need for end-users to duplicate code:

e.g.

static inline __attribute__((always_inline))
int percpu_addv(intptr_t *v, intptr_t count, int cpu)
{
        /* Fast path: restartable sequence, nonzero if it did not commit. */
        if (rseq_unlikely(rseq_addv(v, count, cpu)))
                /* Slow path: have the kernel perform the add on @cpu. */
                return cpu_op_addv(v, count, cpu);
        return 0;
}

And the caller becomes:

                cpu = rseq_cpu_start();         /* CPU we are currently running on */
                ret = percpu_addv(&data->c[cpu].count, 1, cpu);
                if (unlikely(ret)) {            /* only the cpu_opv slow path can fail */
                        perror("cpu_opv");
                        abort();
                }

So the caller does not even have to bother retrying in case of an
rseq error; it's all handled by the "percpu_*()" static inlines.
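A minimal user-space sketch of that wrapper shape, assuming nothing about the real rseq or cpu_opv interfaces: `fake_rseq_addv()` and `fake_cpu_op_addv()` are hypothetical stand-ins. The first models a restartable fast path that may abort (here, when a simulated preemption flag is set); the second models a slow path that always completes (here, a C11 atomic add). Only the control flow is meant to mirror `percpu_addv()`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SIM 4

/* Hypothetical per-CPU counters for the simulation. */
struct percpu_counter {
        _Atomic intptr_t count;
};

static struct percpu_counter counters[NR_CPUS_SIM];

/* Test hook: pretend the fast path was preempted and aborted. */
static bool simulate_abort;

/* Hypothetical stand-in for rseq_addv(): 0 on commit, -1 on abort. */
static inline int fake_rseq_addv(_Atomic intptr_t *v, intptr_t count, int cpu)
{
        (void)cpu;
        if (simulate_abort)
                return -1;      /* the real fast path would jump to its abort handler */
        /* Single writer per CPU: a relaxed load/store pair, no atomic RMW. */
        atomic_store_explicit(v, atomic_load_explicit(v, memory_order_relaxed) + count,
                              memory_order_relaxed);
        return 0;
}

/* Hypothetical stand-in for cpu_op_addv(): always completes, returns 0. */
static int fake_cpu_op_addv(_Atomic intptr_t *v, intptr_t count, int cpu)
{
        (void)cpu;
        atomic_fetch_add(v, count);
        return 0;
}

/* Same shape as percpu_addv(): one fast-path attempt, then the fallback. */
static inline int sim_percpu_addv(_Atomic intptr_t *v, intptr_t count, int cpu)
{
        if (fake_rseq_addv(v, count, cpu) != 0)
                return fake_cpu_op_addv(v, count, cpu);
        return 0;
}
```

Because the fallback lives inside the wrapper, the caller makes a single call that either succeeds or reports the slow path's error, with no retry loop of its own.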

> 
> And we don't need to do a full x86_64/arch-of-choice emulator for this
> either; just as cpu_opv is fairly limited too. We can do a subset that
> allows dealing with the known sequences and go from there -- it can
> always fall back to not emulating and reverting to the pure rseq with
> debug/fwd progress 'issues'.

I think trying to make the kernel ABI "developer-friendly" is the wrong
approach. This kind of ease-of-use sugar should be provided by a library,
not by the kernel ABI. The futex system call is a good example of a low-level
syscall meant to be used by libraries rather than directly by end-users.
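To illustrate that layering, here is a sketch of a library-level lock that hides the raw futex system call, so callers never touch `FUTEX_WAIT`/`FUTEX_WAKE` directly. It follows the well-known three-state design (unlocked, locked, locked with waiters); the names `simple_lock`, `lock_acquire()`, `lock_release()` and `sys_futex()` are hypothetical, and this is a Linux-only illustration rather than production code.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
struct simple_lock {
        _Atomic int state;
};

/* Thin wrapper over the raw system call; glibc exports no futex() function. */
static long sys_futex(_Atomic int *uaddr, int op, int val)
{
        return syscall(SYS_futex, (void *)uaddr, op, val, NULL, NULL, 0);
}

static void lock_acquire(struct simple_lock *l)
{
        int c = 0;

        /* Fast path: uncontended 0 -> 1 transition, no system call. */
        if (atomic_compare_exchange_strong(&l->state, &c, 1))
                return;
        /* Slow path: mark the lock contended, then sleep until it is free. */
        if (c != 2)
                c = atomic_exchange(&l->state, 2);
        while (c != 0) {
                sys_futex(&l->state, FUTEX_WAIT_PRIVATE, 2);
                c = atomic_exchange(&l->state, 2);
        }
}

static void lock_release(struct simple_lock *l)
{
        /* Enter the kernel only if someone may be waiting (old state was 2). */
        if (atomic_fetch_sub(&l->state, 1) != 1) {
                atomic_store(&l->state, 0);
                sys_futex(&l->state, FUTEX_WAKE_PRIVATE, 1);
        }
}
```

Users of `lock_acquire()`/`lock_release()` never see the syscall, its operation codes, or the state encoding; that is the division of labour argued for above.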

> So what exactly is the problem of leaving out the whole cpu_opv thing
> for now? Pure rseq is usable -- albeit a bit cumbersome without
> additional debugger support.

Then rseq will cover _some_ use-cases, but will miss many others.

One example is the reserve+commit lttng-ust ring buffer operations, where
the commit _needs_ to run on the same CPU as the reserve. rseq alone does
not allow a tracer library to do that; rseq+cpu_opv allows it just
fine.
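A heavily simplified model of that constraint, with all names hypothetical and not taken from lttng-ust: each per-CPU buffer counts reserved and committed bytes, and the commit must update the counters of the CPU that did the reserve. The `running_cpu` argument stands in for the CPU the thread is on at commit time. When it differs from the reserving CPU (the thread migrated), a restartable sequence, which only protects the current CPU's data, cannot be used, so `slow_commit()` plays the role a cpu_opv operation targeting that CPU would play.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_SIM 2

/* Hypothetical per-CPU buffer accounting. */
struct cpu_buf {
        _Atomic size_t reserved;        /* bytes handed out by reserve */
        _Atomic size_t committed;       /* bytes whose writes are complete */
};

static struct cpu_buf bufs[NR_CPUS_SIM];
static int slow_path_hits;

/* Reserve on the current CPU and return it: the commit must target it. */
static int buf_reserve(int running_cpu, size_t len)
{
        atomic_fetch_add_explicit(&bufs[running_cpu].reserved, len,
                                  memory_order_relaxed);
        return running_cpu;
}

/* Hypothetical slow path standing in for a cpu_opv-style remote operation. */
static void slow_commit(int target_cpu, size_t len)
{
        slow_path_hits++;
        atomic_fetch_add(&bufs[target_cpu].committed, len);
}

static void buf_commit(int target_cpu, int running_cpu, size_t len)
{
        if (running_cpu == target_cpu) {
                /* Models a fast-path critical section on the current CPU. */
                atomic_fetch_add_explicit(&bufs[target_cpu].committed, len,
                                          memory_order_release);
                return;
        }
        slow_commit(target_cpu, len);   /* migrated since the reserve */
}
```

For example, `buf_reserve(0, 64)` followed by `buf_commit(0, 1, 64)` (the thread now runs on CPU 1) still balances CPU 0's counters, but only through the slow path.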

So if I introduce just rseq for now, then all those other use-cases will
also need to check whether the kernel supports cpu_opv, cache the result
in a variable, and add a forest of branches to their fast paths. No
thanks.
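A sketch of the kind of branch being objected to, assuming cpu_opv were optional: every fast path first consults a cached capability probe before picking a strategy. `probe_cpu_opv()`, `rseq_cpu_opv_addv()` and `plain_atomic_addv()` are hypothetical; the probe is hard-wired to "unavailable" and both strategies are reduced to an atomic add so the example runs anywhere.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Cached probe result: -1 = not probed yet, 0 = unavailable, 1 = available. */
static _Atomic int cpu_opv_state = -1;

/* Hypothetical probe; a real one might issue the syscall once and check for ENOSYS. */
static int probe_cpu_opv(void)
{
        return 0;       /* pretend this kernel lacks cpu_opv */
}

static int have_cpu_opv(void)
{
        int s = atomic_load_explicit(&cpu_opv_state, memory_order_relaxed);

        if (s < 0) {    /* first use: probe and cache (a racing re-probe is harmless) */
                s = probe_cpu_opv();
                atomic_store_explicit(&cpu_opv_state, s, memory_order_relaxed);
        }
        return s;
}

/* Hypothetical strategies, both reduced to an atomic add for this sketch. */
static int rseq_cpu_opv_addv(_Atomic intptr_t *v, intptr_t c)
{
        atomic_fetch_add(v, c);
        return 0;
}

static int plain_atomic_addv(_Atomic intptr_t *v, intptr_t c)
{
        atomic_fetch_add(v, c);
        return 0;
}

/* The capability check now sits in front of every fast-path call. */
static int counter_add(_Atomic intptr_t *v, intptr_t c)
{
        if (have_cpu_opv())
                return rseq_cpu_opv_addv(v, c);
        return plain_atomic_addv(v, c);
}
```

Each check is cheap on its own; the concern is that it would be repeated in every per-CPU fast path of every library that needs the fallback.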

Also, turning both line-level and instruction-level single-stepping into
infinite loops looks pretty much like a new kernel facility that breaks
user-space. It's a no-go from my point of view.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com