linux-kernel - Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180904233714.GJ4762@redhat.com>
Date:   Tue, 4 Sep 2018 19:37:14 -0400
From:   Andrea Arcangeli <aarcange@...hat.com>
To:     "Schaufler, Casey" <casey.schaufler@...el.com>
Cc:     Jiri Kosina <jikos@...nel.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        "Woodhouse, David" <dwmw@...zon.co.uk>,
        Oleg Nesterov <oleg@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can
 be applied on arbitrary tasks

Hello,

On Tue, Sep 04, 2018 at 06:10:47PM +0000, Schaufler, Casey wrote:
> The real reason to use an LSM based approach is that overloading ptrace
> checks is a Really Bad Idea. Ptrace is a user interface. Side-channel is a
> processor interface. Even if ptrace_may_access() does exactly what you

"Side channel is a processor interface" doesn't make me optimistic,
but I assume you're not speaking for Intel.

> want it to for side-channel mitigation today it would be incredibly
> inappropriate to tie the two together for eternity. You don't want to restrict
> the ptrace checks to only those things that are also good for side-channel,
> and vice versa. 

It seems like you want to make this more configurable, we have all
debugfs x86 specific tweaks to disable IBPB at runtime and we don't
allow a runtime opt-out of IBPB alone.

If you shutdown either IBRS or retpolines at runtime with debugfs,
then IBPB goes away too.

Giving a finegrined way to disable only IBPB we found was overkill
because IBPB has never been measurable if done only when the prev task
cannot ptrace the next task (which is a superset of the too weak
upstream not dumpable check, but still not a performance issue).

Allowing IBPB runtime opt-out doesn't make much sense if you don't
allow to disable retpolines too still at runtime. And disabling
retpolines from LSM doesn't sounds the right place, it's an x86
temporary quirk only relevant for current buggy CPUs.

There should be a function that decides when IBPB and flush_RSB should
be run (flush_RSB has to be run if SMEP because with SMEP there's no
need to run flush_RSB at every kernel entry anymore), and that
function happens to check ptrace today, but it would check other stuff
too if we had other interfaces besides ptrace that would allow the
prev task to read the memory of the next task. So it's not so much
about ptrace nor about IBPB, it's about "IBPB&flush_RSB" vs "prev task
can read memory of next task". Then each arch can implement
"IBPB&flush_RSB" method its own way but the check is for the common
code and it should be in the scheduler and there's just 1 version of
this check needed.

I don't think there should be a ton of different versions of this
function each providing a different answer, which is what LSM would
provide.

At most you can add a x86 debugfs tweak to shut off the logic but
again it would make more sense if retpolines could be shut off at
runtime too, doing it just for IBPB sounds backwards because it has
such an unmeasurable overhead.

> Yes. That would be me. 

Even the attempt to make an innocuous call like
ptrace_has_cap(tcred->user_ns, mode) will eventually
deadlock there.

I used a PTRACE_MODE_ check that the ptrace check code uses to filter
out specific calls that may eventually enter LSM and hard lockup in
non reproducible workloads (some false positive IBPB is ok, especially
if it avoids a deadlock).

Everything can be fixed in any way, but calling LSM from scheduler
code doesn't sound the most robust thing to do in general because what
works outside the scheduler doesn't work from within those semi atomic
regions (it tends to work initially until it eventually
deadlocks). The original code of such check, had all sort of deadlocks
because of that.

Thanks,
Andrea