linux-kernel - Re: [PATCH v2 0/4] Static calls

Date:   Wed, 12 Dec 2018 18:14:00 +0000
From:   Nadav Amit <namit@...are.com>
To:     Edward Cree <ecree@...arflare.com>
CC:     Josh Poimboeuf <jpoimboe@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH v2 0/4] Static calls

> On Dec 12, 2018, at 9:11 AM, Edward Cree <ecree@...arflare.com> wrote:
> 
> On 12/12/18 05:59, Nadav Amit wrote:
>> Thanks for cc’ing me. (I didn’t know about the other patch-sets.)
> Well in my case, that's because I haven't posted any yet.  (Will follow up
>  shortly with what I currently have, though it's not pretty.)
> 
> Looking at your patches, it seems you've got a much more developed learning
>  mechanism.  Mine on the other hand is brutally simple but runs continuously
>  (i.e. after we patch we immediately enter the next 'relearning' phase);
>  since it never does anything but prod a handful of percpu variables, this
>  shouldn't be too costly.
> 
> Also, you've got the macrology for making all indirect calls use this,
>  whereas at present I just have an open-coded instance on a single call site
>  (I went with deliver_skb in the networking stack).
> 
> So I think where we probably want to go from here is:
>  1) get Josh's static_calls in.  AIUI Linus seems to prefer the out-of-line
>     approach; I'd say ditch the inline version (at least for now).
>  2) build a relpolines patch series that uses
>    i) static_calls for the text-patching part
>   ii) as much of Nadav's macrology as is applicable
>  iii) either my or Nadav's learning mechanism; we can experiment with both,
>       bikeshed it incessantly etc.
> 
> Seem reasonable?

Mostly yes. I have a few reservations (and let’s call them optpolines from
now on, since Josh disliked the previous name).

First, I still have to address the issues that Josh raised before, and try
to use a GCC plugin instead of (most of) the macros. Specifically, I need to
bring back (from my PoC code) the part that sets multiple targets.

Second, (2i) is not very intuitive to me. Out-of-line static calls seem
to me less performant than inline ones (potentially; I didn’t check).

Anyhow, the use of out-of-line static calls seems counter-intuitive to me.
I think (I didn’t measure) that it may add more overhead than it saves due
to the additional call, ret, and so on, at least if retpolines are not used.
For multiple targets it may be useful for saving some memory if the
out-of-line block is dynamically allocated (as I did in my yet unpublished
code). But that’s not how it’s done in Josh’s code.

If we talk about an inline implementation, there is a different problem that
prevents me from using Josh’s static-calls as-is. I tried to avoid reading the
compared target from memory and therefore used an immediate. This should
prevent data cache misses, and even when the data is available it is faster
by one cycle. But it requires both the “cmp %target-reg, imm” and the
“call rel-target” to be patched “atomically”. So the static-calls
mechanism wouldn’t be sufficient.
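
As a rough illustration of that constraint (a C model only, not the actual
patching code; struct call_site, run_site, handler_a and handler_b are
hypothetical names), a promoted call site can be viewed as a pair of patched
operands: the immediate the target is compared against, and the destination
of the direct call. If only one of the two has been updated, a matching
compare sends the call to the wrong function:

```c
/* Model of a promoted call site. In real code both fields would be
 * operands encoded in the instruction stream, not data in memory. */
typedef int (*handler_fn)(int);

static int handler_a(int x) { return x + 1; }
static int handler_b(int x) { return x * 10; }

struct call_site {
    handler_fn cmp_imm;   /* models the immediate in "cmp %target-reg, imm" */
    handler_fn call_rel;  /* models the destination of "call rel-target" */
};

static int run_site(const struct call_site *site, handler_fn target, int arg)
{
    if (target == site->cmp_imm)
        return site->call_rel(arg);   /* fast path: direct call */
    return target(arg);               /* slow path: ordinary indirect call */
}

/* Both operands agree: the fast path calls what the caller asked for. */
static const struct call_site consistent_site = { handler_a, handler_a };

/* Half-patched: the compare still matches handler_a, but the call
 * destination already points at handler_b. */
static const struct call_site torn_site = { handler_a, handler_b };
```

With consistent_site, a request for handler_a takes the fast path and a
request for handler_b falls back to the indirect call, and both reach the
right function. With torn_site, a request for handler_a silently runs
handler_b, which is the window that patching the two operands together has
to close.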

Based on Josh’s previous feedback, I thought of improving the learning using
some hysteresis. Anyhow, note that there are quite a few cases in which you
wouldn’t want optpolines. The question is whether in general it would be an
opt-in or opt-out mechanism.
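
To make “hysteresis” a bit more concrete, here is one possible shape for it,
sketched in C. This is an assumption for illustration only, not what my
patches or Edward’s code do: the learner keeps the currently promoted target
plus a small saturating confidence counter, and only switches targets (which
would mean repatching the call site) once the counter has drained to zero.
struct learner, learner_observe and CONF_MAX are hypothetical names.

```c
#include <stdint.h>

#define CONF_MAX 3  /* hypothetical saturation limit */

struct learner {
    uintptr_t target;  /* currently promoted target, 0 if none yet */
    unsigned conf;     /* saturating confidence in that target */
};

/* Feed one observed call target to the learner.
 * Returns 1 if the promoted target changed (the call site would need
 * repatching), 0 otherwise. */
static int learner_observe(struct learner *l, uintptr_t seen)
{
    if (seen == l->target) {
        if (l->conf < CONF_MAX)
            l->conf++;          /* reinforce the current choice */
        return 0;
    }
    if (l->conf > 0) {
        l->conf--;              /* a miss only weakens the choice */
        return 0;
    }
    l->target = seen;           /* confidence exhausted: switch */
    l->conf = 1;
    return 1;
}
```

A single stray call to another target then only lowers the confidence, while
a sustained shift in the call pattern eventually triggers a repatch.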

Let me know what you think.

BTW: When it comes to deliver_skb, you have packet_type as an identifier.
You can use it directly or through an indirection table to figure out the
target. Here’s a chunk of assembly magic that I used in a similar case:

.macro _call_table val:req bit:req max:req val1:req bit1:req
call_table_\val\()_\bit\():
        test $(1 << \bit), %al
.if \val1 + (1 << \bit1) >= \max
        jnz syscall_relpoline_\val1
        jmp syscall_relpoline_\val
.else
        jnz call_table_\val1\()_\bit1

        # fall-through to no carry, val unchanged, going to next bit
        call_table \val,\bit1,\max
        call_table \val1,\bit1,\max
.endif
.endm

.macro call_table val:req bit:req max:req
.altmacro
        _call_table \val,\bit,\max,%(\val + (1 << \bit)),%(\bit + 1)
.noaltmacro
.endm

ENTRY(direct_syscall)
        mov %esi, %eax
        call_table val=0 bit=0 max=16
ENDPROC(direct_syscall)
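
For readers who prefer C: the macro expands to a binary decision tree over
the low bits of %al, tested from bit 0 upward, with a direct jump at every
leaf. A hand-expanded rendering for max=4 might look like the sketch below.
handler_0 to handler_3 stand in for the syscall_relpoline_N labels; their
bodies and the name direct_dispatch are made up for illustration.

```c
/* Hypothetical leaf targets, standing in for syscall_relpoline_0..3. */
static int handler_0(void) { return 100; }
static int handler_1(void) { return 101; }
static int handler_2(void) { return 102; }
static int handler_3(void) { return 103; }

/* Expansion of "call_table val=0 bit=0 max=4", written as nested tests.
 * Every leaf is a direct call; no function pointer is involved. */
static int direct_dispatch(unsigned int nr)
{
    if (nr & (1u << 0)) {
        /* call_table_1_1: bit 0 was set, so val is 1 */
        if (nr & (1u << 1))
            return handler_3();
        return handler_1();
    }
    /* call_table_0_1: bit 0 was clear, so val stays 0 */
    if (nr & (1u << 1))
        return handler_2();
    return handler_0();
}
```

As in the assembly, only the low bits are examined (two of them here, four
for max=16), so the caller is expected to pass an in-range value.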