lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 11 Feb 2022 05:43:07 -0800
From:   Dan Li <ashimida@...ux.alibaba.com>
To:     gcc-patches@....gnu.org, richard.earnshaw@....com,
        marcus.shawcroft@....com, kyrylo.tkachov@....com, hp@....gnu.org,
        ndesaulniers@...gle.com, nsz@....gnu.org, pageexec@...il.com,
        qinzhao@....gnu.org, linux-hardening@...r.kernel.org,
        richard.sandiford@....com
Subject: Re: [PATCH] [PATCH,v4,1/1,AARCH64][PR102768] aarch64: Add compiler
 support for Shadow Call Stack



On 2/11/22 01:53, Richard Sandiford wrote:
> Dan Li <ashimida@...ux.alibaba.com> writes:
>> On 2/10/22 01:55, Richard Sandiford wrote:
>>>>
>>> But treating scs push and scs pop as part of the register save and
>>> restore sequences would have one advantage: it would allow the
>>> scs push and scs pop to be shrink-wrapped.
>>>
>>
>> Sorry for my limited knowledge of shrink warping, I don't think I get
>> it here (I tried to find a case when compiling the kernel and some
>> gcc test cases but I still don't have a clue.).
>>
>> I see that the bitmap of LR_REGNUM is cleared in
>> aarch64_get_separate_components and scs push/pop are x18 based operations.
>>
>> If we handle them in aarch64_restore/save_callee_saves,
>> could scs push/pop be shrink-wrapped in some cases?
> 
> Yeah, I think so.  E.g. for:
> 
> void f();
> int g(int x) {
>      if (x == 0)
>          return 1;
>      f();
>      return 2;
> }
> 
> shrink wrapping would allow the scs push and pop to move along with the
> x30 save:
> 
> g:
>          cbnz    w0, .L9
>          mov     w0, 1
>          ret
> .L9:
>          stp     x29, x30, [sp, -16]!
>          mov     x29, sp
>          bl      f
>          mov     w0, 2
>          ldp     x29, x30, [sp], 16
>          ret
> 

Thanks Richard, (to make sure I understand correctly :)) I think
it means that the current patch could do a "shrink-wapping", but
the X30 could not be treat as a "component", now it could gen code
like:

g:
         cbnz    w0, .L9
         mov     w0, 1
         ret
.L9:
         str     x30, [x18], 8
         stp     x29, x30, [sp, -16]!
         mov     x29, sp
         bl      f
         ldr     x30, [x18, -8]!
         mov     w0, 2
         ldr     x29, [sp], 16
         ret

> The idea is that aarch64_save_callee_saves would treat the scs push
> as part of saving x30 (along with the normal store to the frame chain,
> when used).  aarch64_restore_callee_saves would similarly treat the scs
> pop as the way of restoring x30 (instead of loading from the frame chain).
> This is in contrast to the current patch, where the scs push and pop are
> treated as fixed parts of the prologue and epilogue instead, and where
> aarch64_restore_callee_saves tries to avoid doing anything for x30.
> 
> If shrink-wrapping decides to treat x30 as a separate “component”, as it
> does in the example above, then the scs push and pop would be emitted
> by aarch64_process_components instead.
> 
> It would be more complex, but it would give better code.
> 

Following your idea, I made a poc to add x30 in component bitmap:

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 35f6f64f5b2..fc9b5e7af54 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8359,7 +8359,7 @@ aarch64_get_separate_components (void)
    if (reg1 != INVALID_REGNUM)
      bitmap_clear_bit (components, reg1);
  
-  bitmap_clear_bit (components, LR_REGNUM);
    bitmap_clear_bit (components, SP_REGNUM);
  
    return components;
@@ -8396,7 +8396,7 @@ aarch64_components_for_bb (basic_block bb)
    /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
    for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
      if (!fixed_regs[regno]
-       && !crtl->abi->clobbers_full_reg_p (regno)
+       && (!crtl->abi->clobbers_full_reg_p (regno) || regno == R30_REGNUM)
         && (TEST_HARD_REG_BIT (extra_caller_saves, regno)
             || bitmap_bit_p (in, regno)
             || bitmap_bit_p (gen, regno)

And with a test code compiled with -fno-omit-frame-pointer:

void f();
int g(int x) {
     if (x == 0) {
         __asm__ ("":::"x19", "x20");
         return 1;
     }
     f();
     return 2;
}

Then it seems X30 is treat as a "component" (the test
result of aarch64.exp also seems fine).

g:
         stp     x19, x20, [sp, -32]!
         cbnz    w0, .L2
         mov     w0, 1
         ldp     x19, x20, [sp], 32
         ret
.L2:
         str     x30, [sp, 16]
         bl      f
         ldr     x30, [sp, 16]
         mov     w0, 2
         ldp     x19, x20, [sp], 32
         ret

And I think maybe we could handle this through three patches:
1.Keep current patch (a V5) unchanged for scs.
2.Add shrink-warpping for X30:
logically this might be a separate topic, and I think more testing
might be needed here (Well, I'm a little worried about if there might
be other effects, since I just read this part of the code roughly
yesterday).
3.Add scs push/pop to shrink-wrapping (and maybe we can do the same for
the PAC code in pro/epilogue, since it's also the operation of the X30).

Thanks,
Dan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ