lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 11 Feb 2022 05:43:07 -0800 From: Dan Li <ashimida@...ux.alibaba.com> To: gcc-patches@....gnu.org, richard.earnshaw@....com, marcus.shawcroft@....com, kyrylo.tkachov@....com, hp@....gnu.org, ndesaulniers@...gle.com, nsz@....gnu.org, pageexec@...il.com, qinzhao@....gnu.org, linux-hardening@...r.kernel.org, richard.sandiford@....com Subject: Re: [PATCH] [PATCH,v4,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack On 2/11/22 01:53, Richard Sandiford wrote: > Dan Li <ashimida@...ux.alibaba.com> writes: >> On 2/10/22 01:55, Richard Sandiford wrote: >>>> >>> But treating scs push and scs pop as part of the register save and >>> restore sequences would have one advantage: it would allow the >>> scs push and scs pop to be shrink-wrapped. >>> >> >> Sorry for my limited knowledge of shrink warping, I don't think I get >> it here (I tried to find a case when compiling the kernel and some >> gcc test cases but I still don't have a clue.). >> >> I see that the bitmap of LR_REGNUM is cleared in >> aarch64_get_separate_components and scs push/pop are x18 based operations. >> >> If we handle them in aarch64_restore/save_callee_saves, >> could scs push/pop be shrink-wrapped in some cases? > > Yeah, I think so. E.g. for: > > void f(); > int g(int x) { > if (x == 0) > return 1; > f(); > return 2; > } > > shrink wrapping would allow the scs push and pop to move along with the > x30 save: > > g: > cbnz w0, .L9 > mov w0, 1 > ret > .L9: > stp x29, x30, [sp, -16]! > mov x29, sp > bl f > mov w0, 2 > ldp x29, x30, [sp], 16 > ret > Thanks Richard, (to make sure I understand correctly :)) I think it means that the current patch could do a "shrink-wapping", but the X30 could not be treat as a "component", now it could gen code like: g: cbnz w0, .L9 mov w0, 1 ret .L9: str x30, [x18], 8 stp x29, x30, [sp, -16]! mov x29, sp bl f ldr x30, [x18, -8]! mov w0, 2 ldr x29, [sp], 16 ret > The idea is that aarch64_save_callee_saves would treat the scs push > as part of saving x30 (along with the normal store to the frame chain, > when used). aarch64_restore_callee_saves would similarly treat the scs > pop as the way of restoring x30 (instead of loading from the frame chain). > This is in contrast to the current patch, where the scs push and pop are > treated as fixed parts of the prologue and epilogue instead, and where > aarch64_restore_callee_saves tries to avoid doing anything for x30. > > If shrink-wrapping decides to treat x30 as a separate “component”, as it > does in the example above, then the scs push and pop would be emitted > by aarch64_process_components instead. > > It would be more complex, but it would give better code. > Following your idea, I made a poc to add x30 in component bitmap: diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 35f6f64f5b2..fc9b5e7af54 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -8359,7 +8359,7 @@ aarch64_get_separate_components (void) if (reg1 != INVALID_REGNUM) bitmap_clear_bit (components, reg1); - bitmap_clear_bit (components, LR_REGNUM); bitmap_clear_bit (components, SP_REGNUM); return components; @@ -8396,7 +8396,7 @@ aarch64_components_for_bb (basic_block bb) /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets. */ for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++) if (!fixed_regs[regno] - && !crtl->abi->clobbers_full_reg_p (regno) + && (!crtl->abi->clobbers_full_reg_p (regno) || regno == R30_REGNUM) && (TEST_HARD_REG_BIT (extra_caller_saves, regno) || bitmap_bit_p (in, regno) || bitmap_bit_p (gen, regno) And with a test code compiled with -fno-omit-frame-pointer: void f(); int g(int x) { if (x == 0) { __asm__ ("":::"x19", "x20"); return 1; } f(); return 2; } Then it seems X30 is treat as a "component" (the test result of aarch64.exp also seems fine). g: stp x19, x20, [sp, -32]! cbnz w0, .L2 mov w0, 1 ldp x19, x20, [sp], 32 ret .L2: str x30, [sp, 16] bl f ldr x30, [sp, 16] mov w0, 2 ldp x19, x20, [sp], 32 ret And I think maybe we could handle this through three patches: 1.Keep current patch (a V5) unchanged for scs. 2.Add shrink-warpping for X30: logically this might be a separate topic, and I think more testing might be needed here (Well, I'm a little worried about if there might be other effects, since I just read this part of the code roughly yesterday). 3.Add scs push/pop to shrink-wrapping (and maybe we can do the same for the PAC code in pro/epilogue, since it's also the operation of the X30). Thanks, Dan
Powered by blists - more mailing lists