[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <mptv8xl20hr.fsf@arm.com>
Date: Fri, 11 Feb 2022 09:53:04 +0000
From: Richard Sandiford <richard.sandiford@....com>
To: Dan Li <ashimida@...ux.alibaba.com>
Cc: gcc-patches@....gnu.org, richard.earnshaw@....com,
marcus.shawcroft@....com, kyrylo.tkachov@....com, hp@....gnu.org,
ndesaulniers@...gle.com, nsz@....gnu.org, pageexec@...il.com,
qinzhao@....gnu.org, linux-hardening@...r.kernel.org
Subject: Re: [PATCH] [PATCH,v4,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack
Dan Li <ashimida@...ux.alibaba.com> writes:
> On 2/10/22 01:55, Richard Sandiford wrote:
>>>
>>> There might be a little difference:
>>>
>>> - Using push candidates means that a register to be ignored in pop
>>> candidates will not be emitted again during the "restore" (pop_candidates
>>> should always be a subset of push_candidates, since popping a register
>>> without a push might not make sense).
>>
>> The push candidates are simply a subset of the saved registers though.
>> Similarly, the pop candidates are simply a subset of the restored registers.
>> So I think the requirement operates at that level: the restored registers
>> must be a subset of the saved registers.
>>
>> In other circumstances it could have been the other way around:
>> there might have been a change that stopped us from saving two
>> registers during the allocation, but we wanted to carry on restoring
>> two registers during the deallocation. I don't think there's a
>> reason that the push candidates *have* to be a superset of the
>> pop candidates (even though they are with the current change).
>>
>
> Oh yeah, that sounds more reasonable.
>
>>> When we use "pop_candidate[12]", one more insn is emitted:
>>>
>>> 0000000000400604 <main>:
>>> 400604: a9bf53f3 stp x19, x20, [sp, #-16]!
>>> 400608: 52800000 mov w0, #0x0
>>> + 40060c: f94007f4 ldr x20, [sp, #8]
>>> 400610: f84107f3 ldr x19, [sp], #16
>>> 400614: d65f03c0 ret
>>>
>>> But in the case of ignoring a specific register (like scs ignores x30),
>>> there is no difference between the two (because we always need
>>> to explicitly specify which registers to ignore in the parameter of
>>> aarch64_restore_callee_saves).
>>
>> I think this is the correct behaviour. If we don't want to restore
>> a register at all then it should be excluded from the restore list
>> somehow. In your case you're doing that be using a limit of
>> X29_REGNUM instead of X30_REGNUM.
>>
>
> Got it, I'll use pop candidates in the next version.
>
>> FWIW, I did wonder whether aarch64_restore_callee_saves should be
>> doing the scs pop, rather than aarch64_expand_epilogue, and in an
>> earlier draft of the previous review I'd asked for that. It does
>> seem conceptually cleaner, but in practice, it would probably have
>> been awkward to implement. E.g. we'd need to explicitly stop an
>> LDP being formed with X30 as the second register.
>>
>
> Well, then I think I should keep it the same here :).
>
>> But treating scs push and scs pop as part of the register save and
>> restore sequences would have one advantage: it would allow the
>> scs push and scs pop to be shrink-wrapped.
>>
>
> Sorry for my limited knowledge of shrink warping, I don't think I get
> it here (I tried to find a case when compiling the kernel and some
> gcc test cases but I still don't have a clue.).
>
> I see that the bitmap of LR_REGNUM is cleared in
> aarch64_get_separate_components and scs push/pop are x18 based operations.
>
> If we handle them in aarch64_restore/save_callee_saves,
> could scs push/pop be shrink-wrapped in some cases?
Yeah, I think so. E.g. for:
void f();
int g(int x) {
if (x == 0)
return 1;
f();
return 2;
}
shrink wrapping would allow the scs push and pop to move along with the
x30 save:
g:
cbnz w0, .L9
mov w0, 1
ret
.L9:
stp x29, x30, [sp, -16]!
mov x29, sp
bl f
mov w0, 2
ldp x29, x30, [sp], 16
ret
The idea is that aarch64_save_callee_saves would treat the scs push
as part of saving x30 (along with the normal store to the frame chain,
when used). aarch64_restore_callee_saves would similarly treat the scs
pop as the way of restoring x30 (instead of loading from the frame chain).
This is in contrast to the current patch, where the scs push and pop are
treated as fixed parts of the prologue and epilogue instead, and where
aarch64_restore_callee_saves tries to avoid doing anything for x30.
If shrink-wrapping decides to treat x30 as a separate “component”, as it
does in the example above, then the scs push and pop would be emitted
by aarch64_process_components instead.
It would be more complex, but it would give better code.
Thanks,
Richard
Powered by blists - more mailing lists