Message-ID: <05f17f8b-4cbb-4d17-81f3-ada2ac12ce6b@linux.ibm.com>
Date: Sat, 17 Jan 2026 16:29:26 +0530
From: Hari Bathini <hbathini@...ux.ibm.com>
To: adubey <adubey@...p.linux.ibm.com>
Cc: adubey@...ux.ibm.com, bpf@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org,
sachinpb@...ux.ibm.com, venkat88@...ux.ibm.com, andrii@...nel.org,
eddyz87@...il.com, mykolal@...com, ast@...nel.org,
daniel@...earbox.net, martin.lau@...ux.dev, song@...nel.org,
yonghong.song@...ux.dev, john.fastabend@...il.com, kpsingh@...nel.org,
sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org,
christophe.leroy@...roup.eu, naveen@...nel.org, maddy@...ux.ibm.com,
mpe@...erman.id.au, npiggin@...il.com, memxor@...il.com,
iii@...ux.ibm.com, shuah@...nel.org
Subject: Re: [PATCH v2 1/6] powerpc64/bpf: Move tail_call_cnt to bottom of stack frame
On 17/01/26 4:11 pm, adubey wrote:
> On 2026-01-17 15:41, Hari Bathini wrote:
>> On 14/01/26 5:14 pm, adubey@...ux.ibm.com wrote:
>>> From: Abhishek Dubey <adubey@...ux.ibm.com>
>>>
>>> In the conventional stack frame, the position of tail_call_cnt
>>> is after the NVR save area (BPF_PPC_STACK_SAVE). Whereas, the
>>> offset of tail_call_cnt in the trampoline frame is after the
>>> stack alignment padding. BPF JIT logic could become complex
>>> when dealing with frame-sensitive offset calculation of
>>> tail_call_cnt. Having the same offset in both frames is the
>>> desired objective.
>>>
>>> The trampoline frame does not have a BPF_PPC_STACK_SAVE area.
>>> Introducing it leads to under-utilization of extra memory meant
>>> only for the offset alignment of tail_call_cnt.
>>> Another challenge is the variable alignment padding sitting at
>>> the bottom of the trampoline frame, which requires additional
>>> handling to compute tail_call_cnt offset.
>>>
>>> This patch addresses the above issues by moving tail_call_cnt
>>> to the bottom of the stack frame at offset 0 for both types
>>> of frames. This saves additional bytes required by BPF_PPC_STACK_SAVE
>>> in trampoline frame, and a common offset computation for
>>> tail_call_cnt serves both frames.
>>>
>>> The changes in this patch are required by the third patch in the
>>> series, where the 'reference to tail_call_info' of the main frame
>>> is copied into the trampoline frame from the previous frame.
>>>
>>> Signed-off-by: Abhishek Dubey <adubey@...ux.ibm.com>
>>> ---
>>> arch/powerpc/net/bpf_jit.h | 4 ++++
>>> arch/powerpc/net/bpf_jit_comp64.c | 31 ++++++++++++++++++++-----------
>>> 2 files changed, 24 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
>>> index 8334cd667bba..45d419c0ee73 100644
>>> --- a/arch/powerpc/net/bpf_jit.h
>>> +++ b/arch/powerpc/net/bpf_jit.h
>>> @@ -72,6 +72,10 @@
>>> } } while (0)
>>> #ifdef CONFIG_PPC64
>>> +
>>> +/* for tailcall counter */
>>> +#define BPF_PPC_TAILCALL 8
>>> +
>>> /* If dummy pass (!image), account for maximum possible
>>> instructions */
>>> #define PPC_LI64(d, i) do { \
>>> if (!image) \
>>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/
>>> bpf_jit_comp64.c
>>> index 1fe37128c876..39061cd742c1 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>>> @@ -20,13 +20,15 @@
>>> #include "bpf_jit.h"
>>> /*
>>> - * Stack layout:
>>> + * Stack layout 1:
>>> + * Layout when setting up our own stack frame.
>>> + * Note: r1 at bottom, component offsets positive wrt r1.
>>> * Ensure the top half (upto local_tmp_var) stays consistent
>>> * with our redzone usage.
>>> *
>>> * [ prev sp ] <-------------
>>> - * [ nv gpr save area ] 6*8 |
>>> * [ tail_call_cnt ] 8 |
>>> + * [ nv gpr save area ] 6*8 |
>>> * [ local_tmp_var ] 24 |
>>> * fp (r31) --> [ ebpf stack space ] upto 512 |
>>> * [ frame header ] 32/112 |
>>> @@ -36,10 +38,12 @@
>>> /* for gpr non volatile registers BPG_REG_6 to 10 */
>>> #define BPF_PPC_STACK_SAVE (6*8)
>>> /* for bpf JIT code internal usage */
>>> -#define BPF_PPC_STACK_LOCALS 32
>>> +#define BPF_PPC_STACK_LOCALS 24
>>> /* stack frame excluding BPF stack, ensure this is quadword aligned */
>>> #define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + \
>>> - BPF_PPC_STACK_LOCALS + BPF_PPC_STACK_SAVE)
>>> + BPF_PPC_STACK_LOCALS + \
>>> + BPF_PPC_STACK_SAVE + \
>>> + BPF_PPC_TAILCALL)
>>> /* BPF register usage */
>>> #define TMP_REG_1 (MAX_BPF_JIT_REG + 0)
>>> @@ -87,27 +91,32 @@ static inline bool bpf_has_stack_frame(struct
>>> codegen_context *ctx)
>>> }
>>>
>>
>>> /*
>>> + * Stack layout 2:
>>> * When not setting up our own stackframe, the redzone (288 bytes)
>>> usage is:
>>> + * Note: r1 from prev frame. Component offset negative wrt r1.
>>> *
>>> * [ prev sp ] <-------------
>>> * [ ... ] |
>>> * sp (r1) ---> [ stack pointer ] --------------
>>> - * [ nv gpr save area ] 6*8
>>> * [ tail_call_cnt ] 8
>>> + * [ nv gpr save area ] 6*8
>>> * [ local_tmp_var ] 24
>>> * [ unused red zone ] 224
>>> */
>>
>> Calling them stack layout 1 & 2 is inappropriate. The stack layout
>> is essentially the same. The two comments just show things with
>> reference to r1 when the stack is set up explicitly vs. when the
>> redzone is being used.
> Agreed. I am using them as labels to refer to in the comments. Any better suggestions?
I think the comments could refer to the "has stack frame" case vs. the redzone case.
- Hari
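
For illustration only, here is a rough sketch of how the two cases discussed above differ only in their reference point for r1. The sizes are taken from the quoted patch. The helper tail_call_cnt_offset() and its frame_size parameter are hypothetical and not part of the patch; frame_size is assumed to be the total size of the frame set up by the JIT.

```c
#include <assert.h>
#include <stdbool.h>

/* Sizes as given in the quoted patch. */
#define BPF_PPC_TAILCALL	8	/* tail_call_cnt slot */
#define BPF_PPC_STACK_SAVE	(6 * 8)	/* nv gpr save area */
#define BPF_PPC_STACK_LOCALS	24	/* local_tmp_var */

/*
 * Hypothetical helper (an assumption, not code from the patch):
 * offset of tail_call_cnt relative to r1.
 *
 * Has stack frame: r1 points at the bottom of our own frame and
 * prev sp sits at r1 + frame_size, so the offset is positive.
 *
 * Redzone: r1 still belongs to the previous frame and the slot lies
 * just below it, so the offset is negative.
 *
 * In both cases tail_call_cnt is the first 8 bytes below the
 * previous frame's stack pointer.
 */
static int tail_call_cnt_offset(bool has_stack_frame, int frame_size)
{
	assert(frame_size >= 0);

	if (has_stack_frame)
		return frame_size - BPF_PPC_TAILCALL;

	return -BPF_PPC_TAILCALL;
}
```

With a frame of 128 bytes, for example, the slot would be at r1 + 120 in the "has stack frame" case and at r1 - 8 in the redzone case.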