Message-ID: <mb61pjzhmpqff.fsf@kernel.org>
Date: Mon, 15 Jul 2024 16:31:48 +0000
From: Puranjay Mohan <puranjay@...nel.org>
To: Daniel Borkmann <daniel@...earbox.net>, Manu Bretelle
<chantra@...a.com>, KP Singh <kpsingh@...nel.org>
Cc: Andrii Nakryiko <andrii@...nel.org>, Eduard Zingerman
<eddyz87@...il.com>, Mykola Lysenko <mykolal@...a.com>, Alexei Starovoitov
<ast@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu
<song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>, John Fastabend
<john.fastabend@...il.com>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo
<haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, Shuah Khan
<shuah@...nel.org>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Florent
Revest <revest@...gle.com>
Subject: Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
Hi Daniel, Manu
I was able to reproduce this issue on KVM and found the root cause of
this hang! The other issue that we fixed is unrelated to this hang and
doesn't occur on the self-hosted GitHub runners because they use 48-bit VAs.
The userspace test code has:
#define STACK_SIZE (1024 * 1024)
static char child_stack[STACK_SIZE];
cpid = clone(do_sleep, child_stack + STACK_SIZE, CLONE_FILES | SIGCHLD, fexit_skel);
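arm64 requires the stack pointer to be 16-byte aligned; otherwise an SP
alignment fault (SPAlignmentFault) occurs, which shows up in userspace as
a Bus error. The stack provided to the clone() system call in this
selftest is not guaranteed to be properly aligned: child_stack is a plain
char array, so the compiler only has to give it 1-byte alignment, and
child_stack + STACK_SIZE may not land on a 16-byte boundary.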
The test hangs on the following line:
while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2);
Because the child process is killed by the SPAlignmentFault, fentry_cnt
stays at 0 and the parent spins forever!
According to the clone(2) man page, the correct way to allocate a stack
for this call is with mmap(), like this:
stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
This fixes the issue. I will send a patch that uses this approach and
once again removes this test from the DENYLIST; I hope this time it
fixes it for good.
> It looks like there is still an issue left. A recent CI run on bpf-next is
> still hitting the same on arm64:
>
> Base:
>
> https://github.com/kernel-patches/bpf/commits/series/870746%3D%3Ebpf-next/
>
> CI:
>
> https://github.com/kernel-patches/bpf/actions/runs/9905842936/job/27366435436
>
> [...]
> #89/11 fexit_bpf2bpf/func_replace_global_func:OK
> #89/12 fexit_bpf2bpf/fentry_to_cgroup_bpf:OK
> #89/13 fexit_bpf2bpf/func_replace_progmap:OK
> #89 fexit_bpf2bpf:OK
> Error: The operation was canceled.
Thanks,
Puranjay