Message-ID: <mb61pjzhmpqff.fsf@kernel.org>
Date: Mon, 15 Jul 2024 16:31:48 +0000
From: Puranjay Mohan <puranjay@...nel.org>
To: Daniel Borkmann <daniel@...earbox.net>, Manu Bretelle
 <chantra@...a.com>, KP Singh <kpsingh@...nel.org>
Cc: Andrii Nakryiko <andrii@...nel.org>, Eduard Zingerman
 <eddyz87@...il.com>, Mykola Lysenko <mykolal@...a.com>, Alexei Starovoitov
 <ast@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu
 <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>, John Fastabend
 <john.fastabend@...il.com>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo
 <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, Shuah Khan
 <shuah@...nel.org>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
 "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Florent
 Revest <revest@...gle.com>
Subject: Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep


Hi Daniel, Manu,
I was able to reproduce this issue on KVM and found the root cause of
this hang! The other issue that we fixed is unrelated to this hang and
doesn't occur on self-hosted GitHub runners, as they use 48-bit VAs.

The userspace test code has:

    #define STACK_SIZE (1024 * 1024)
    static char child_stack[STACK_SIZE];

    cpid = clone(do_sleep, child_stack + STACK_SIZE, CLONE_FILES | SIGCHLD, fexit_skel);

arm64 requires the stack pointer to be 16-byte aligned; otherwise an
SPAlignmentFault occurs, which appears as a bus error (SIGBUS) in userspace.

child_stack is a plain char array, so the stack passed to the clone
system call in this selftest is not guaranteed to be 16-byte aligned.
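
To make the alignment requirement concrete, here is a minimal sketch. The
helpers sp_is_aligned(), sp_align_down() and child_stack_top() are
hypothetical names used only for illustration and are not part of the
selftest. The sketch shows how one could check a candidate stack top and
round it down to a 16-byte boundary:

```c
#include <stdint.h>

#define STACK_SIZE (1024 * 1024)
#define SP_ALIGN   16	/* arm64 requires a 16-byte aligned stack pointer */

/* A plain char array is only guaranteed 1-byte alignment by the language. */
static char child_stack[STACK_SIZE];

/* Hypothetical helper: return 1 if ptr is a multiple of SP_ALIGN. */
static int sp_is_aligned(const void *ptr)
{
	return ((uintptr_t)ptr % SP_ALIGN) == 0;
}

/* Hypothetical helper: round ptr down to the nearest SP_ALIGN boundary. */
static void *sp_align_down(void *ptr)
{
	return (void *)((uintptr_t)ptr & ~(uintptr_t)(SP_ALIGN - 1));
}

/*
 * Hypothetical helper: top of the static stack, rounded down so it is
 * always usable as an initial stack pointer (the stack grows down).
 */
static void *child_stack_top(void)
{
	return sp_align_down(child_stack + STACK_SIZE);
}
```

Since STACK_SIZE is a multiple of 16, child_stack + STACK_SIZE is aligned
only when the linker happens to place child_stack on a 16-byte boundary,
which would explain why the failure depends on the build and environment.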

The test hangs on the following line:
    while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2);

Because the child process is killed due to SPAlignmentFault, the
fentry_cnt remains at 0!

According to the man page for the clone system call, the correct way to
allocate the stack for this call is with mmap, like this:

    stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
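
For reference, a self-contained sketch of the mmap-based approach follows.
It is not the patch itself: child_fn() stands in for the selftest's
do_sleep(), and run_child_on_mmap_stack() is a hypothetical helper that
exists only for this illustration. Because mmap returns a page-aligned
address and STACK_SIZE is a multiple of the page size, the top of the
mapping is 16-byte aligned.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical child body; stands in for the selftest's do_sleep(). */
static int child_fn(void *arg)
{
	return *(int *)arg;	/* becomes the child's exit status */
}

/*
 * Hypothetical helper: run child_fn() on an mmap()ed stack and return the
 * child's exit status, or -1 on any failure.
 */
static int run_child_on_mmap_stack(int exit_code)
{
	int status, ret = -1;
	char *stack;
	pid_t cpid;

	stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED)
		return -1;

	/* The stack grows down, so pass the top of the mapping. */
	cpid = clone(child_fn, stack + STACK_SIZE, CLONE_FILES | SIGCHLD,
		     &exit_code);
	if (cpid != -1 && waitpid(cpid, &status, 0) == cpid &&
	    WIFEXITED(status))
		ret = WEXITSTATUS(status);

	munmap(stack, STACK_SIZE);
	return ret;
}
```

Calling run_child_on_mmap_stack(42) should return 42 when the child runs
to completion. If the child were killed by a signal instead, WIFEXITED()
would be false and the helper would return -1.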

This fixes the issue. I will send a patch that uses this and once again
removes this test from the DENYLIST; I hope it fixes the hang for good
this time.

> It looks like there is still an issue left. A recent CI run on bpf-next is
> still hitting the same on arm64:
>
> Base:
>
>    https://github.com/kernel-patches/bpf/commits/series/870746%3D%3Ebpf-next/
>
> CI:
>
>    https://github.com/kernel-patches/bpf/actions/runs/9905842936/job/27366435436
>
>    [...]
>    #89/11   fexit_bpf2bpf/func_replace_global_func:OK
>    #89/12   fexit_bpf2bpf/fentry_to_cgroup_bpf:OK
>    #89/13   fexit_bpf2bpf/func_replace_progmap:OK
>    #89      fexit_bpf2bpf:OK
>    Error: The operation was canceled.

Thanks,
Puranjay
