linux-kernel - Re: [PATCH] selftests: Do not skip BPF selftests by default

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fd9fb256-11ac-45f6-b584-c2e9c1b04c58@linuxfoundation.org>
Date: Fri, 4 Oct 2024 12:52:37 -0600
From: Shuah Khan <skhan@...uxfoundation.org>
To: Mark Brown <broonie@...nel.org>, Björn Töpel
 <bjorn@...nel.org>
Cc: Shuah Khan <shuah@...nel.org>, linux-kselftest@...r.kernel.org,
 bpf@...r.kernel.org, Andrii Nakryiko <andrii@...nel.org>,
 Eduard Zingerman <eddyz87@...il.com>, Mykola Lysenko <mykolal@...com>,
 Björn Töpel <bjorn@...osinc.com>,
 linux-kernel@...r.kernel.org, Catalin Marinas <catalin.marinas@....com>,
 Will Deacon <will@...nel.org>, Mark Rutland <mark.rutland@....com>,
 Nick Desaulniers <ndesaulniers@...gle.com>,
 Nathan Chancellor <nathan@...nel.org>,
 Anders Roxell <anders.roxell@...aro.org>,
 Shuah Khan <skhan@...uxfoundation.org>
Subject: Re: [PATCH] selftests: Do not skip BPF selftests by default

On 10/4/24 08:46, Mark Brown wrote:
> On Fri, Oct 04, 2024 at 03:34:49PM +0200, Björn Töpel wrote:
>> Mark Brown <broonie@...nel.org> writes:
>>> On Fri, Oct 04, 2024 at 11:53:47AM +0200, Björn Töpel wrote:
> 
>>>> This effectively is a revert of commit 7a6eb7c34a78 ("selftests: Skip
>>>> BPF seftests by default"). At the time when this was added, BPF had
>>>> "build time dependencies on cutting edge versions". Since then a
>>>> number of BPF capable tests has been included in net, hid, sched_ext.
> 
>>> The issue was always requiring a bleeding edge version of clang, not
>>> sure if that's been relaxed yet, IIRC sometimes it required git
>>> versions.  I have clang 20 installed here so that's not an issue for me
>>> but given that that's not released yet it wouldn't be reasonable to
>>> expect CI systems to install it.
> 
>> Yeah, but I'd say that is not the case anymore. LLVM 18 and 19 works.
> 
> Hrm, that's definitely a lot better then though still a little cutting
> edge - the 24.10 Ubuntu release has clang 17, never mind any of the
> stables or LTSs (Debian is very popular for build containers).  Not
> quite at the "you can just install your distro package" level yet though
> it's definitely substantial progress.  Is this requirement documented
> somewhere someone could reasonably be expected to discover it?
> 
> It's a bit unfortunate having to pull clang into GCC build containers,
> and needing a newer version than the minimum clang for the kernel itself
> too :/
> 
>>> We also get a bunch of:
> 
>>> die__process_unit: DW_TAG_label (0xa) @ <0x58eb7> not handled!
>>> die__process_unit: tag not supported 0xa (label)!
> 
>>> if we do turn enable CONFIG_DEBUG_INFO_BTF for arm64.
> 
>> This is pahole version related.
> 
> Which version is needed?  I've got 1.24 (from Debian) here...
> 
>>> The whole thing is also broken for cross compilation with clang since
>>> everything is assuming that CROSS_COMPILE will be set for cross builds
>>> but that's not the case for LLVM=1 builds - net gives:
> 
>> A lot can be said about kselftest, and cross-building. It's a bit of a
>> mess. Maybe we should move to meson or something for kselftest (that
>> requires less work for lazy developers like me). ;-)
> 
> AFAICT it pretty much works fine?  It's certainly widely used.
> 
>> I'm simply arguing that the *default* should be: BPF (and
>> hid/net/sched_ext) turned on. Default on would surface these kind of
>> problems, rather than hiding them. (And let the CI exclude tests it
>> cannot handle).
> 
> The original motivation behind that patch was that there were a bunch of
> CI systems all trying to run as many of the selftests as they can,
> running into BPF and getting frustrated at the amount of time it was
> consuming (or not managing to get it working at all).  Everyone was
> assuming they were missing something or somehow doing the wrong thing to
> satisfy the dependencies and it was burning a bunch of time and
> discouraging people from using the selftests at all since it doesn't
> create a good impression if stuff just doesn't build.  People did often
> end up skipping BPF, but only after banging their heads against it for a
> while, and then went and compared notes with other CI systems and found
> everyone else had the same problem.
> 
> I think we before defaulting BPF stuff on we should at the very least
> fix the builds for commonly covered architectures, it looks like as well
> as arm64 we're also seeing BTF not generated on 32 bit arm:
> 
>     https://storage.kernelci.org/next/master/next-20241004/arm/multi_v7_defconfig%2Bkselftest/gcc-12/config/kernel.config
> 
> but everything else I spot checked looks fine.  It'd be much better to
> skip gracefully if the kernel doesn't have BPF too.

+1 on this. I would rather have tests check for their dependencies
and skip them. Let's fix the tests to fail gracefully before adding
bpf and bpf dependent tests to default run.

I saw your patch to add sched_ext which shouldn't be added until
we get these resolved. Maybe you could include the checks to the
following patch to fail gracefully if kernel doesn't have BPF
and the environment doesn't have the dependencies installed.

Makefile could handle some of these and skip build instead of
failing the build. Individual test build failures shouldn't stop
kselftest-all build.

https://patchwork.kernel.org/project/linux-kselftest/patch/20241004094247.795385-1-bjorn@kernel.org/

> 
> We should probably also have explicit clang presence and feature/version
> checks in the builds since clang is now fairly widely available in
> distros but many of them have older versions than are needed so I
> imagine a common failure pattern might be that people see clang is
> needed, install their distro clang package and then run into errors from
> clang.  That'd mean people would get a graceful skip with a clear
> description of what's needed rather than build errors.
> 
> This is all a particular issue for the net tests where the addition of BPF
> is a regression, not only can't you run the BPF based tests without
> getting BPF working we've now lost the entire net testsuite if BPF isn't
> working since it breaks the build.  TBH I didn't notice this getting
> broken because I forgot that I was expecting to see net tests on
> kernelci.org and the build break means they just disappear entirely from
> the runtime results.  That really does need a graceful skip.

This is unfortunate. We have to fix the bpf dependent tests to do build
and run-time checks.

We can add the tests to default once they can work properly without their
dependencies.

thanks,
-- Shuah