Message-ID: <YfK18x/XrYL4Vw8o@syu-laptop>
Date: Thu, 27 Jan 2022 23:10:43 +0800
From: Shung-Hsi Yu <shung-hsi.yu@...e.com>
To: bpf@...r.kernel.org, netdev@...r.kernel.org,
Andrii Nakryiko <andrii@...nel.org>
Cc: Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>
Subject: BTF compatibility issue across builds
Hi,
We recently ran into a module load failure related to split BTF on openSUSE
Tumbleweed[1], which I believe may also happen on other rolling distros.
The error looks like the following (though the failure is not limited to ipheth):
BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:
failed to validate module [ipheth] BTF: -22
The error comes down to trying to load the BTF of *kernel modules from a
different build* than the running kernel (built from the same source), where
the base BTF of the two builds differs.
While it may be too much of a stretch to call this a bug, solving it might
make BTF adoption easier. I'd naively think that we could further split the
base BTF into two parts to avoid this issue, where .BTF only contains exported
types, and the other part (still residing in vmlinux) holds the unexported
types. Does that sound like something reasonable to work on?
## Root cause (in case anyone is interested in a verbose version)
On openSUSE Tumbleweed there can be several builds of the same source. Since
the source is the same, the binaries are simply replaced when a package with
a larger build number is installed during upgrade.
In our case, a rebuild was triggered[2] and resulted in changes to the base
BTF. More precisely, the BTF_KIND_FUNC{,_PROTO} entries of
i2c_smbus_check_pec(u8 cpec, struct i2c_msg *msg) and
inet_lhash2_bucket_sk(struct inet_hashinfo *h, struct sock *sk) were added to
the base BTF of 5.15.12-1.3. Those functions were missing from the base BTF
of 5.15.12-1.1.
The added entries in the BTF type and string tables shifted the type IDs and
string offsets in the base BTF. As a result, the same type ID may refer to a
totally different type, and the same goes for the name_off of types.
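To make the shift concrete, here is a toy Python sketch. It is not the real
BTF encoding: build_base(), type_at() and str_at() are hypothetical helpers,
and the type lists are made up. It only assumes that base type IDs are
assigned sequentially starting at 1 and that names are referenced by byte
offset into a NUL-separated string table.

```python
def build_base(type_names):
    """Build a toy base BTF: IDs start at 1 (0 is reserved for void) and
    each name is referenced by its byte offset into a string table."""
    strtab = b"\0"  # offset 0 is the empty string
    ids, name_offs = {}, {}
    for type_id, name in enumerate(type_names, start=1):
        ids[name] = type_id
        name_offs[name] = len(strtab)
        strtab += name.encode() + b"\0"
    return {"names": list(type_names), "strtab": strtab,
            "ids": ids, "name_offs": name_offs}


def type_at(base, type_id):
    """Resolve a type ID against a base."""
    return base["names"][type_id - 1]


def str_at(base, off):
    """Read the NUL-terminated string starting at byte offset `off`."""
    end = base["strtab"].index(b"\0", off)
    return base["strtab"][off:end].decode()


# Hypothetical bases: build3 gained one entry compared to build1.
build1 = build_base(["sock", "inet_hashinfo", "i2c_msg"])
build3 = build_base(["sock", "i2c_smbus_check_pec", "inet_hashinfo", "i2c_msg"])

# A module built against build3 records a base type ID and a name offset.
ref_id = build3["ids"]["inet_hashinfo"]
ref_off = build3["name_offs"]["inet_hashinfo"]

# Same numbers, different bases: build3 resolves them as intended, while
# build1 yields another type and a pointer into the middle of another name.
print("build3:", type_at(build3, ref_id), repr(str_at(build3, ref_off)))
print("build1:", type_at(build1, ref_id), repr(str_at(build1, ref_off)))
```

In the real case the mis-resolved name_off does not even land on a valid
name, which is what the kernel reports as "Invalid name".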
When users on build#1 (i.e. 5.15.12-1.1) install build#3 (i.e. 5.15.12-1.3)
and then try to load a kernel module, they will be loading a build#3 module
on a build#1 kernel. Since the base BTF of the two builds differs, the
name_off of some types ends up pointing at an invalid string, and the kernel
bails out.
Best,
Shung-Hsi Yu
1: https://bugzilla.opensuse.org/show_bug.cgi?id=1194501
2: my guess is that the rebuild was triggered by a compiler toolchain update,
but I wasn't able to pin down exactly what changed