lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fUm0-f6CW1DNKWK0Zv_4Hzqe5oV+d4ajhd3+XMdxXvu2Q@mail.gmail.com>
Date: Wed, 3 Sep 2025 21:01:06 -0700
From: Ian Rogers <irogers@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, 
	Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>, 
	KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, 
	Jiri Olsa <jolsa@...nel.org>, bpf <bpf@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1] bpf: Add kernel-doc for struct bpf_prog_info

On Wed, Sep 3, 2025 at 7:17 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Wed, Sep 3, 2025 at 10:25 AM Ian Rogers <irogers@...gle.com> wrote:
> >
> > Recently diagnosing a regression [1] would have been easier if struct
> > bpf_prog_info had some comments explaining its usage. As I found it
> > hard to generate comments for some parts of the struct,q what is here is a
>
> "struct,q" ??

Apologies, typo on pressing the 'q'.

>
>
> > mix of mostly hand written, but some AI written, comments.
> >
> > [1] https://lore.kernel.org/lkml/CAP-5=fWJQcmUOP7MuCA2ihKnDAHUCOBLkQFEkQES-1ZZTrgf8Q@mail.gmail.com/
>
> The perf bug looks unrelated.
> It's not worth it to put this kind of info in the commit log.

Feedback from people working on perf was that the bpf_prog_info (the
struct used after freed) was insufficient to understand how it worked.
For example, the implied nature of the u64 pointer values. If they
were non-zero (say just uninitialized) then the syscall would more
than likely return EFAULT which would seem to imply the bpf_prog_info
was incorrect and not things it points to. Anyway, I'm easy about
removing context from the patch.

> > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > ---
> >  include/uapi/linux/bpf.h | 187 ++++++++++++++++++++++++++++++++++++++-
>
> In general, yeah, it could use a doc,
> but tools/...bpf.h must be updated at the same time to keep them in sync.

It must, be we generally update the kernel version first and then sync
to tools. I didn't want to make the series overly spammy. The perf
build warns of things being out of sync here (bpf.h isn't currently
included):
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/check-headers.sh?h=perf-tools-next

> >  1 file changed, 186 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 233de8677382..008b559dc5c5 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -6607,45 +6607,230 @@ struct sk_reuseport_md {
> >
> >  #define BPF_TAG_SIZE   8
> >
> > +/**
> > + * struct bpf_prog_info - Information about a BPF program.
> > + *
> > + * This structure is used by the bpf(BPF_OBJ_GET_INFO_BY_FD) syscall to retrieve
> > + * metadata about a loaded BPF program. When values like the jited_prog_insns
> > + * are desired typically two syscalls will be made, the first to determine the
> > + * length of the buffers and the second with buffers for the syscall to fill
> > + * in. The variables within the struct are ordered to minimize padding.
> > + */
>
> "to minimize padding" ?! Do you see holes in the struct?

I think you've read this the opposite way it is intended. Sometimes in
the struct there is adjacent:

__u32  nr_<foo>;
__aligned_u64 <foo>;

This is true for say map_ids and line_info. Sometimes the order is:

__aligned_u64 <foo>;
__u32  nr_<foo>;

This is true for jited_line_info. Sometimes things are spread apart
and sometimes there is an additional <foo>_rec_size which again may be
spread apart.

What this is trying to say is when looking for things relating to
<foo> keep searching the rest of the struct as the order has been made
to (I assume) minimize padding. I may be so bold as to say that not
grouping related variables in a struct is not general practice in C.

> >  struct bpf_prog_info {
> > +       /**
> > +        * @type: The type of the BPF program (e.g.,
> > +        * BPF_PROG_TYPE_SOCKET_FILTER, BPF_PROG_TYPE_KPROBE). This defines
> > +        * where the program can be attached.
> > +        */
> >         __u32 type;
> > +       /**
> > +        * @id: A unique, kernel-assigned ID for the loaded BPF program.
> > +        */
>
> I wouldn't call it unique. It's 32-bit and can be reused
> if somebody loads/unloads 4B bpf progs.

Perhaps clarifying that the ID can be reused in the case of a program
being unloaded and loaded.

> >         __u32 id;
> > +       /**
> > +        * @tag: A user-defined tag for the program, often a hash of the
> > +        * object file it came from. Size is BPF_TAG_SIZE (8 bytes).
> > +        */
>
> That is just wrong. It's your job to check AI imaginations.

Tbh, I don't know what a "tag" is supposed to be nor did I find the
name intention revealing. This is very much the reason I'm subjecting
myself to this ordeal. I recall hashing of BPF instructions post
verification making them not useful for identification being mentioned
to me at one point, so hey AI your guess is as good as mine. Could you
help clarify what the definition should be?

> >         __u8  tag[BPF_TAG_SIZE];
> > +       /**
> > +        * @jited_prog_len: As an in argument this is the length of the
> > +        * jited_prog_insns buffer. As an out argument, the length of the
> > +        * JIT-compiled (native machine code) program image in bytes.
> > +        */
> >         __u32 jited_prog_len;
> > +       /**
> > +        * @xlated_prog_len: As an in argument this is the length of the
> > +        * xlated_prog_insns buffer. As an out argument, the length of the
> > +        * translated BPF bytecode in bytes, after the verifier has potentially
> > +        * modified it. 'xlated' is short for 'translated'.
> > +        */
> >         __u32 xlated_prog_len;
> > +       /**
> > +        * @jited_prog_insns: When 0 (NULL) this is ignored by the kernel. When
> > +        * non-zero a pointer to a buffer is expected and the kernel will write
> > +        * jited_prog_len(s) worth of JIT-compiled machine code instructions into
> > +        * the buffer.
> > +        */
> >         __aligned_u64 jited_prog_insns;
> > +       /**
> > +        * @xlated_prog_insns: When 0 (NULL) this is ignored by the kernel. When
> > +        * non-zero a pointer to a buffer is expected and the kernel will write
> > +        * xlated_prog_len(s) worth of translated, after BPF verification, BPF
> > +        * bytecode into the buffer.
> > +        */
> >         __aligned_u64 xlated_prog_insns;
> > -       __u64 load_time;        /* ns since boottime */
> > +       /**
> > +        * @load_time: The timestamp (in nanoseconds since boot time) when the
> > +        * program was loaded into the kernel.
> > +        */
> > +       __u64 load_time;
> > +       /**
> > +        * @created_by_uid: The user ID of the process that loaded this program.
> > +        */
> >         __u32 created_by_uid;
> > +       /**
> > +        * @nr_map_ids: As an in argument this is the length of the map_ids
> > +        * buffer in sizes of u32 (4 bytes). As an out argument, the number of
> > +        * BPF maps used by this BPF program.
> > +        */
> >         __u32 nr_map_ids;
> > +       /**
> > +        * @map_ids: When 0 (NULL) this is ignored by the kernel. When non-zero
> > +        * a pointer to a buffer is expected and the kernel will write
> > +        * nr_map_ids(s) worth of u32 kernel allocated BPF map id values into the
> > +        * buffer.
> > +        */
> >         __aligned_u64 map_ids;
> > +       /**
> > +        * @name: The name of the program, as specified in the ELF object file.
> > +        * The max length is BPF_OBJ_NAME_LEN (16 characters).
> > +        */
>
> This is generally not true. bpf prog may not come from ELF.

Thanks, I can clarify this.

> >         char name[BPF_OBJ_NAME_LEN];
> > +       /**
> > +        * @ifindex: If the program is attached to a network device (netdev),
> > +        * this field holds the interface index.
> > +        */
> >         __u32 ifindex;
> > +       /**
> > +        * @gpl_compatible: A flag indicating if the program is compatible with
> > +        * a GPL license. This is important for using certain GPL-only helpers.
> > +        */
> >         __u32 gpl_compatible:1;
> >         __u32 :31; /* alignment pad */
> > +       /**
> > +        * @netns_dev: The device identifier of the network namespace the
> > +        * program is attached to.
> > +        */
> >         __u64 netns_dev;
> > +       /**
> > +        * @netns_ino: The inode number of the network namespace the program is
> > +        * attached to.
> > +        */
> >         __u64 netns_ino;
> > +       /**
> > +        * @nr_jited_ksyms: As an in argument this is the length of the
> > +        * jited_ksyms buffer in sizes of u64 (8 bytes). As an out argument, the
> > +        * number of kernel symbols that the BPF program calls.
> > +        */
> >         __u32 nr_jited_ksyms;
> > +       /**
> > +        * @nr_jited_func_lens: As an in argument this is the length of the
> > +        * jited_func_lens buffer in sizes of u32 (4 bytes). As an out argument,
> > +        * the number of distinct functions within the JIT-ed program.
> > +        */
> >         __u32 nr_jited_func_lens;
> > +       /**
> > +        * @jited_ksyms: When 0 (NULL) this is ignored by the kernel. When
> > +        * non-zero a pointer to a buffer is expected and the kernel will write
> > +        * nr_jited_ksyms(s) worth of addresses of kernel symbols into the u64
> > +        * buffer.
> > +        */
> >         __aligned_u64 jited_ksyms;

Fwiw, I don't know what a kernel symbol is in this context. Perf is
assuming they are only other BPF programs:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-event.c#n63
Similarly, why does perf assume nr_jited_ksym == nr_prog_tags ==
nr_jited_func_lens :
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-event.c#n943
It'd be nice to have this invariant documented, and of course fix perf
if it isn't true. Of course the code has no comments and I'm sure I
must be an idiot for failing to understand this.

> > +       /**
> > +        * @jited_func_lens: When 0 (NULL) this is ignored by the kernel. When
> > +        * non-zero a pointer to a buffer is expected and the kernel will write
> > +        * nr_jited_func_lens(s) worth of lengths into the u32 buffer.
> > +        */
> >         __aligned_u64 jited_func_lens;
> > +       /**
> > +        * @btf_id: The ID of the BTF (BPF Type Format) object associated with
> > +        * this program, which contains type information for debugging and
> > +        * introspection.
> > +        */
> >         __u32 btf_id;
> > +       /**
> > +        * @func_info_rec_size: The size in bytes of a single `bpf_func_info`
> > +        * record.
> > +        */
> >         __u32 func_info_rec_size;
> > +       /**
> > +        * @func_info: When 0 (NULL) this is ignored by the kernel. When
> > +        * non-zero a pointer to a buffer is expected and the kernel will write
> > +        * nr_func_info(s) worth of func_info_rec_size values.
> > +        */
> >         __aligned_u64 func_info;
> > +       /**
> > +        * @nr_func_info: As an in argument this is the length of the func_info
> > +        * buffer in sizes of func_info_rec_size. As an out argument, the number
> > +        * of `bpf_func_info` records available.
> > +        */
> >         __u32 nr_func_info;
> > +       /**
> > +        * @nr_line_info: As an in argument this is the length of the line_info
> > +        * buffer in sizes of line_info_rec_size. As an out argument, the number
> > +        * of `bpf_line_info` records, which map BPF instructions to source code
> > +        * lines.
> > +        */
> >         __u32 nr_line_info;
> > +       /**
> > +        * @line_info: When 0 (NULL) this is ignored by the kernel. When
> > +        * non-zero a pointer to a buffer is expected and the kernel will write
> > +        * nr_line_info(s) worth of line_info_rec_size values.
> > +        */
> >         __aligned_u64 line_info;
> > +       /**
> > +        * @jited_line_info: When 0 (NULL) this is ignored by the kernel. When
> > +        * non-zero a pointer to a buffer is expected and the kernel will write
> > +        * nr_jited_line_info(s) worth of jited_line_info_rec_size values.
> > +        */
> >         __aligned_u64 jited_line_info;
> > +       /**
> > +        * @nr_line_info: As an in argument this is the length of the
> > +        * jited_line_info buffer in sizes of jited_line_info_rec_size. As an
> > +        * out argument, the number of `bpf_line_info` records, which map JIT-ed
> > +        * instructions to source code lines.
> > +        */
> >         __u32 nr_jited_line_info;
> > +       /**
> > +        * @line_info_rec_size: The size in bytes of a `bpf_line_info` record.
> > +        */
> >         __u32 line_info_rec_size;
> > +       /**
> > +        * @jited_line_info_rec_size: The size in bytes of a `bpf_line_info`
> > +        * record for JIT-ed code.
> > +        */
> >         __u32 jited_line_info_rec_size;
> > +       /**
> > +        * @nr_prog_tags: As an in argument this is the length of the prog_tags
> > +        * buffer in sizes of BPF_TAG_SIZE (8 bytes). As an out argument, the
> > +        * number of program tags, which are hashes of programs that this
> > +        * program can tail-call.
> > +        */
>
> what? number of progs that prog can tail-call ?!
>
> What AI LLM did you use?
>
> Please share, so we can tell everyone to avoid it at all cost.

Sure, it was llama ;-)

Anyway, progs? Could this be any less intention revealing. I'd love a
proper definition.

> >         __u32 nr_prog_tags;
> > +       /**
> > +        * @prog_tags: When 0 (NULL) this is ignored by the kernel. When
> > +        * non-zero a pointer to a buffer is expected and the kernel will write
> > +        * nr_prog_tags(s) worth of BPF_TAG_SIZE values.
> > +        */
>
> wrong again.

Sure, could you provide a correction? I was trying to base what was
written here off of:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/syscall.c#n5121
a 300 line function successfully avoiding any comments.

> >         __aligned_u64 prog_tags;
> > +       /**
> > +        * @run_time_ns: The total accumulated execution time of the program in
> > +        * nanoseconds.
> > +        */
>
> Missing critical detail that the kernel doesn't keep counting it all the time.

I'm not sure what you mean by this? Do you think the comment is saying
that the run_time is the run_time since loading? But it says
"accumulated execution time" which would imply only time spent
executing. When is it not counting?

> >         __u64 run_time_ns;
> > +       /**
> > +        * @run_cnt: The total number of times the program has been executed.
> > +        */
>
> ditto

Shouldn't the purpose of run_time_ns and run_cnt to be to calculate an
average run_time? If these are arbitrary values, what's the point?
Again, why isn't this explained? Thank you for helping me to try to
fix this. I'm also happy for others to fix this and this patch to be
completely ignored. It can be ignored in all scenarios, I was just
trying to be helpful to others and probably my future self at some
point.

> >         __u64 run_cnt;
> > +       /**
> > +        * @recursion_misses: The number of failed tail calls due to reaching
> > +        * the recursion limit.
> > +        */
> >         __u64 recursion_misses;
> > +       /**
> > +        * @verified_insns: The number of instructions processed by the
> > +        * verifier.
> > +        */
>
> The comment needs to be expanded.

Sure, suggestions?

> >         __u32 verified_insns;
> > +       /**
> > +        * @attach_btf_obj_id: If attached via BTF (e.g., fentry/fexit), this is
> > +        * the BTF object ID of the target object (e.g., kernel vmlinux).
> > +        */
>
> "e.g. kernel vmlinux"...
> sigh.
> Don't use this LLM.

Use the well documented code instead! :-)

Thanks!
Ian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ