lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20251031032627.1414462-6-jianyungao89@gmail.com>
Date: Fri, 31 Oct 2025 11:26:26 +0800
From: Jianyun Gao <jianyungao89@...il.com>
To: bpf@...r.kernel.org
Cc: Jianyun Gao <jianyungao89@...il.com>,
	Andrii Nakryiko <andrii@...nel.org>,
	Eduard Zingerman <eddyz87@...il.com>,
	Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	Martin KaFai Lau <martin.lau@...ux.dev>,
	Song Liu <song@...nel.org>,
	Yonghong Song <yonghong.song@...ux.dev>,
	John Fastabend <john.fastabend@...il.com>,
	KP Singh <kpsingh@...nel.org>,
	Stanislav Fomichev <sdf@...ichev.me>,
	Hao Luo <haoluo@...gle.com>,
	Jiri Olsa <jolsa@...nel.org>,
	linux-kernel@...r.kernel.org (open list)
Subject: [PATCH 5/5] libbpf: Add doxygen documentation for btf/iter etc. in bpf.h

Add doxygen comment blocks for emaining helpers (btf/iter etc.) in
tools/lib/bpf/bpf.h. These doc comments are for:

-libbpf_set_memlock_rlim()
-bpf_btf_load()
-bpf_iter_create()
-bpf_btf_get_next_id()
-bpf_btf_get_fd_by_id()
-bpf_btf_get_fd_by_id_opts()
-bpf_raw_tracepoint_open_opts()
-bpf_raw_tracepoint_open()
-bpf_task_fd_query()

Signed-off-by: Jianyun Gao <jianyungao89@...il.com>
---
 tools/lib/bpf/bpf.h | 745 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 740 insertions(+), 5 deletions(-)

diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 28bde19a45c1..0eed179f4a6c 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -34,7 +34,61 @@
 #ifdef __cplusplus
 extern "C" {
 #endif
-
+/**
+ * @brief Adjust process RLIMIT_MEMLOCK to facilitate loading BPF objects.
+ *
+ * libbpf_set_memlock_rlim() raises (or lowers) the calling process's
+ * RLIMIT_MEMLOCK soft and hard limits to at least the number of bytes
+ * specified by memlock_bytes. BPF map and program creation can require
+ * locking kernel/user pages; if RLIMIT_MEMLOCK is too low the kernel
+ * will fail operations with EPERM/ENOMEM. This helper provides a
+ * convenient way to pre‑allocate sufficient memlock quota.
+ *
+ * Semantics:
+ *   - If current (soft or hard) RLIMIT_MEMLOCK is already >= memlock_bytes,
+ *     the limit is left unchanged and the function succeeds.
+ *   - Otherwise, the function attempts to set both soft and hard limits
+ *     to memlock_bytes using setrlimit(RLIMIT_MEMLOCK, ...).
+ *   - On systems enforcing privilege constraints, increasing the hard
+ *     limit may require CAP_SYS_RESOURCE; lack of privilege yields failure.
+ *
+ * Typical usage (before loading large maps/programs):
+ *   size_t needed = 128ul * 1024 * 1024; // 128 MB
+ *   if (libbpf_set_memlock_rlim(needed) < 0) {
+ *       // handle error (e.g., fall back to smaller maps or abort)
+ *   }
+ *
+ * Choosing a value:
+ *   - Sum anticipated sizes of maps (key_size + value_size) * max_entries
+ *     plus overhead. Add headroom for verifier, BTF, and future growth.
+ *   - Large per-CPU maps multiply value storage by number of CPUs.
+ *   - Overestimating is usually harmless (within administrative policy).
+ *
+ * Concurrency & scope:
+ *   - Affects only the calling process's RLIMIT_MEMLOCK.
+ *   - Child processes inherit the adjusted limits after fork/exec.
+ *
+ * Security / privileges:
+ *   - Increasing the hard limit above the current maximum may require
+ *     CAP_SYS_RESOURCE or appropriate PAM/ulimit configuration.
+ *   - Without sufficient privilege, the call fails with -errno (often -EPERM).
+ *
+ * @param memlock_bytes Desired minimum RLIMIT_MEMLOCK (in bytes). If zero,
+ *                      the function is a no-op (always succeeds).
+ *
+ * @return 0 on success;
+ *         < 0 negative error code (libbpf style == -errno) on failure:
+ *           - -EINVAL: Invalid argument (e.g., internal conversion issues).
+ *           - -EPERM / -EACCES: Insufficient privilege to raise hard limit.
+ *           - -ENOMEM: Rare failure allocating internal structures.
+ *           - Other -errno codes propagated from setrlimit().
+ *
+ * Failure handling:
+ *   - A failure means RLIMIT_MEMLOCK is unchanged; subsequent BPF map/program
+ *     loads may still succeed if existing limit is adequate.
+ *   - Check current limits manually (getrlimit) if precise sizing is critical.
+ *
+ */
 LIBBPF_API int libbpf_set_memlock_rlim(size_t memlock_bytes);
 
 struct bpf_map_create_opts {
@@ -295,7 +349,104 @@ struct bpf_btf_load_opts {
 	size_t :0;
 };
 #define bpf_btf_load_opts__last_field token_fd
-
+/**
+ * @brief Load a BTF (BPF Type Format) blob into the kernel and obtain a BTF object FD.
+ *
+ * bpf_btf_load() wraps the BPF_BTF_LOAD command of the bpf(2) syscall. It validates
+ * and registers the BTF metadata described by @p btf_data so that subsequently loaded
+ * BPF programs and maps can reference rich type information (for CO-RE relocations,
+ * pretty printing, introspection, etc.).
+ *
+ * Typical usage:
+ *   // Prepare optional verifier/logging buffer (only if you want kernel diagnostics)
+ *   char log_buf[1 << 20] = {};
+ *   struct bpf_btf_load_opts opts = {
+ *       .sz        = sizeof(opts),
+ *       .log_buf   = log_buf,
+ *       .log_size  = sizeof(log_buf),
+ *       .log_level = 1,              // >0 to request kernel parsing/validation log
+ *   };
+ *   int btf_fd = bpf_btf_load(btf_blob_ptr, btf_blob_size, &opts);
+ *   if (btf_fd < 0) {
+ *       // Inspect errno; if opts.log_buf was provided, it may contain details.
+ *   } else {
+ *       // Use btf_fd (e.g. pass to bpf_prog_load() via prog_btf_fd, or query info).
+ *   }
+ *
+ * Input expectations:
+ *   - @p btf_data must point to a complete, well-formed BTF buffer starting with
+ *     struct btf_header followed by the type section and string section.
+ *   - @p btf_size is the total size in bytes of that buffer.
+ *   - Endianness must match the running kernel; cross-endian BTF is rejected.
+ *   - Types must obey kernel constraints (e.g., no unsupported kinds, valid string
+ *     offsets, canonical integer encodings, no dangling references).
+ *
+ * Logging (opts->log_*):
+ *   - If @p opts is non-NULL and opts->log_level > 0, the kernel may emit a textual
+ *     parse/validation log into opts->log_buf (up to opts->log_size - 1 bytes, with
+ *     trailing '\0').
+ *   - On supported kernels, opts->log_true_size is updated to reflect the full (untruncated)
+ *     length of the internal log; if larger than log_size, the log was truncated.
+ *   - If the kernel does not support returning true size, log_true_size remains equal
+ *     to the original log_size value or zero.
+ *
+ * Privileges & security:
+ *   - CAP_BPF and/or CAP_SYS_ADMIN may be required depending on kernel configuration,
+ *     LSM policy, and lockdown mode. Lack of privilege yields -EPERM / -EACCES.
+ *   - In delegated environments, opts->token_fd (if available and supported) can grant
+ *     scoped permission to load BTF without full global capabilities.
+ *
+ * Memory and lifetime:
+ *   - On success a file descriptor (>= 0) referencing the in-kernel BTF object is returned.
+ *     Close it with close() when no longer needed.
+ *   - The kernel makes its own copy of the supplied BTF blob; the caller can free or reuse
+ *     @p btf_data immediately after the call returns.
+ *   - BTF objects can be queried via bpf_btf_get_info_by_fd() and referenced by programs
+ *     (prog_btf_fd) or maps for type information.
+ *
+ * Concurrency & races:
+ *   - Loading is independent; multiple BTF objects may coexist.
+ *   - There is no automatic deduplication across separate loads (except any internal
+ *     kernel optimizations); user space manages uniqueness/pinning if desired.
+ *
+ * Validation tips:
+ *   - Use bpftool btf dump to sanity‑check a blob before loading.
+ *   - Keep string table minimal; excessive strings inflate memory and may hit limits.
+ *   - Ensure all referenced type IDs exist and form a closed, acyclic graph (except
+ *     for permitted self-references in struct/union definitions).
+ *
+ * After loading:
+ *   - Pass the returned FD as prog_btf_fd when loading programs that rely on CO-RE
+ *     relocations or need BTF type validation.
+ *   - Optionally pin the BTF object with bpf_obj_pin() for persistence across process
+ *     lifetimes.
+ *   - Query metadata (e.g., number of types, string section size) with bpf_btf_get_info_by_fd().
+ *
+ * @param btf_data Pointer to the raw in-memory BTF blob.
+ * @param btf_size Size (in bytes) of the BTF blob pointed to by @p btf_data.
+ * @param opts     Optional pointer to a bpf_btf_load_opts struct. May be NULL.
+ *                 Must set opts->sz = sizeof(*opts) when non-NULL. Fields:
+ *                   - log_buf / log_size / log_level: Request and store kernel
+ *                     validation log (see Logging).
+ *                   - log_true_size: Updated by kernel on success (if supported).
+ *                   - btf_flags: Reserved for future extensions (must be 0 unless documented).
+ *                   - token_fd: Delegated permission token (0 or -1 if unused).
+ *
+ * @return
+ *   >= 0 : File descriptor referencing the loaded BTF object.
+ *   < 0  : Negative error code (see Error handling).
+ *
+ * Error handling (negative return codes == -errno style):
+ *   - -EINVAL: Malformed BTF (bad header, section sizes, invalid type graph, bad string
+ *              offsets, unsupported features), opts->sz mismatch, bad flags.
+ *   - -EFAULT: @p btf_data or opts->log_buf points to unreadable/writable memory.
+ *   - -ENOMEM: Kernel failed to allocate memory for internal BTF representation.
+ *   - -EPERM / -EACCES: Insufficient privileges or blocked by security policy.
+ *   - -E2BIG: Exceeds kernel size/complexity limits (e.g., too many types or strings).
+ *   - -ENOTSUP / -EOPNOTSUPP: Kernel lacks support for a feature used in the blob (rare).
+ *   - Other negative codes may be propagated from the underlying syscall.
+ *
+ */
 LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
 			    struct bpf_btf_load_opts *opts);
 
@@ -1835,7 +1986,84 @@ struct bpf_link_update_opts {
  */
 LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd,
 			       const struct bpf_link_update_opts *opts);
-
+/**
+ * @brief Create a user space iterator stream FD from an existing BPF iterator link.
+ *
+ * bpf_iter_create() wraps the kernel's BPF_ITER_CREATE command. Given a BPF
+ * link FD (@p link_fd) that represents an attached BPF iterator program
+ * (i.e., a program of type BPF_PROG_TYPE_TRACING with an iterator attach
+ * type such as BPF_TRACE_ITER), this function returns a new file descriptor
+ * from which user space can sequentially read the iterator's textual or
+ * binary output.
+ *
+ * Reading the returned FD:
+ *   - Use read(), pread(), or a buffered I/O layer to consume iterator data.
+ *   - Each read() returns zero (EOF) when the iterator has completed producing
+ *     all elements; close the FD afterward.
+ *   - Short reads are normal; loop until EOF or error.
+ *
+ * Lifetime & ownership:
+ *   - Success returns a new FD; caller owns it and must close() when finished.
+ *   - Closing the iterator FD does NOT destroy the underlying link or program.
+ *   - You can create multiple iterator FDs from the same link concurrently;
+ *     each is an independent traversal.
+ *
+ * Typical usage:
+ *   int link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_ITER, &opts);
+ *   if (link_fd < 0) { // handle error  }
+ *   int iter_fd = bpf_iter_create(link_fd);
+ *   if (iter_fd < 0) { // handle error  }
+ *   char buf[4096];
+ *   for (;;) {
+ *       ssize_t n = read(iter_fd, buf, sizeof(buf));
+ *       if (n < 0) {
+ *           if (errno == EINTR) continue;
+ *           perror("read iter");
+ *           break;
+ *       }
+ *       if (n == 0) // end of iteration
+ *           break;
+ *       fwrite(buf, 1, n, stdout);
+ *   }
+ *   close(iter_fd);
+ *
+ * Concurrency & races:
+ *   - Safe to call concurrently from multiple threads; each iterator FD
+ *     represents its own walk.
+ *   - Underlying kernel objects (maps, tasks, etc.) may change while iterating;
+ *     output is a best-effort snapshot, not a stable, atomic view.
+ *
+ * Performance considerations:
+ *   - Large buffers (e.g., 16–64 KiB) reduce syscall overhead for high‑volume
+ *     iterators.
+ *   - For blocking behavior, select()/poll()/epoll() can be used; EOF is
+ *     indicated by read() returning 0.
+ *
+ * Security & privileges:
+ *   - May require CAP_BPF and/or CAP_SYS_ADMIN depending on kernel configuration,
+ *     lockdown mode, and LSM policy governing the iterator target.
+ *
+ * @param link_fd File descriptor of a BPF link representing an attached iterator program.
+ *
+ * @return >= 0: Iterator stream file descriptor to read from.
+ *         < 0 : Negative error code (libbpf style, == -errno) on failure.
+ *
+ *
+ * Error handling (negative libbpf-style return value == -errno):
+ *   - -EBADF: @p link_fd is not a valid open FD.
+ *   - -EINVAL: @p link_fd does not refer to an iterator-capable BPF link, or
+ *              unsupported combination for the running kernel.
+ *   - -EPERM / -EACCES: Insufficient privileges / blocked by security policy.
+ *   - -EOPNOTSUPP / -ENOTSUP: Kernel lacks iterator creation support for this link.
+ *   - -ENOMEM: Kernel could not allocate internal data structures.
+ *   - Other -errno codes may be propagated from the underlying bpf() syscall.
+ *
+ * Robustness tips:
+ *   - Verify the program was attached with the correct iterator attach type.
+ *   - Treat a 0-length read as normal completion, not an error.
+ *   - Always handle transient read() failures (EINTR, EAGAIN if non-blocking).
+ *
+ */
 LIBBPF_API int bpf_iter_create(int link_fd);
 
 struct bpf_prog_test_run_attr {
@@ -1948,6 +2176,68 @@ LIBBPF_API int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
  */
 LIBBPF_API int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
 
+/**
+ * @brief Retrieve the next existing BTF object ID after a given starting ID.
+ *
+ * This helper wraps the kernel's BPF_BTF_GET_NEXT_ID command and enumerates
+ * in‑kernel BTF (BPF Type Format) objects in strictly ascending order of
+ * their kernel‑assigned IDs. It is typically used to iterate all currently
+ * loaded BTF objects (e.g., vmlinux BTF, module BTFs, user‑loaded BTF blobs).
+ *
+ * Enumeration pattern:
+ *   1. Initialize start_id to 0 to obtain the first (lowest) existing BTF ID.
+ *   2. On success, *next_id is set to the first BTF ID strictly greater than start_id.
+ *   3. Use the returned *next_id as the new start_id in a subsequent call.
+ *   4. Repeat until the function returns -ENOENT, which signals there is no
+ *      BTF object with ID greater than start_id (end of iteration).
+ *
+ * Concurrency & races:
+ *   - BTF objects can be loaded or unloaded concurrently with enumeration.
+ *     An ID retrieved in one call may become invalid (object unloaded) before
+ *     you convert it to a file descriptor with bpf_btf_get_fd_by_id().
+ *   - Enumeration does not provide a stable snapshot. Newly loaded BTFs may
+ *     appear after you've passed their predecessor ID.
+ *
+ * Lifetime & validity:
+ *   - IDs are monotonically increasing and effectively never wrap in normal
+ *     operation.
+ *   - Successfully retrieving an ID does NOT pin the corresponding BTF object.
+ *     Obtain a file descriptor immediately if you need to interact with it.
+ *
+ * Typical usage:
+ *   __u32 id = 0, next;
+ *   while (bpf_btf_get_next_id(id, &next) == 0) {
+ *       int btf_fd = bpf_btf_get_fd_by_id(next);
+ *       if (btf_fd >= 0) {
+ *           // Inspect/query BTF (e.g. bpf_btf_get_info_by_fd()).
+ *           close(btf_fd);
+ *       }
+ *       id = next;
+ *   }
+ *   // Loop ends when bpf_btf_get_next_id() returns -ENOENT.
+ *
+ * @param start_id
+ *        Starting point for the search. The helper finds the first BTF ID
+ *        strictly greater than start_id. Use 0 to begin enumeration.
+ * @param next_id
+ *        Pointer to a __u32 that receives the next BTF ID on success.
+ *        Must not be NULL.
+ *
+ * @return
+ *   0        on success (next_id populated);
+ *   -ENOENT  if there is no BTF ID greater than start_id (normal end of iteration);
+ *   -EINVAL  if next_id is NULL or arguments are otherwise invalid;
+ *   -EPERM / -EACCES if denied by security policy or lacking required privileges;
+ *   Other negative libbpf-style codes (-errno) on transient or system failures.
+ *
+ * Error handling notes:
+ *   - Treat -ENOENT as normal termination, not an exceptional error.
+ *   - For other failures, errno is set to the underlying cause.
+ *
+ * Follow-up:
+ *   - Convert retrieved IDs to FDs with bpf_btf_get_fd_by_id() to inspect
+ *     metadata or pin the BTF object.
+ */
 LIBBPF_API int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id);
 /**
  * @brief Retrieve the next existing BPF link ID after a given starting ID.
@@ -2222,9 +2512,171 @@ LIBBPF_API int bpf_map_get_fd_by_id(__u32 id);
  */
 LIBBPF_API int bpf_map_get_fd_by_id_opts(__u32 id,
 				const struct bpf_get_fd_by_id_opts *opts);
-
+/**
+ * @brief Obtain a file descriptor for an existing in-kernel BTF (BPF Type Format)
+ *        object given its kernel-assigned ID.
+ *
+ * bpf_btf_get_fd_by_id() wraps the BPF_BTF_GET_FD_BY_ID command of the bpf(2)
+ * syscall. Each loaded BTF object (vmlinux BTF, kernel module BTF, or user‑supplied
+ * BTF blob loaded via BPF_BTF_LOAD) has a monotonically increasing, unique ID.
+ * This helper converts that stable ID into a process-local file descriptor
+ * suitable for introspection (e.g., via bpf_btf_get_info_by_fd()), pinning
+ * (bpf_obj_pin()), or reuse when loading BPF programs/maps that reference types
+ * from this BTF.
+ *
+ * Typical enumeration + open pattern:
+ *   __u32 id = 0, next;
+ *   while (bpf_btf_get_next_id(id, &next) == 0) {
+ *       int btf_fd = bpf_btf_get_fd_by_id(next);
+ *       if (btf_fd >= 0) {
+ *           // inspect with bpf_btf_get_info_by_fd(btf_fd, ...)
+ *           close(btf_fd);
+ *       }
+ *       id = next;
+ *   }
+ *   // Loop ends when bpf_btf_get_next_id() returns -ENOENT.
+ *
+ * Concurrency & races:
+ *   - A BTF object may be unloaded (e.g., module removal) between discovering
+ *     its ID and calling this function; in that case the call fails with -ENOENT.
+ *   - Successfully obtaining a file descriptor does not prevent later unloading
+ *     by other processes; subsequent operations on the FD can still fail.
+ *
+ * Lifetime & ownership:
+ *   - On success the caller owns the returned descriptor and must close() it
+ *     when no longer needed.
+ *   - Closing the FD does not destroy the underlying BTF object if other
+ *     references (FDs or pinned bpffs paths) remain.
+ *
+ * Privileges / security:
+ *   - May require CAP_BPF and/or CAP_SYS_ADMIN depending on kernel configuration,
+ *     LSM policies, or lockdown mode. Lack of privilege yields -EPERM / -EACCES.
+ *   - Access can also be restricted by namespace or cgroup-based security policies.
+ *
+ * Use cases:
+ *   - Retrieve BTF metadata (type counts, string section size, specific type
+ *     definitions) via bpf_btf_get_info_by_fd().
+ *   - Pass the FD as prog_btf_fd when loading eBPF programs needing CO-RE or
+ *     type validation.
+ *   - Pin the BTF object for persistence across process lifetimes.
+ *
+ * @param id
+ *        Kernel-assigned unique (non-zero) BTF object ID. Typically obtained via
+ *        bpf_btf_get_next_id() or from a prior info query. Must be > 0.
+ *
+ * @return
+ *   >= 0 : File descriptor referencing the BTF object (caller must close()).
+ *   < 0  : Negative libbpf-style error code (== -errno):
+ *            - -ENOENT : No BTF object with this ID (unloaded or never existed).
+ *            - -EPERM / -EACCES : Insufficient privileges / blocked by policy.
+ *            - -EINVAL : Invalid ID (e.g., 0) or kernel rejected the request.
+ *            - -ENOMEM : Kernel memory/resource exhaustion.
+ *            - Other negative values: Propagated syscall failures.
+ *
+ * Error handling notes:
+ *   - Treat -ENOENT as a normal race outcome if objects can disappear.
+ *   - Always close the returned FD to avoid resource leaks.
+ *
+ * Thread safety:
+ *   - Safe to call concurrently; each successful invocation yields an independent FD.
+ *
+ * Forward compatibility:
+ *   - ID space is monotonic; practical wraparound is not expected.
+ *   - Future kernels may add additional validation or permission gating; handle
+ *     new -errno codes conservatively.
+ */
 LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id);
 
+/**
+ * @brief Obtain a file descriptor for an existing in‑kernel BTF (BPF Type Format)
+ *        object by its kernel-assigned ID, with extended open options.
+ *
+ * bpf_btf_get_fd_by_id_opts() is an extended variant of bpf_btf_get_fd_by_id().
+ * It wraps the BPF_BTF_GET_FD_BY_ID command of the bpf(2) syscall and converts
+ * a stable, monotonically increasing BTF object ID (@p id) into a process‑local
+ * file descriptor, honoring optional attributes supplied via @p opts.
+ *
+ * A BTF object represents a loaded collection of type metadata (vmlinux BTF,
+ * kernel module BTF, or user-supplied BTF blob). Programs and maps can refer
+ * to these types for CO-RE relocations, verification, and introspection.
+ *
+ * Typical enumeration + open pattern:
+ *   __u32 cur = 0, next;
+ *   while (bpf_btf_get_next_id(cur, &next) == 0) {
+ *       struct bpf_get_fd_by_id_opts o = {
+ *           .sz = sizeof(o),
+ *           .open_flags = 0,
+ *           .token_fd = -1,
+ *       };
+ *       int btf_fd = bpf_btf_get_fd_by_id_opts(next, &o);
+ *       if (btf_fd >= 0) {
+ *           // use btf_fd (e.g. bpf_btf_get_info_by_fd())
+ *           close(btf_fd);
+ *       }
+ *       cur = next;
+ *   }
+ *   // Loop ends when bpf_btf_get_next_id() returns -ENOENT.
+ *
+ * Initialization & @p opts usage:
+ *   - @p opts may be NULL for default behavior (equivalent to zeroed fields).
+ *   - If @p opts is non-NULL, opts->sz MUST be set to sizeof(*opts); mismatch
+ *     yields -EINVAL.
+ *   - opts->open_flags:
+ *       Reserved for future kernel extensions; pass 0 unless a documented flag
+ *       is supported. Unsupported bits => -EINVAL.
+ *   - opts->token_fd:
+ *       Optional BPF token FD enabling delegated (restricted) permissions. Set
+ *       to -1 or 0 if unused. Provides a way to open BTF objects without full
+ *       CAP_BPF/CAP_SYS_ADMIN in controlled environments.
+ *
+ * Concurrency & races:
+ *   - A BTF object can be unloaded (e.g., module removal) after ID discovery
+ *     but before this call; expect -ENOENT in such races.
+ *   - Successfully obtaining a file descriptor does not guarantee the object
+ *     will remain available for its entire lifetime (it could still be removed
+ *     depending on kernel policies), so subsequent operations may fail.
+ *
+ * Lifetime & ownership:
+ *   - On success you own the returned FD and must close() it when done.
+ *   - Closing the FD does not destroy the BTF object if other references (FDs,
+ *     pinned bpffs entries) remain.
+ *   - You may pin the BTF object via bpf_obj_pin() for persistence.
+ *
+ * Security / privileges:
+ *   - May require CAP_BPF and/or CAP_SYS_ADMIN depending on kernel configuration,
+ *     LSM policy, and lockdown mode.
+ *   - Access via a token_fd is subject to token scope; insufficient rights yield
+ *     -EPERM / -EACCES.
+ *
+ * Use cases:
+ *   - Retrieve type information with bpf_btf_get_info_by_fd().
+ *   - Supply prog_btf_fd when loading eBPF programs needing CO-RE relocations.
+ *   - Enumerate and manage user-loaded or kernel-provided BTF datasets.
+ *
+ * Robustness tips:
+ *   - Treat -ENOENT as a normal race when enumerating dynamic BTF objects.
+ *   - Always zero-initialize opts before setting recognized fields:
+ *       struct bpf_get_fd_by_id_opts o = {};
+ *       o.sz = sizeof(o);
+ *   - Avoid non-zero open_flags until documented; future kernels may add semantic
+ *     modifiers (e.g., restricted viewing modes).
+ *
+ * @param id   Kernel-assigned unique BTF object ID (> 0).
+ * @param opts Optional pointer to struct bpf_get_fd_by_id_opts controlling open
+ *             behavior; may be NULL for defaults.
+ *
+ * @return >= 0: File descriptor referencing the BTF object (caller must close()).
+ *         < 0 : Negative error code (libbpf style == -errno) on failure.
+ *
+ * Error handling (negative return values are libbpf-style == -errno):
+ *   - -ENOENT: No BTF object with @p id (unloaded or never existed).
+ *   - -EINVAL: Invalid @p id (e.g., 0), malformed @p opts (bad sz), or unsupported
+ *              open_flags bits.
+ *   - -EPERM / -EACCES: Insufficient privileges or blocked by security policy.
+ *   - -ENOMEM: Kernel resource allocation failure.
+ *   - Other -errno codes may be propagated from underlying syscall failures.
+ *
+ */
 LIBBPF_API int bpf_btf_get_fd_by_id_opts(__u32 id,
 				const struct bpf_get_fd_by_id_opts *opts);
 /**
@@ -2645,11 +3097,294 @@ struct bpf_raw_tp_opts {
 	size_t :0;
 };
 #define bpf_raw_tp_opts__last_field cookie
-
+/**
+ * @brief Attach a loaded BPF program to a raw tracepoint using extended options.
+ *
+ * bpf_raw_tracepoint_open_opts() wraps the BPF_RAW_TRACEPOINT_OPEN command and
+ * creates a persistent attachment of @p prog_fd to the raw tracepoint named in
+ * @p opts->tp_name. On success it returns a file descriptor representing the
+ * attachment. Closing that FD detaches the program from the tracepoint.
+ *
+ * Compared to bpf_raw_tracepoint_open(), this variant allows passing a user
+ * cookie (opts->cookie) and provides forward/backward compatibility via the
+ * @p opts->sz field.
+ *
+ * Typical usage:
+ *   struct bpf_raw_tp_opts ropts = {
+ *       .sz      = sizeof(ropts),
+ *       .tp_name = "sched_switch",   // raw tracepoint name (no "tracepoint/" prefix)
+ *       .cookie  = 0xdeadbeef,       // optional user cookie (visible to program)
+ *   };
+ *   int tp_fd = bpf_raw_tracepoint_open_opts(prog_fd, &ropts);
+ *   if (tp_fd < 0) {
+ *       // handle error (inspect errno or negative return value)
+ *   }
+ *   // ... use attachment; close(tp_fd) to detach when done.
+ *
+ * Tracepoint name:
+ *   - Use the raw tracepoint identifier as exposed under
+ *     /sys/kernel/debug/tracing/events/* without category prefixes. For raw
+ *     tracepoints this is typically the internal kernel name (e.g., "sched_switch").
+ *   - Passing NULL or an empty string fails with -EINVAL.
+ *
+ * Cookie:
+ *   - opts->cookie (if non-zero) becomes available to the attached program via
+ *     bpf_get_attach_cookie() helper (where supported).
+ *   - Set to 0 if you don't need a cookie; kernel treats it as absent.
+ *
+ * Structure initialization:
+ *   - opts MUST NOT be NULL.
+ *   - Zero-initialize the struct, then set:
+ *       opts->sz      = sizeof(struct bpf_raw_tp_opts);
+ *       opts->tp_name = "<tracepoint_name>";
+ *       opts->cookie  = <optional_cookie>;
+ *   - Unrecognized future fields must remain zero for compatibility.
+ *
+ * Lifetime & detachment:
+ *   - The returned FD solely controls the attachment lifetime. Closing it
+ *     detaches the program.
+ *   - The program FD @p prog_fd may be closed independently after successful
+ *     attachment; the link remains active until the tracepoint FD is closed.
+ *
+ * Concurrency:
+ *   - Multiple programs can attach to the same raw tracepoint (each gets its
+ *     own FD).
+ *   - Attaching/detaching is atomic from the program's perspective; events
+ *     arriving after success will invoke the program.
+ *
+ * Privileges:
+ *   - Typically requires CAP_BPF and/or CAP_SYS_ADMIN depending on kernel
+ *     configuration, LSM policy, and lockdown mode.
+ *
+ * Performance considerations:
+ *   - Raw tracepoints invoke programs on every event occurrence; ensure program
+ *     logic is efficient to avoid noticeable system overhead.
+ *
+ * @param prog_fd
+ *   File descriptor of a previously loaded BPF program (bpf_prog_load()) that
+ *   is compatible with raw tracepoint attachment (e.g., program type
+ *   BPF_PROG_TYPE_RAW_TRACEPOINT or suitable tracing type).
+ *
+ * @param opts
+ *   Pointer to an initialized bpf_raw_tp_opts structure describing the target
+ *   tracepoint and optional cookie. Must not be NULL. opts->sz must equal
+ *   sizeof(struct bpf_raw_tp_opts).
+ *
+ * @return
+ *   >= 0 : File descriptor representing the attachment (close to detach).
+ *   < 0  : Negative libbpf-style error code (== -errno) on failure:
+ *            - -EINVAL   : Bad prog_fd, malformed opts (sz mismatch, NULL tp_name),
+ *                         unsupported program type, or kernel lacks raw TP support.
+ *            - -EPERM/-EACCES : Insufficient privileges or blocked by security policy.
+ *            - -ENOENT   : Tracepoint name not found / not supported by current kernel.
+ *            - -EBADF    : Invalid prog_fd.
+ *            - -ENOMEM   : Kernel memory/resource exhaustion.
+ *            - -EOPNOTSUPP/-ENOTSUP : Raw tracepoint attachment not supported.
+ *            - Other -errno codes may be propagated from the underlying syscall.
+ *
+ * Error handling:
+ *   - Inspect the negative return value or errno for diagnostics.
+ *   - Treat -ENOENT as "tracepoint unavailable" (kernel config or version gap).
+ *
+ * After attachment:
+ *   - Optionally pin the FD (bpf_obj_pin()) if you need persistence.
+ *   - Use bpf_obj_get_info_by_fd() to query attachment metadata if supported.
+ */
 LIBBPF_API int bpf_raw_tracepoint_open_opts(int prog_fd, struct bpf_raw_tp_opts *opts);
 
+/**
+ * @brief Attach a loaded BPF program to a raw tracepoint (legacy/simple API).
+ *
+ * bpf_raw_tracepoint_open() is a convenience wrapper that issues the
+ * BPF_RAW_TRACEPOINT_OPEN command to attach the BPF program referenced
+ * by @p prog_fd to the raw tracepoint named @p name. On success it returns
+ * a file descriptor representing the attachment; closing that FD detaches
+ * the program from the tracepoint.
+ *
+ * Compared to bpf_raw_tracepoint_open_opts(), this legacy interface
+ * provides no ability to specify an attach cookie or future extension
+ * fields. For new code prefer bpf_raw_tracepoint_open_opts() to enable
+ * forward/backward compatible option passing.
+ *
+ * Tracepoint name:
+ *   - @p name must be a non-NULL, null-terminated string identifying a
+ *     raw tracepoint (e.g. "sched_switch").
+ *   - Pass the raw kernel tracepoint identifier without any category
+ *     prefix (do not include "tracepoint/" or directory components).
+ *   - If the tracepoint is not available (kernel config/version) the
+ *     call fails with -ENOENT.
+ *
+ * Program requirements:
+ *   - @p prog_fd must refer to a loaded BPF program of a type compatible
+ *     with raw tracepoint attachment (e.g., BPF_PROG_TYPE_RAW_TRACEPOINT
+ *     or an allowed tracing program type accepted by the kernel).
+ *   - The program may be safely closed after a successful attachment;
+ *     the returned FD controls the lifetime of the link.
+ *
+ * Lifetime & detachment:
+ *   - Each successful call creates a distinct attachment with its own FD.
+ *   - Closing the returned FD immediately detaches the program from the
+ *     tracepoint.
+ *   - The returned FD can be pinned (bpf_obj_pin()) for persistence.
+ *
+ * Concurrency:
+ *   - Multiple programs can be attached to the same raw tracepoint.
+ *   - Attach/detach operations are atomic; events after success invoke
+ *     the program until its FD is closed.
+ *
+ * Privileges & security:
+ *   - Typically requires CAP_BPF and/or CAP_SYS_ADMIN depending on
+ *     kernel configuration, LSM, and lockdown mode.
+ *   - Insufficient privilege yields -EPERM / -EACCES.
+ *
+ * Performance considerations:
+ *   - Raw tracepoints can be very frequent; ensure attached program
+ *     logic is efficient to avoid noticeable overhead.
+ *
+ * @param name    Null-terminated raw tracepoint name (e.g. "sched_switch").
+ * @param prog_fd File descriptor of a loaded, compatible BPF program.
+ *
+ * @return >= 0 : Attachment file descriptor (close to detach).
+ *         < 0  : Negative error code (libbpf style == -errno) on failure.
+ *
+ * Error handling (negative libbpf-style return value == -errno):
+ *   - -EINVAL   : Invalid @p prog_fd, NULL/empty @p name, incompatible program type.
+ *   - -ENOENT   : Tracepoint not found / unsupported by current kernel.
+ *   - -EPERM/-EACCES : Insufficient privileges or blocked by security policy.
+ *   - -EBADF    : @p prog_fd is not a valid file descriptor.
+ *   - -ENOMEM   : Kernel memory/resource exhaustion.
+ *   - -EOPNOTSUPP/-ENOTSUP : Raw tracepoints unsupported by the kernel.
+ *   - Other negative codes may be propagated from the underlying syscall.
+ *
+ * Best practices:
+ *   - Prefer bpf_raw_tracepoint_open_opts() for new development to
+ *     gain cookie support and extensibility.
+ *   - Immediately check the return value; do not rely solely on errno.
+ *   - Pin the attachment if you need persistence across process lifetimes.
+ *
+ */
 LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
 
+/**
+ * @brief Query metadata about a file descriptor in another task (process) that
+ *        is associated with a BPF tracing/perf event and (optionally) an
+ *        attached BPF program.
+ *
+ * This helper wraps the kernel's BPF_TASK_FD_QUERY command. It inspects the
+ * file descriptor number @p fd that belongs to the task identified by @p pid
+ * and, if that FD represents a perf event or similar tracing attachment, it
+ * returns descriptive information about:
+ *   - The attached BPF program (its kernel program ID).
+ *   - The nature/type of the FD (tracepoint, raw_tracepoint, kprobe, uprobe, etc.).
+ *   - Target symbol/address/offset data for kprobe/uprobes.
+ *   - A human‑readable identifier (tracepoint name, kprobe function name,
+ *     uprobe file path), copied into @p buf when provided.
+ *
+ * Typical use cases:
+ *   - Introspecting perf event FDs opened by another process to discover
+ *     which BPF program is attached.
+ *   - Enumerating and characterizing dynamically created kprobes or uprobes
+ *     (e.g., by observability agents).
+ *   - Building higher-level tooling that correlates program IDs with their
+ *     originating probe specifications.
+ *
+ * Usage pattern:
+ *   char info[256];
+ *   __u32 info_len = sizeof(info);
+ *   __u32 prog_id = 0, fd_type = 0;
+ *   __u64 probe_off = 0, probe_addr = 0;
+ *   int err = bpf_task_fd_query(target_pid, target_fd, 0,
+ *                               info, &info_len,
+ *                               &prog_id, &fd_type,
+ *                               &probe_off, &probe_addr);
+ *   if (err == 0) {
+ *       // info[] now holds a NUL-terminated identifier (if available)
+ *       // info_len == actual length (including terminating '\0')
+ *       // fd_type enumerates one of BPF_FD_TYPE_* values
+ *       // prog_id is the kernel-assigned BPF program ID (0 if none)
+ *       // probe_off / probe_addr describe offsets/addresses for kprobe/uprobe
+ *   } else if (err == -ENOSPC) {
+ *       // info_len contains required size; allocate larger buffer and retry
+ *   }
+ *
+ * Buffer semantics (@p buf / @p buf_len):
+ *   - On input @p *buf_len must hold the capacity (in bytes) of @p buf.
+ *   - If @p buf is large enough, the kernel copies a NUL‑terminated string
+ *     (tracepoint name, kprobe symbol, uprobe path, etc.) and updates
+ *     @p *buf_len with the actual string length (including the NUL).
+ *   - If @p buf is too small, the call fails with -ENOSPC and sets
+ *     @p *buf_len to the required length; reallocate and retry.
+ *   - If a textual identifier is not applicable (or unavailable), the kernel
+ *     may set @p *buf_len to 0 (and leave @p buf untouched).
+ *   - Passing @p buf == NULL is allowed only if @p buf_len is non-NULL and
+ *     points to 0; otherwise -EINVAL is returned.
+ *
+ * Output parameters:
+ *   - @p prog_id: Set to the kernel BPF program ID attached to the perf event
+ *     FD (0 if no BPF program is attached).
+ *   - @p fd_type: Set to one of the BPF_FD_TYPE_* enum values describing the
+ *     FD (e.g., BPF_FD_TYPE_TRACEPOINT, BPF_FD_TYPE_KPROBE, BPF_FD_TYPE_UPROBE,
+ *     BPF_FD_TYPE_RAW_TRACEPOINT). Use this to disambiguate interpretation of
+ *     other outputs.
+ *   - @p probe_offset: For kprobe/uprobes, the offset within the symbol or
+ *     mapped file that was requested when the probe was created.
+ *   - @p probe_addr: For kprobes, the resolved kernel address of the probed
+ *     symbol/instruction; for uprobes may be 0 or implementation-dependent.
+ *   - Any output pointer may be NULL if the caller is not interested in that
+ *     field (it will simply be skipped).
+ *
+ * Privileges & access control:
+ *   - Querying another task's file descriptor typically requires sufficient
+ *     permissions (ptrace-like restrictions, CAP_BPF / CAP_SYS_ADMIN, and/or
+ *     LSM allowances). Lack of privilege yields -EPERM / -EACCES.
+ *   - The target task must exist and the FD must be valid at query time.
+ *
+ * Concurrency / races:
+ *   - The target process may close or replace its FD concurrently; the query
+ *     can fail with -EBADF or -ENOENT in such races.
+ *   - Retrieved metadata is a point-in-time snapshot and can become stale
+ *     immediately after return.
+ *
+ * @param pid          PID of the target task whose file descriptor table should be queried.
+ *                     Use the numeric PID (thread group leader or specific thread PID);
+ *                     passing 0 is typically invalid (returns -EINVAL).
+ * @param fd           File descriptor number as seen from inside the task identified by @p pid.
+ * @param flags        Query modifier flags. Must be 0 on current kernels; non‑zero
+ *                     (unsupported) bits return -EINVAL.
+ * @param buf          Optional user buffer to receive a NUL‑terminated identifier string
+ *                     (tracepoint name, kprobe symbol, uprobe path). Can be NULL if
+ *                     @p buf_len points to 0.
+ * @param buf_len      In/out pointer to buffer length. On input: capacity of @p buf.
+ *                     On success: actual length copied (including terminating NUL).
+ *                     On -ENOSPC: required length (caller should reallocate and retry).
+ * @param prog_id      Optional output pointer receiving the attached BPF program ID (0 if none).
+ * @param fd_type      Optional output pointer receiving one of BPF_FD_TYPE_* constants identifying FD type.
+ * @param probe_offset Optional output pointer receiving the probe offset (for kprobe/uprobe types).
+ * @param probe_addr   Optional output pointer receiving resolved kernel address (kprobe) or relevant mapping address.
+ *
+ * @return 0 on success;
+ *         Negative libbpf-style error code (< 0) on failure:
+ *           - -EINVAL  : Invalid arguments (bad pid/fd, unsupported flags, inconsistent buf/buf_len).
+ *           - -ENOENT  : Task, file descriptor, or associated probe/program not found.
+ *           - -EBADF   : Bad file descriptor in target task at time of query.
+ *           - -ENOSPC  : @p buf too small; @p *buf_len updated with required size.
+ *           - -EPERM / -EACCES : Insufficient privileges or access denied by security policy.
+ *           - -EFAULT  : User memory (buf or buf_len or an output pointer) not accessible.
+ *           - -ENOMEM  : Temporary kernel memory/resource exhaustion.
+ *           - Other -errno codes may be propagated from the underlying syscall.
+ *
+ * Best practices:
+ *   - Initialize *buf_len with the size of your buffer; handle -ENOSPC by allocating
+ *     a larger buffer using the returned required length.
+ *   - Check @p fd_type first to interpret @p probe_offset / @p probe_addr meaningfully.
+ *   - Treat -ENOENT and -EBADF as normal race outcomes in dynamic environments.
+ *   - Avoid querying extremely frequently in production paths; this is introspective
+ *     debug/management tooling, not a fast data path primitive.
+ *
+ * Thread safety:
+ *   - This helper is thread-safe; multiple threads can query different (or the same)
+ *     tasks concurrently. Returned data structures are per-call (no shared state).
+ */
 LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
 				 __u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
 				 __u64 *probe_offset, __u64 *probe_addr);
-- 
2.34.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ