linux-kernel - Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault to BPF stderr

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0adc5d8a299483004f4796a418420fe1c69f24bc.camel@gmail.com>
Date: Wed, 08 Oct 2025 12:34:08 -0700
From: Eduard Zingerman <eddyz87@...il.com>
To: Kumar Kartikeya Dwivedi <memxor@...il.com>, Alexei Starovoitov
	 <alexei.starovoitov@...il.com>
Cc: Leon Hwang <hffilwlqm@...il.com>, Andrii Nakryiko <andrii@...nel.org>, 
 Menglong Dong <menglong.dong@...ux.dev>, Menglong Dong
 <menglong8.dong@...il.com>, Alexei Starovoitov	 <ast@...nel.org>, bpf
 <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, 
 linux-trace-kernel <linux-trace-kernel@...r.kernel.org>,
 jiang.biao@...ux.dev
Subject: Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe
 fault to BPF stderr

On Wed, 2025-10-08 at 19:08 +0200, Kumar Kartikeya Dwivedi wrote:
> On Wed, 8 Oct 2025 at 18:27, Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> > 
> > On Wed, Oct 8, 2025 at 7:41 AM Leon Hwang <hffilwlqm@...il.com> wrote:
> > > 
> > > 
> > > 
> > > On 2025/10/7 14:14, Menglong Dong wrote:
> > > > On 2025/10/2 10:03, Alexei Starovoitov wrote:
> > > > > On Fri, Sep 26, 2025 at 11:12 PM Menglong Dong <menglong8.dong@...il.com> wrote:
> > > > > > 
> > > > > > Introduce the function bpf_prog_report_probe_violation(), which is used
> > > > > > to report the memory probe fault to the user by the BPF stderr.
> > > > > > 
> > > > > > Signed-off-by: Menglong Dong <menglong.dong@...ux.dev>
> > > 
> > > [...]
> > > 
> > > > > 
> > > > > Interesting idea, but the above message is not helpful.
> > > > > Users cannot decipher a fault_ip within a bpf prog.
> > > > > It's just a random number.
> > > > 
> > > > Yeah, I have noticed this too. What useful is the
> > > > bpf_stream_dump_stack(), which will print the code
> > > > line that trigger the fault.
> > > > 
> > > > > But stepping back... just faults are common in tracing.
> > > > > If we start printing them we will just fill the stream to the max,
> > > > > but users won't know that the message is there, since no one
> > > > 
> > > > You are right, we definitely can't output this message
> > > > to STDERR directly. We can add an extra flag for it, as you
> > > > said below.
> > > > 
> > > > Or, maybe we can introduce a enum stream_type, and
> > > > the users can subscribe what kind of messages they
> > > > want to receive.
> > > > 
> > > > > expects it. arena and lock errors are rare and arena faults
> > > > > were specifically requested by folks who develop progs that use arena.
> > > > > This one is different. These faults have been around for a long time
> > > > > and I don't recall people asking for more verbosity.
> > > > > We can add them with an extra flag specified at prog load time,
> > > > > but even then. Doesn't feel that useful.
> > > > 
> > > > Generally speaking, users can do invalid checking before
> > > > they do the memory reading, such as NULL checking. And
> > > > the pointer in function arguments that we hook is initialized
> > > > in most case. So the fault is someting that can be prevented.
> > > > 
> > > > I have a BPF tools which is writed for 4.X kernel and kprobe
> > > > based BPF is used. Now I'm planing to migrate it to 6.X kernel
> > > > and replace bpf_probe_read_kernel() with bpf_core_cast() to
> > > > obtain better performance. Then I find that I can't check if the
> > > > memory reading is success, which can lead to potential risk.
> > > > So my tool will be happy to get such fault event :)
> > > > 
> > > > Leon suggested to add a global errno for each BPF programs,
> > > > and I haven't dig deeply on this idea yet.
> > > > 
> > > 
> > > Yeah, as we discussed, a global errno would be a much more lightweight
> > > approach for handling such faults.
> > > 
> > > The idea would look like this:
> > > 
> > > DEFINE_PER_CPU(int, bpf_errno);
> > > 
> > > __bpf_kfunc void bpf_errno_clear(void);
> > > __bpf_kfunc void bpf_errno_set(int errno);
> > > __bpf_kfunc int bpf_errno_get(void);
> > > 
> > > When a fault occurs, the kernel can simply call
> > > 'bpf_errno_set(-EFAULT);'.
> > > 
> > > If users want to detect whether a fault happened, they can do:
> > > 
> > > bpf_errno_clear();
> > > header = READ_ONCE(skb->network_header);
> > > if (header == 0 && bpf_errno_get() == -EFAULT)
> > >         /* handle fault */;
> > > 
> > > This way, users can identify faults immediately and handle them gracefully.
> > > 
> > > Furthermore, these kfuncs can be inlined by the verifier, so there would
> > > be no runtime function call overhead.
> > 
> > Interesting idea, but errno as-is doesn't quite fit,
> > since we only have 2 (or 3 ?) cases without explicit error return:
> > probe_read_kernel above, arena read, arena write.
> > I guess we can add may_goto to this set as well.
> > But in all these cases we'll struggle to find an appropriate errno code,
> > so it probably should be a custom enum and not called "errno".
> 
> Yeah, agreed that this would be useful, particularly in this case. I'm
> wondering how we'll end up implementing this.
> Sounds like it needs to be tied to the program's invocation, so it
> cannot be per-cpu per-program, since they nest. Most likely should be
> backed by run_ctx, but that is unavailable in all program types. Next
> best thing that comes to mind is reserving some space in the stack
> frame at a known offset in each subprog that invokes this helper, and
> use that to signal (by finding the program's bp and writing to the
> stack), the downside being it likely becomes yet-another arch-specific
> thing. Any other better ideas?

Another option is to lower probe_read to a BPF_PROBE_MEM instruction
and generate a special kind of exception handler, that would set r0 to
-EFAULT. (We don't do this already, right? Don't see anything like that
in verifier.c or x86/../bpf_jit_comp.c).

This would avoid any user-visible changes and address performance
concern. Not so convenient for a chain of dereferences a->b->c->d,
though.