[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5766a834-3b21-47b0-8793-2673c25ab6b0@gmail.com>
Date: Thu, 9 Oct 2025 22:29:04 +0800
From: Leon Hwang <hffilwlqm@...il.com>
To: Kumar Kartikeya Dwivedi <memxor@...il.com>,
Eduard Zingerman <eddyz87@...il.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Andrii Nakryiko <andrii@...nel.org>, Menglong Dong
<menglong.dong@...ux.dev>, Menglong Dong <menglong8.dong@...il.com>,
Alexei Starovoitov <ast@...nel.org>, bpf <bpf@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-trace-kernel <linux-trace-kernel@...r.kernel.org>, jiang.biao@...ux.dev
Subject: Re: bpf_errno. Was: [PATCH RFC bpf-next 1/3] bpf: report probe fault
to BPF stderr
On 2025/10/9 04:08, Kumar Kartikeya Dwivedi wrote:
> On Wed, 8 Oct 2025 at 21:34, Eduard Zingerman <eddyz87@...il.com> wrote:
>>
>> On Wed, 2025-10-08 at 19:08 +0200, Kumar Kartikeya Dwivedi wrote:
>>> On Wed, 8 Oct 2025 at 18:27, Alexei Starovoitov
>>> <alexei.starovoitov@...il.com> wrote:
>>>>
>>>> On Wed, Oct 8, 2025 at 7:41 AM Leon Hwang <hffilwlqm@...il.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2025/10/7 14:14, Menglong Dong wrote:
>>>>>> On 2025/10/2 10:03, Alexei Starovoitov wrote:
>>>>>>> On Fri, Sep 26, 2025 at 11:12 PM Menglong Dong <menglong8.dong@...il.com> wrote:
>>>>>>>>
>>>>>>>> Introduce the function bpf_prog_report_probe_violation(), which is used
>>>>>>>> to report the memory probe fault to the user by the BPF stderr.
>>>>>>>>
>>>>>>>> Signed-off-by: Menglong Dong <menglong.dong@...ux.dev>
>>>>>
>>>>> [...]
>>>>>
>>>>>>>
>>>>>>> Interesting idea, but the above message is not helpful.
>>>>>>> Users cannot decipher a fault_ip within a bpf prog.
>>>>>>> It's just a random number.
>>>>>>
>>>>>> Yeah, I have noticed this too. What useful is the
>>>>>> bpf_stream_dump_stack(), which will print the code
>>>>>> line that trigger the fault.
>>>>>>
>>>>>>> But stepping back... just faults are common in tracing.
>>>>>>> If we start printing them we will just fill the stream to the max,
>>>>>>> but users won't know that the message is there, since no one
>>>>>>
>>>>>> You are right, we definitely can't output this message
>>>>>> to STDERR directly. We can add an extra flag for it, as you
>>>>>> said below.
>>>>>>
>>>>>> Or, maybe we can introduce a enum stream_type, and
>>>>>> the users can subscribe what kind of messages they
>>>>>> want to receive.
>>>>>>
>>>>>>> expects it. arena and lock errors are rare and arena faults
>>>>>>> were specifically requested by folks who develop progs that use arena.
>>>>>>> This one is different. These faults have been around for a long time
>>>>>>> and I don't recall people asking for more verbosity.
>>>>>>> We can add them with an extra flag specified at prog load time,
>>>>>>> but even then. Doesn't feel that useful.
>>>>>>
>>>>>> Generally speaking, users can do invalid checking before
>>>>>> they do the memory reading, such as NULL checking. And
>>>>>> the pointer in function arguments that we hook is initialized
>>>>>> in most case. So the fault is someting that can be prevented.
>>>>>>
>>>>>> I have a BPF tools which is writed for 4.X kernel and kprobe
>>>>>> based BPF is used. Now I'm planing to migrate it to 6.X kernel
>>>>>> and replace bpf_probe_read_kernel() with bpf_core_cast() to
>>>>>> obtain better performance. Then I find that I can't check if the
>>>>>> memory reading is success, which can lead to potential risk.
>>>>>> So my tool will be happy to get such fault event :)
>>>>>>
>>>>>> Leon suggested to add a global errno for each BPF programs,
>>>>>> and I haven't dig deeply on this idea yet.
>>>>>>
>>>>>
>>>>> Yeah, as we discussed, a global errno would be a much more lightweight
>>>>> approach for handling such faults.
>>>>>
>>>>> The idea would look like this:
>>>>>
>>>>> DEFINE_PER_CPU(int, bpf_errno);
>>>>>
>>>>> __bpf_kfunc void bpf_errno_clear(void);
>>>>> __bpf_kfunc void bpf_errno_set(int errno);
>>>>> __bpf_kfunc int bpf_errno_get(void);
>>>>>
>>>>> When a fault occurs, the kernel can simply call
>>>>> 'bpf_errno_set(-EFAULT);'.
>>>>>
>>>>> If users want to detect whether a fault happened, they can do:
>>>>>
>>>>> bpf_errno_clear();
>>>>> header = READ_ONCE(skb->network_header);
>>>>> if (header == 0 && bpf_errno_get() == -EFAULT)
>>>>> /* handle fault */;
>>>>>
>>>>> This way, users can identify faults immediately and handle them gracefully.
>>>>>
>>>>> Furthermore, these kfuncs can be inlined by the verifier, so there would
>>>>> be no runtime function call overhead.
>>>>
>>>> Interesting idea, but errno as-is doesn't quite fit,
>>>> since we only have 2 (or 3 ?) cases without explicit error return:
>>>> probe_read_kernel above, arena read, arena write.
>>>> I guess we can add may_goto to this set as well.
>>>> But in all these cases we'll struggle to find an appropriate errno code,
>>>> so it probably should be a custom enum and not called "errno".
>>>
>>> Yeah, agreed that this would be useful, particularly in this case. I'm
>>> wondering how we'll end up implementing this.
>>> Sounds like it needs to be tied to the program's invocation, so it
>>> cannot be per-cpu per-program, since they nest. Most likely should be
>>> backed by run_ctx, but that is unavailable in all program types. Next
>>> best thing that comes to mind is reserving some space in the stack
>>> frame at a known offset in each subprog that invokes this helper, and
>>> use that to signal (by finding the program's bp and writing to the
>>> stack), the downside being it likely becomes yet-another arch-specific
>>> thing. Any other better ideas?
>>
>> Another option is to lower probe_read to a BPF_PROBE_MEM instruction
>> and generate a special kind of exception handler, that would set r0 to
>> -EFAULT. (We don't do this already, right? Don't see anything like that
>> in verifier.c or x86/../bpf_jit_comp.c).
>>
>> This would avoid any user-visible changes and address performance
>> concern. Not so convenient for a chain of dereferences a->b->c->d,
>> though.
>
> Since we're piling on ideas, one of the other things that I think
> could be useful in general (and maybe should be done orthogonally to
> bpf_errno)
> is making some empty nop function and making it not traceable reliably
> across arches and invoke it in the bpf exception handler.
No new traceable function is needed, since ex_handler_bpf itself can
already be traced via fentry.
If users really want to detect whether a fault occurred, they could
attach a program to ex_handler_bpf and record fault events into a map.
However, this approach would be too heavyweight just to check for a
simple fault condition.
Thanks,
Leon
> Then if we expose prog_stream_dump_stack() as a kfunc (should be
> trivial), the user can write anything to stderr that is relevant to
> get more information on the fault.
>
> It is then up to the user to decide the rate of messages for such
> faults etc. and get more information if needed.
Powered by blists - more mailing lists