[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <17a9444f-1151-44b4-b2ec-5cafd12bf2bd@linux.dev>
Date: Thu, 13 Nov 2025 21:16:32 +0800
From: Leon Hwang <leon.hwang@...ux.dev>
To: Yonghong Song <yonghong.song@...ux.dev>, bpf@...r.kernel.org
Cc: ast@...nel.org, andrii@...nel.org, daniel@...earbox.net,
martin.lau@...ux.dev, eddyz87@...il.com, song@...nel.org,
john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me,
haoluo@...gle.com, jolsa@...nel.org, memxor@...il.com, ameryhung@...il.com,
linux-kernel@...r.kernel.org, kernel-patches-bot@...com
Subject: Re: [PATCH bpf-next v6 2/2] selftests/bpf: Add test to verify freeing
the special fields when update [lru_,]percpu_hash maps
On 2025/11/12 05:58, Yonghong Song wrote:
>
>
> On 11/11/25 5:52 AM, Leon Hwang wrote:
>>
>> On 2025/11/11 21:38, Leon Hwang wrote:
>>>
>>> On 2025/11/7 10:00, Yonghong Song wrote:
>>>>
>>>> On 11/5/25 7:14 AM, Leon Hwang wrote:
[...]
>>>>> +
>>>>> +static int __insert_in_list(struct bpf_list_head *head, struct
>>>>> bpf_spin_lock *lock,
>>>>> + struct node_data __kptr **node)
>>>>> +{
>>>>> + struct node_data *node_new, *node_ref, *node_old;
>>>>> +
>>>>> + node_new = bpf_obj_new(typeof(*node_new));
>>>>> + if (!node_new)
>>>>> + return -1;
>>>>> +
>>>>> + node_ref = bpf_refcount_acquire(node_new);
>>>>> + node_old = bpf_kptr_xchg(node, node_new);
>>>> Change the above to node_old = bpf_kptr_xchg(node, node_node_ref);
>>>> might
>>>> be better for reasoning although node_ref/node_new are the same.
>>>>
>>> Nope — node_ref and node_new are different for the verifier.
>> They are the same in theory.
>>
>> The verifier failure was likely caused by something else, but I'm not
>> sure of the exact reason.
>
> I did some analysis and your code works as expected:
>
> node_ref = bpf_refcount_acquire(node_new);
> node_old = bpf_kptr_xchg(node, node_new);
> if (node_old) {
> bpf_obj_drop(node_old);
> bpf_obj_drop(node_ref);
> return -2;
> }
>
> bpf_spin_lock(lock);
> bpf_list_push_front(head, &node_ref->l);
> ref = (u64)(void *) &node_ref->ref;
> bpf_spin_unlock(lock);
>
> In the above, after the following insn:
> node_old = bpf_kptr_xchg(node, node_new);
> the second argument 'node_new' will become a scalar since it
> may be changed by another bpf program accessing the same map.
>
> So your code is okay as node_ref still valid ptr_node_data
> and can be used in following codes.
>
>
> My suggestion to replace
> node_old = bpf_kptr_xchg(node, node_new);
> with
> node_old = bpf_kptr_xchg(node, node_ref);
> will not work since node_ref will be a scalar
> so subsequent bpf_obj_drop(node_ref) and bpf_list_push_front(...)
> won't work.
>
> In summary, your change look okay to me. Sorry for noise.
>
Hi Yonghong,
Thanks for your detailed analysis.
Indeed, in the case of
node_old = bpf_kptr_xchg(node, node_ref);,
the verifier marks node_ref as SCALAR_VALUE.
Here's the relevant part in check_helper_call():
static int check_helper_call(struct bpf_verifier_env *env, struct
bpf_insn *insn,
int *insn_idx_p)
{
//...
if (meta.release_regno) {
err = -EINVAL;
if (arg_type_is_dynptr(fn->arg_type[meta.release_regno -
BPF_REG_1])) {
err = unmark_stack_slots_dynptr(env,
®s[meta.release_regno]);
} else if (func_id == BPF_FUNC_kptr_xchg &&
meta.ref_obj_id) {
u32 ref_obj_id = meta.ref_obj_id;
bool in_rcu = in_rcu_cs(env);
struct bpf_func_state *state;
struct bpf_reg_state *reg;
err = release_reference_nomark(env->cur_state,
ref_obj_id);
if (!err) {
bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
if (reg->ref_obj_id == ref_obj_id) {
if (in_rcu && (reg->type
& MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
reg->ref_obj_id = 0;
reg->type &=
~MEM_ALLOC;
reg->type |=
MEM_RCU;
} else {
mark_reg_invalid(env, reg);
}
}
}));
}
} else if (meta.ref_obj_id) {
//...
}
//...
}
This logic matches your explanation — when the argument passed to
bpf_kptr_xchg() holds a reference (meta.ref_obj_id), that reference is
not a KPTR_PERCPU inside an RCU critical section.
As a result, node_ref is invalidated and becomes a scalar after the call.
So yes, your reasoning is correct, and this behavior explains why using
node_ref as the second argument doesn't work.
Thanks,
Leon
[...]
Powered by blists - more mailing lists