lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <29c58126-3146-4c61-8166-a894c0e84d08@efficios.com>
Date: Tue, 22 Oct 2024 11:19:01 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Steven Rostedt <rostedt@...dmis.org>, Jordan Rife <jrife@...gle.com>
Cc: syzbot+b390c8062d8387b6272a@...kaller.appspotmail.com, andrii@...nel.org,
 ast@...nel.org, bpf@...r.kernel.org, daniel@...earbox.net,
 eddyz87@...il.com, haoluo@...gle.com, john.fastabend@...il.com,
 jolsa@...nel.org, kpsingh@...nel.org, linux-kernel@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, martin.lau@...ux.dev,
 mattbobrowski@...gle.com, mhiramat@...nel.org, sdf@...ichev.me,
 song@...nel.org, syzkaller-bugs@...glegroups.com, yonghong.song@...ux.dev
Subject: Re: [syzbot] [trace?] [bpf?] KASAN: slab-use-after-free Read in
 bpf_trace_run2 (2)

On 2024-10-22 04:20, Steven Rostedt wrote:
> 
> Mathieu, can you look at this?

Sure,

> 
> [ more below ]
> 
> On Mon, 21 Oct 2024 18:23:47 +0000
> Jordan Rife <jrife@...gle.com> wrote:
> 
>> I performed a bisection and this issue starts with commit a363d27cdbc2
>> ("tracing: Allow system call tracepoints to handle page faults") which
>> introduces this change.
>>
>>> + *
>>> + * With @syscall=0, the tracepoint callback array dereference is
>>> + * protected by disabling preemption.
>>> + * With @syscall=1, the tracepoint callback array dereference is
>>> + * protected by Tasks Trace RCU, which allows probes to handle page
>>> + * faults.
>>>    */
>>>   #define __DO_TRACE(name, args, cond, syscall)				\
>>>   	do {								\
>>> @@ -204,11 +212,17 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>>   		if (!(cond))						\
>>>   			return;						\
>>>   									\
>>> -		preempt_disable_notrace();				\
>>> +		if (syscall)						\
>>> +			rcu_read_lock_trace();				\
>>> +		else							\
>>> +			preempt_disable_notrace();			\
>>>   									\
>>>   		__DO_TRACE_CALL(name, TP_ARGS(args));			\
>>>   									\
>>> -		preempt_enable_notrace();				\
>>> +		if (syscall)						\
>>> +			rcu_read_unlock_trace();			\
>>> +		else							\
>>> +			preempt_enable_notrace();			\
>>>   	} while (0)
>>
>> Link: https://lore.kernel.org/bpf/20241009010718.2050182-6-mathieu.desnoyers@efficios.com/
>>
>> I reproduced the bug locally by running syz-execprog inside a QEMU VM.
>>
>>> ./syz-execprog -repeat=0 -procs=5 ./repro.syz.txt
>>
>> I /think/ what is happening is that with this change preemption may now
>> occur leading to a scenario where the RCU grace period is insufficient
>> in a few places where call_rcu() is used. In other words, there are a
>> few scenarios where call_rcu_tasks_trace() should be used instead to
>> prevent a use-after-free bug when a preempted tracepoint call tries to
>> access a program, link, etc. that was freed. It seems the syzkaller
>> program induces page faults while attaching raw tracepoints to
>> sys_enter making preemption more likely to occur.
>>
>> kernel/tracepoint.c
>> ===================
>>> ...
>>> static inline void release_probes(struct tracepoint_func *old)
>>> {
>>> 	...
>>> 	call_rcu(&tp_probes->rcu, rcu_free_old_probes); <-- Here
> 
> Have you tried just changing this one to call_rcu_tasks_trace()?

Actually, I see two possible solutions there:

1) If we want to keep unchanged register/unregister functions as a single
    API for normal and syscall tracepoints, and we don't want to add additional
    flags in the tracepoint struct, then we need to chain the call_rcu:

    call_rcu() -> rcu_free_old_probes -> call_rcu_tasks_trace() -> rcu_tasks_trace_free_old_probes.

    Just like we did when we used SRCU.

    This is not perfect because we'd be adding extra delay for reclaim of
    the old probes array for every tracepoint being updated, not just the
    syscall tracepoints, but we probably don't care. This is a straightforward
    initial solution.

    We did something similar with SRCU before and it was OK.

2) If we want something more elegant, we can add a flag to struct tracepoint
    for syscall tracepoints, and use that flag to choose between call_rcu()
    and call_rcu_tasks_trace(). But maybe this additional complexity is not
    even useful.

I'll prepare a patch implementing (1) and send it your way as RFC for further
testing.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ