linux-kernel - Re: tracing: user events UAF crash report

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAD2QZ9bTrQ1p3zTZOXe6Gk4Xq8kjYSziAYAdbTrvRSZzAGPY9A@mail.gmail.com>
Date: Thu, 25 Jul 2024 21:45:03 +0530
From: Ajay Kaher <ajay.kaher@...adcom.com>
To: Mathias Krause <minipli@...ecurity.net>, Steven Rostedt <rostedt@...dmis.org>
Cc: Masami Hiramatsu <mhiramat@...nel.org>, Ilkka Naulapää <digirigawa@...il.com>, 
	Linus Torvalds <torvalds@...ux-foundation.org>, Al Viro <viro@...iv.linux.org.uk>, 
	linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	regressions@...mhuis.info, Dan Carpenter <dan.carpenter@...aro.org>, 
	Vasavi Sirnapalli <vasavi.sirnapalli@...adcom.com>, 
	Alexey Makhalov <alexey.makhalov@...adcom.com>, 
	Florian Fainelli <florian.fainelli@...adcom.com>
Subject: Re: tracing: user events UAF crash report

On Mon, Jul 22, 2024 at 5:38 PM Mathias Krause <minipli@...ecurity.net> wrote:
>
> On 22.07.24 13:13, Ajay Kaher wrote:
> > On Sat, Jul 20, 2024 at 2:17 AM Mathias Krause <minipli@...ecurity.net> wrote:
> >>
> >> I noticed, the user events ftrace selftest is crashing every now and
> >> then in our automated tests. Digging into, I found that the following
> >> is triggering the issue very reliable:
> >>
> >> - in one shell, as root:
> >>   # while true; do ./kselftest/user_events/ftrace_test; done
> >>
> >> - in a second shell, again as root:
> >>   # cd /sys/kernel/tracing
> >>   # while true; do cat events/user_events/__test_event/format; done 2>/dev/null
> >>
> >
> > Tried to reproduced on 6.10.0-rc7-100.ph5+, only getting repeated output as:

< sending again after correcting alignments >

Mathias, thanks for reporting. I am able to reproduce the 'KASAN:
slab-use-after-free'.

Steve, let me know if anything wrong in my investigation:

[ 6264.339882] ==================================================================
[ 6264.339970] BUG: KASAN: slab-use-after-free in f_start+0x2b5/0x370

This belongs to  f_start() ->  f_next() -> trace_get_fields():

    trace_get_fields(struct trace_event_call *event_call)
    {
        if (!event_call->class->get_fields)
        return &event_call->class->fields;
        return event_call->class->get_fields(event_call);
    }

This happens while reading 'events/user_events/__test_event/format'.


Allocation:
[ 6264.347212] Allocated by task 3287:
[ 6264.348247]  kasan_save_stack+0x26/0x50
[ 6264.348256]  kasan_save_track+0x14/0x40
[ 6264.348260]  kasan_save_alloc_info+0x37/0x50
[ 6264.348265]  __kasan_kmalloc+0xb3/0xc0
[ 6264.348268]  kmalloc_trace_noprof+0x168/0x330
[ 6264.348280]  user_event_parse_cmd+0x57b/0x26c0
[ 6264.348286]  user_events_ioctl+0xa92/0x1850
[ 6264.348290]  __x64_sys_ioctl+0x138/0x1b0
[ 6264.348295]  x64_sys_call+0x9a4/0x1f20
[ 6264.348299]  do_syscall_64+0x4b/0x110

user_event_parse_cmd() -> user_event_parse() {
    .
    user = kzalloc(sizeof(*user), GFP_KERNEL_ACCOUNT);

Link: https://elixir.bootlin.com/linux/v6.10/source/kernel/trace/trace_events_user.c#L2118


Freed:
[ 6264.350333]  kfree+0xd1/0x2b0
[ 6264.350337]  destroy_user_event.part.0+0x313/0x450
[ 6264.350341]  destroy_user_event+0x129/0x1a0
[ 6264.350344]  delayed_destroy_user_event+0x62/0xd0
[ 6264.350347]  process_one_work+0x621/0xf60
[ 6264.350359]  worker_thread+0x760/0x14f0

static int destroy_user_event(struct user_event *user) {
    .
    kfree(user->call.print_fmt);
    kfree(EVENT_NAME(user));
    kfree(user);  <--

Link: https://elixir.bootlin.com/linux/v6.10/source/kernel/trace/trace_events_user.c#L1510


Race condition:

Thread A i.e. event reader able to reach the f_start() as the path is
valid. Thread A waiting for lock. At the sametime, Thread B has
acquired lock and removing events entry followed by free the
user_event object. Later once Thread A got the lock it tried to read
address which belongs to struct trace_event_call (struct
trace_event_call is member of struct user_event)

Thread A (read event)                      Thread B (remove event)

.                                                        worker_thread()
.
delayed_destroy_user_event()
.                                                               ->
acquire event_mutex
.                                                        destroy_user_event()
vfs_read()                                         .
seq_read()                                        .
f_start() -> acquire event_mutex      eventfs_remove_dir()
. (waiting)                                          kfree(user)
. (waiting)                                          -> released event_mutex
acquired event_mutex
f_next()
trace_get_fields():

I think you have added the following check in f_start() to prevent
this race condition, but somehow with eventfs still some gap to race condition.

static void *f_start(struct seq_file *m, loff_t *pos) {
    mutex_lock(&event_mutex);
    if (!event_file_data(m->private))   <--
        return ERR_PTR(-ENODEV);

-Ajay