linux-kernel - Re: WARNING in tracepoint_probe_register

Message-ID: <6d3f6847-b368-37f1-b7c5-2abbba074235@kernel.dk>
Date:   Tue, 31 Oct 2017 16:30:41 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     Steven Rostedt <rostedt@...dmis.org>,
        syzbot 
        <bot+2acf619eef45c8f8b322a24d8e528eccbe6319bc@...kaller.appspotmail.com>
Cc:     linux-kernel@...r.kernel.org, mingo@...nel.org,
        syzkaller-bugs@...glegroups.com,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Subject: Re: WARNING in tracepoint_probe_register_prio

On 10/31/2017 04:21 PM, Steven Rostedt wrote:
> On Sun, 29 Oct 2017 14:24:01 -0700
> syzbot <bot+2acf619eef45c8f8b322a24d8e528eccbe6319bc@...kaller.appspotmail.com> wrote:
> 
>> syzkaller has found reproducer for the following crash on  
>> 36ef71cae353f88fd6e095e2aaa3e5953af1685d
> 
> So this fuzzer triggers this.
> 
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>> C reproducer is attached
>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> for information about syzkaller reproducers
>>
>>
>> WARNING: CPU: 0 PID: 2995 at kernel/tracepoint.c:210 tracepoint_add_func  
>> kernel/tracepoint.c:210 [inline]
> 
> Which is this:
> 
> 	old = func_add(&tp_funcs, func, prio);
> 	if (IS_ERR(old)) {
> 		WARN_ON_ONCE(1);
> 		return PTR_ERR(old);
> 	}
> 
> Which means that func_add() returned an error.
> 
> The possible errors are:
> 
> 	if (WARN_ON(!tp_func->func))
> 		return ERR_PTR(-EINVAL);
> 
> 			if (old[nr_probes].func == tp_func->func &&
> 			    old[nr_probes].data == tp_func->data)
> 				return ERR_PTR(-EEXIST);
> 
> 	new = allocate_probes(nr_probes + 2);
> 	if (new == NULL)
> 		return ERR_PTR(-ENOMEM);
> 
> One possibility is that tp_func->func was NULL. I doubt that: judging
> from the back trace, this is just the registering of the tracepoints,
> which should always have func set.
> 
> I doubt we ran out of memory here. Possible, but unlikely (this was
> reproduced twice with the same path).
> 
> Which leaves us with a tracepoint that was registered twice.
> 
> There's nothing in register_trace_*() that protects it. The blktrace
> code needs to have some protection to know if it registered the
> tracepoints once, otherwise this will trigger.
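
To make the three failure paths above concrete, here is a minimal user-space sketch of the same checks. It is not the kernel's func_add(): the names model_probe_add(), decode_errno(), sample_probe() and MAX_PROBES are hypothetical, and the fixed-size array only stands in for allocate_probes(). As a side note, RBX in the register dump reads 00000000ffffffef, which is -17 (-EEXIST) when taken as a 32-bit signed value. Whether RBX really holds the func_add() result at that point is an assumption, but it would fit the "registered twice" reading.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed capacity; stands in for allocate_probes() failing. */
#define MAX_PROBES 8

struct probe {
	void (*func)(void *);
	void *data;
};

static struct probe probes[MAX_PROBES];
static size_t nr_probes;

/*
 * User-space model (not the kernel code) of the three error paths:
 * returns 0 on success or a negative errno.
 */
static int model_probe_add(void (*func)(void *), void *data)
{
	size_t i;

	if (!func)
		return -EINVAL;		/* probe without a function */

	for (i = 0; i < nr_probes; i++)
		if (probes[i].func == func && probes[i].data == data)
			return -EEXIST;	/* same (func, data) registered twice */

	if (nr_probes == MAX_PROBES)
		return -ENOMEM;		/* no room for another probe */

	probes[nr_probes].func = func;
	probes[nr_probes].data = data;
	nr_probes++;
	return 0;
}

/* Hypothetical probe used only to exercise the model. */
static void sample_probe(void *data)
{
	(void)data;
}

/* Interpret the low 32 bits of a register value as a signed errno. */
static int decode_errno(uint64_t reg)
{
	return (int)(int32_t)(uint32_t)reg;
}
```

Registering sample_probe twice with the same data pointer makes the second call return -EEXIST, and decode_errno(0x00000000ffffffefULL) yields -17.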
> 
> 
>> WARNING: CPU: 0 PID: 2995 at kernel/tracepoint.c:210  
>> tracepoint_probe_register_prio+0x397/0x9a0 kernel/tracepoint.c:283
>> Kernel panic - not syncing: panic_on_warn set ...
> 
> It panics because "panic_on_warn" is set and we just did a warning.
> 
>>
>> CPU: 0 PID: 2995 Comm: syzkaller857118 Not tainted  
>> 4.14.0-rc5-next-20171018+ #36
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
>> Google 01/01/2011
>> Call Trace:
>>   __dump_stack lib/dump_stack.c:16 [inline]
>>   dump_stack+0x194/0x257 lib/dump_stack.c:52
>>   panic+0x1e4/0x41c kernel/panic.c:183
>>   __warn+0x1c4/0x1e0 kernel/panic.c:546
>>   report_bug+0x211/0x2d0 lib/bug.c:183
>>   fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177
>>   do_trap_no_signal arch/x86/kernel/traps.c:211 [inline]
>>   do_trap+0x260/0x390 arch/x86/kernel/traps.c:260
>>   do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297
>>   do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310
>>   invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>> RIP: 0010:tracepoint_add_func kernel/tracepoint.c:210 [inline]
>> RIP: 0010:tracepoint_probe_register_prio+0x397/0x9a0 kernel/tracepoint.c:283
>> RSP: 0018:ffff8801d1d1f6c0 EFLAGS: 00010293
>> RAX: ffff8801d22e8540 RBX: 00000000ffffffef RCX: ffffffff81710f07
>> RDX: 0000000000000000 RSI: ffffffff85b679c0 RDI: ffff8801d5f19818
>> RBP: ffff8801d1d1f7c8 R08: ffffffff81710c10 R09: 0000000000000004
>> R10: ffff8801d1d1f6b0 R11: 0000000000000003 R12: ffffffff817597f0
>> R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8801d1d1f7a0
>>   tracepoint_probe_register+0x2a/0x40 kernel/tracepoint.c:304
>>   register_trace_block_rq_insert include/trace/events/block.h:191 [inline]
>>   blk_register_tracepoints+0x1e/0x2f0 kernel/trace/blktrace.c:1043
>>   do_blk_trace_setup+0xa10/0xcf0 kernel/trace/blktrace.c:542
>>   blk_trace_setup+0xbd/0x180 kernel/trace/blktrace.c:564
> 
> I'm guessing that blk_trace_setup() should have a mutex or something to
> protect this. An atomic counter is not enough.
> 
> 
> 	CPU0					CPU1
> 	----					----
>  if (atomic_inc_return(&blk_probes_ref) == 1)
> 	blk_register_tracepoints();
> 
>  if (atomic_dec_and_test(&blk_probes_ref))
> 
> 				if (atomic_inc_return(&blk_probes_ref) == 1)
> 					blk_register_tracepoints();
> 
> 	blk_unregister_tracepoints();
> 
> Would cause this to trigger.
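
The interleaving above can be replayed deterministically in user space. The sketch below is not the blktrace code: model_register(), model_unregister(), replay_race(), probes_get() and probes_put() are hypothetical names, and a plain bool stands in for "the tracepoints are registered". The first half replays the CPU0/CPU1 steps on a single thread. The second half shows one possible shape of the mutex idea, where the counter update and the (un)registration happen under the same lock. It is a sketch under those assumptions, not a proposed patch.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Models "the tracepoints are currently registered". */
static bool registered;

/* Like the real registration, refuses a second registration. */
static int model_register(void)
{
	if (registered)
		return -EEXIST;
	registered = true;
	return 0;
}

static void model_unregister(void)
{
	registered = false;
}

/*
 * Replays the racy interleaving step by step on one thread.
 * Returns the result of CPU1's registration attempt.
 */
static int replay_race(void)
{
	int ref = 0;
	int ret;
	bool cpu0_must_unregister;

	registered = false;

	/* CPU0 setup: 0 -> 1, so it registers. */
	if (++ref == 1)
		model_register();

	/* CPU0 teardown: 1 -> 0, but the unregister has not run yet. */
	cpu0_must_unregister = (--ref == 0);

	/* CPU1 setup: sees 0 -> 1 and registers a second time. */
	ret = (++ref == 1) ? model_register() : 0;

	/* CPU0 finally unregisters, although CPU1 still holds a reference. */
	if (cpu0_must_unregister)
		model_unregister();

	return ret;
}

/* Mutex-protected variant: counter and registration change together. */
static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;
static int probes_ref;

static int probes_get(void)
{
	int ret = 0;

	pthread_mutex_lock(&probe_lock);
	if (++probes_ref == 1) {
		ret = model_register();
		if (ret)
			probes_ref--;	/* roll back on failure */
	}
	pthread_mutex_unlock(&probe_lock);
	return ret;
}

static void probes_put(void)
{
	pthread_mutex_lock(&probe_lock);
	if (--probes_ref == 0)
		model_unregister();
	pthread_mutex_unlock(&probe_lock);
}
```

In this model replay_race() returns -EEXIST and leaves nothing registered even though the counter is still 1. With probes_get()/probes_put(), a second caller cannot observe the counter at zero while the first caller's unregister is still pending.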

This code dates back to:

commit c71a896154119f4ca9e89d6078f5f63ad60ef199
Author: Arnaldo Carvalho de Melo <acme@...hat.com>
Date:   Fri Jan 23 12:06:27 2009 -0200

    blktrace: add ftrace plugin

so not really a recent regression :-)

I'll take a look at it.

-- 
Jens Axboe