linux-kernel - Re: [PATCH v2] tracing: Add trace_trigger kernel command line option

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y5J/CajlNh1gexvo@google.com>
Date:   Thu, 8 Dec 2022 17:19:21 -0700
From:   Ross Zwisler <zwisler@...gle.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Linux Trace Kernel <linux-trace-kernel@...r.kernel.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        paulmck@...nel.org, Joel Fernandes <joel@...lfernandes.org>,
        Tom Zanussi <zanussi@...nel.org>
Subject: Re: [PATCH v2] tracing: Add trace_trigger kernel command line option

On Thu, Dec 08, 2022 at 06:39:45PM -0500, Steven Rostedt wrote:
> On Thu, 8 Dec 2022 15:27:07 -0700
> Ross Zwisler <zwisler@...gle.com> wrote:
> 
> > > +#ifdef CONFIG_HIST_TRIGGERS  
> > 
> > Can you help me understand why this is only available if
> > CONFIG_HIST_TRIGGERS is selected in the kernel config?  AFAICT this
> > code doesn't depend on the histogram code, and the run-time selection
> > of triggers is usable without CONFIG_HIST_TRIGGERS.
> 
> Good catch!
> 
> I got confused, and only saw that as "CONFIG_TRIGGERS" and wasn't thinking
> that config was just for histogram triggers :-p
> 
> Care to send a patch to fix it?

Sure, happy to.

One more question: I was playing with this code using examples from 

https://www.kernel.org/doc/html/latest/trace/events.html

and when I tried to create a command line trigger to gather a snapshot:

trace_trigger="sched_switch.snapshot:1 if prev_state == 2"

it hits an oops:

[    0.178179] traps: PANIC: double fault, error_code: 0xffffffffa0a02040
[    0.178187] BUG: unable to handle page fault for address: fffffe000000b000
[    0.178188] #PF: supervisor read access in kernel mode
[    0.178189] #PF: error_code(0x0000) - not-present page
[    0.178191] PGD 23ffc3067 P4D 23ffc3067 PUD 23ffc1067 PMD 23ffc0067 PTE 0
[    0.178193] Oops: 0000 [#1] PREEMPT SMP PTI
[    0.178195] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc7+ #4
[    0.178197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[    0.178198] RIP: 0010:__die_header+0x1e/0x7d
[    0.178204] Code: 00 00 31 c0 89 43 50 e9 30 ff 49 ff 0f 1f 44 00 00 8b 05 bf 3c 64 01 49 89 fa 85 c0 75 0e 48 c7 c7 60 f0 bc a1 b9 2a 00 01
[    0.178206] RSP: 0000:fffffe000000aec0 EFLAGS: 00010046
[    0.178207] RAX: 0000000000000000 RBX: ffffffffa0f5aaa8 RCX: 0000000000000002
[    0.178208] RDX: ffffffffa0a02040 RSI: fffffe000000b000 RDI: ffffffffa1bcf100
[    0.178209] RBP: fffffe000000af60 R08: ffffffffa125b280 R09: 6666666666666666
[    0.178209] R10: ffffffffa0f5aaa8 R11: 203a65646f635f72 R12: ffffffffa0a02040
[    0.178210] R13: ffffffffa0a02040 R14: 0000000000000000 R15: 0000000000000000
[    0.178212] FS:  0000000000000000(0000) GS:ffff93ef77c00000(0000) knlGS:0000000000000000
[    0.178213] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.178214] CR2: fffffe000000b000 CR3: 000000004500a000 CR4: 00000000000006b0
[    0.178217] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.178218] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.178219] Call Trace:
[    0.178223]  <#DF>
[    0.178224]  __die+0x17/0x29
[    0.178227]  die+0x2a/0x50
[    0.178232]  ? zen_untrain_ret+0x1/0x1
[    0.178237]  exc_double_fault+0x173/0x180
[    0.178242]  asm_exc_double_fault+0x1f/0x30
[    0.178248] WARNING: stack recursion on stack type 5
[    0.178249] WARNING: can't access registers at asm_exc_double_fault+0x1f/0x30
[    0.178253]  </#DF>
[    0.178253]  <TASK>
[    0.178254]  ? _raw_spin_unlock_irq+0x11/0x40
[    0.178256]  ? wait_for_completion+0x7e/0x160
[    0.178258]  ? ring_buffer_resize+0x320/0x450
[    0.178262]  ? resize_buffer_duplicate_size+0x38/0xe0
[    0.178264]  ? tracing_alloc_snapshot_instance+0x23/0x40
[    0.178266]  ? register_snapshot_trigger+0x16/0x40
[    0.178269]  ? event_trigger_parse+0x113/0x160
[    0.178272]  ? trigger_process_regex+0xb8/0x100
[    0.178274]  ? __trace_early_add_events+0xb8/0x140
[    0.178275]  ? trace_event_init+0xcc/0x2dd
[    0.178278]  ? start_kernel+0x4a9/0x713
[    0.178281]  ? secondary_startup_64_no_verify+0xce/0xdb
[    0.178285]  </TASK>
[    0.178285] Modules linked in:
[    0.178286] CR2: fffffe000000b000
[    0.220408] ---[ end trace 0000000000000000 ]---
[    0.220409] RIP: 0010:__die_header+0x1e/0x7d
[    0.220411] Code: 00 00 31 c0 89 43 50 e9 30 ff 49 ff 0f 1f 44 00 00 8b 05 bf 3c 64 01 49 89 fa 85 c0 75 0e 48 c7 c7 60 f0 bc a1 b9 2a 00 01
[    0.220412] RSP: 0000:fffffe000000aec0 EFLAGS: 00010046
[    0.220414] RAX: 0000000000000000 RBX: ffffffffa0f5aaa8 RCX: 0000000000000002
[    0.220414] RDX: ffffffffa0a02040 RSI: fffffe000000b000 RDI: ffffffffa1bcf100
[    0.220415] RBP: fffffe000000af60 R08: ffffffffa125b280 R09: 6666666666666666
[    0.220416] R10: ffffffffa0f5aaa8 R11: 203a65646f635f72 R12: ffffffffa0a02040
[    0.220416] R13: ffffffffa0a02040 R14: 0000000000000000 R15: 0000000000000000
[    0.220417] FS:  0000000000000000(0000) GS:ffff93ef77c00000(0000) knlGS:0000000000000000
[    0.220418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.220419] CR2: fffffe000000b000 CR3: 000000004500a000 CR4: 00000000000006b0
[    0.220421] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.220422] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.220423] Kernel panic - not syncing: Fatal exception in interrupt

I think that this is because the code to allocate the snapshot buffer uses
workqueues (ring_buffer_resize() calls schedule_work_on() then
wait_for_completion()), but at this point during the init process the
workqueues are up enough that we can enqueue entries, but they are not yet
doing work.

start_kernel() {
...

  /*
   * Allow workqueue creation and work item queueing/cancelling
   * early.  Work item execution depends on kthreads and starts after
   * workqueue_init().
   */
  workqueue_init_early();
  ...

  /* Trace events are available after this */
  trace_init();  // here is where we try and allocate the snapshot

  ...

  arch_call_rest_init();
    rest_init()
      kernel_init()
        kernel_init_freeable()
          workqueue_init()
}

I'm guessing the best we can do here is just disallow snapshot triggers via
the command line option, so that others don't cut themselves on this sharp
corner?  Other ideas?

Thanks,
- Ross