Message-ID: <20221209002932.2e9e89e2@gandalf.local.home>
Date:   Fri, 9 Dec 2022 00:29:32 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Ross Zwisler <zwisler@...gle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Linux Trace Kernel <linux-trace-kernel@...r.kernel.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        paulmck@...nel.org, Joel Fernandes <joel@...lfernandes.org>,
        Tom Zanussi <zanussi@...nel.org>
Subject: Re: [PATCH v2] tracing: Add trace_trigger kernel command line
 option

On Thu, 8 Dec 2022 17:19:21 -0700
Ross Zwisler <zwisler@...gle.com> wrote:

> I think this is because the code to allocate the snapshot buffer uses
> workqueues (ring_buffer_resize() calls schedule_work_on() and then
> wait_for_completion()), but at this point in the init process the
> workqueues are up just enough that we can enqueue work items, yet they
> are not executing them.

Thanks for the report.
> 
> start_kernel() {
> ...
> 
>   /*
>    * Allow workqueue creation and work item queueing/cancelling
>    * early.  Work item execution depends on kthreads and starts after
>    * workqueue_init().
>    */
>   workqueue_init_early();
>   ...
> 
>   /* Trace events are available after this */
>   trace_init();  // here is where we try and allocate the snapshot
> 
>   ...
> 
>   arch_call_rest_init();
>     rest_init()
>       kernel_init()
>         kernel_init_freeable()
>           workqueue_init()
> }
> 
> I'm guessing the best we can do here is just disallow snapshot triggers via
> the command line option, so that others don't cut themselves on this sharp
> corner?  Other ideas?
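
To make the failure mode described above concrete, here is a minimal,
hypothetical sketch (illustration only, not the actual tracing code) of the
queue-then-wait pattern that cannot make progress between
workqueue_init_early() and workqueue_init(): the work item can be queued, but
no worker kthreads exist yet to run it and signal the completion.

#include <linux/completion.h>
#include <linux/workqueue.h>

static DECLARE_COMPLETION(sketch_done);

static void sketch_work_fn(struct work_struct *work)
{
	complete(&sketch_done);		/* only runs once workers exist */
}
static DECLARE_WORK(sketch_work, sketch_work_fn);

static void sketch_resize(int cpu)
{
	/* Queueing succeeds even before workqueue_init() ... */
	schedule_work_on(cpu, &sketch_work);
	/* ... but this would block forever, as nothing executes the item. */
	wait_for_completion(&sketch_done);
}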

After debugging it, the cause is that this code path expects to be called
with interrupts enabled, but this early in boot, interrupts have not been
enabled yet and should not be.

> [    0.178253]  </#DF>
> [    0.178253]  <TASK>
> [    0.178254]  ? _raw_spin_unlock_irq+0x11/0x40

raw_spin_unlock_irq() re-enables interrupts unconditionally, regardless of
whether they were enabled when the lock was taken (see the sketch after the
backtrace below).


> [    0.178256]  ? wait_for_completion+0x7e/0x160
> [    0.178258]  ? ring_buffer_resize+0x320/0x450
> [    0.178262]  ? resize_buffer_duplicate_size+0x38/0xe0
> [    0.178264]  ? tracing_alloc_snapshot_instance+0x23/0x40
> [    0.178266]  ? register_snapshot_trigger+0x16/0x40
> [    0.178269]  ? event_trigger_parse+0x113/0x160
> [    0.178272]  ? trigger_process_regex+0xb8/0x100
> [    0.178274]  ? __trace_early_add_events+0xb8/0x140
> [    0.178275]  ? trace_event_init+0xcc/0x2dd
> [    0.178278]  ? start_kernel+0x4a9/0x713
> [    0.178281]  ? secondary_startup_64_no_verify+0xce/0xdb
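
For context, a minimal sketch (not part of the patch) of the difference that
matters here, assuming a caller that may run before interrupts are first
enabled: the _irq variants hard-code the interrupt state, while the
_irqsave/_irqrestore variants preserve whatever state the caller had.

#include <linux/spinlock.h>

static void sketch_lock_variants(raw_spinlock_t *lock)
{
	unsigned long flags;

	raw_spin_lock_irq(lock);	/* unconditionally disables interrupts */
	raw_spin_unlock_irq(lock);	/* unconditionally re-enables them,
					 * even if they were off beforehand */

	raw_spin_lock_irqsave(lock, flags);	 /* saves current state, then disables */
	raw_spin_unlock_irqrestore(lock, flags); /* restores exactly the saved state */
}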

This appears to fix it.

-- Steve

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 843818ee4814..ef4da331ff61 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2062,8 +2062,10 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	struct list_head *pages = &cpu_buffer->new_pages;
 	int retries, success;
+	unsigned long flags;
 
-	raw_spin_lock_irq(&cpu_buffer->reader_lock);
+	/* Can be called at early boot up, where interrupts have not been enabled */
+	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
 	/*
 	 * We are holding the reader lock, so the reader page won't be swapped
 	 * in the ring buffer. Now we are racing with the writer trying to
@@ -2120,7 +2122,7 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	 * tracing
 	 */
 	RB_WARN_ON(cpu_buffer, !success);
-	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
+	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 	/* free pages if they weren't inserted */
 	if (!success) {
@@ -2171,6 +2173,7 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	unsigned long nr_pages;
+	unsigned long flags;
 	int cpu, err;
 
 	/*
@@ -2248,8 +2251,19 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
 				rb_update_pages(cpu_buffer);
 				cpu_buffer->nr_pages_to_update = 0;
 			} else {
-				schedule_work_on(cpu,
-						&cpu_buffer->update_pages_work);
+				/*
+				 * Run directly if possible.
+				 * Save flags as this can be called at early boot up.
+				 */
+				local_irq_save(flags);
+				if (cpu != smp_processor_id()) {
+					local_irq_restore(flags);
+					schedule_work_on(cpu,
+							 &cpu_buffer->update_pages_work);
+				} else {
+					update_pages_handler(&cpu_buffer->update_pages_work);
+					local_irq_restore(flags);
+				}
 			}
 		}
 
@@ -2298,9 +2312,20 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
 		if (!cpu_online(cpu_id))
 			rb_update_pages(cpu_buffer);
 		else {
-			schedule_work_on(cpu_id,
-					 &cpu_buffer->update_pages_work);
-			wait_for_completion(&cpu_buffer->update_done);
+			/*
+			 * Run directly if possible.
+			 * Save flags as this can be called at early boot up.
+			 */
+			local_irq_save(flags);
+			if (cpu_id == smp_processor_id()) {
+				update_pages_handler(&cpu_buffer->update_pages_work);
+				local_irq_restore(flags);
+			} else {
+				local_irq_restore(flags);
+				schedule_work_on(cpu_id,
+						 &cpu_buffer->update_pages_work);
+				wait_for_completion(&cpu_buffer->update_done);
+			}
 		}
 
 		cpu_buffer->nr_pages_to_update = 0;
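
For readers skimming the diff, the core of the change is the following
pattern (a simplified sketch with a hypothetical helper name, not the literal
patch code): if the target CPU is the one we are already running on, call the
handler directly instead of bouncing through the workqueue. Interrupts stay
disabled across the smp_processor_id() check, which keeps the task from
migrating off that CPU before the handler runs and avoids re-enabling
interrupts on this path during early boot.

#include <linux/completion.h>
#include <linux/smp.h>
#include <linux/workqueue.h>

/* Hypothetical helper illustrating the pattern added by the patch. */
static void sketch_run_on_cpu_or_schedule(int cpu, struct work_struct *work,
					  struct completion *done,
					  work_func_t handler)
{
	unsigned long flags;

	local_irq_save(flags);			/* safe even before IRQs are first enabled */
	if (cpu == smp_processor_id()) {
		handler(work);			/* run directly on the target CPU */
		local_irq_restore(flags);
	} else {
		local_irq_restore(flags);
		schedule_work_on(cpu, work);	/* defer to that CPU's worker */
		wait_for_completion(done);
	}
}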
