lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211203131451.j5zd42flgfyqoyxo@linutronix.de>
Date:   Fri, 3 Dec 2021 14:14:51 +0100
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Arjan van de Ven <arjan@...ux.intel.com>,
        brian.murray@...onical.com
Subject: Re: [PATCH] panic: Remove oops_id.

On 2021-12-02 14:43:08 [-0800], Andrew Morton wrote:
> On Thu, 2 Dec 2021 15:27:13 +0100 Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:
> 
> > The oops id has been added as part of the end of trace marker for the
> > kerneloops.org project. The id is used to automatically identify duplicate
> > submissions of the same report. Identical looking reports with different
> > a id can be considered as the same oops occurred again.
> > 
> > The early initialisation of the oops_id can create a warning if the
> > random core is not yet fully initialized. On PREEMPT_RT it is
> > problematic if the id is initialized on demand from non preemptible
> > context.
> 
> "problematic" isn't very useful :(
> 
> What exactly goes wrong under -rt?

Sorry, I indeed forgot that part.
get_random_bytes() acquires crng_state::lock (spinlock_t) which is a
sleeping lock on PREEMPT_RT and can not be acquired in non-preemptible
context.

> > The kernel oops project is not available since 2017.
> > Remove the oops_id.
> 
> (googles "linux oops_id")
> 
> https://wiki.ubuntu.com/UbuntuWeeklyNewsletter/Issue565#What.2BIBk-s_the_OOPS_ID.3F
> 
> Seems someone was using it in 2019.  My search was very brief.

The referenced blog entry is gone. So I can't tell what it is used for.
But I noticed whoopsie and added Brian on Cc. Brian, is the oops-id from
kernel backtrace (the numbers after "end trace") used for anything by
oopsie? The source of whoopsie has something named "oopsid" but it
appears to be something sent by the server as a response. So it could be
something different or kernel's oops-id parsed…

> The world wouldn't end if we removed this.  But perhaps it would be better
> to replace the oops id with "0" to avoid breaking parsers.
> 
> It's just a fairly unique number.  We could use anything.  Simply
> jiffies or ktime_get() would suffice?

I had a patch to get rid of the RT problem and avoid the
warn_unseeded_randomness() warning but then I was looking at it and
asking myself why keeping it…
I definitely could replace it with ktime_get() or dig the old patch
fixing it and keeping the random id if there is the need for it.

After looking at it, maybe something like a fingerprint of the warning
would make sense. You could throw it into google and find reports which
might indicate the same warning as you currently having, say something
like:

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index ea4fe192189d5..f4bbed25e4dd9 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -23,6 +23,8 @@
 #include <asm/stacktrace.h>
 #include <asm/unwind.h>
 
+void report_fp_add(const char *s, int len);
+
 int panic_on_unrecovered_nmi;
 int panic_on_io_nmi;
 static int die_counter;
@@ -70,6 +72,14 @@ static void printk_stack_address(unsigned long address, int reliable,
 {
 	touch_nmi_watchdog();
 	printk("%s %s%pBb\n", log_lvl, reliable ? "" : "? ", (void *)address);
+	if (reliable) {
+		char buf[64];
+		int len;
+
+		len = snprintf(buf, sizeof(buf), "%ps", (void *)address);
+		len = min(len, sizeof(buf));
+		report_fp_add(buf, len);
+	}
 }
 
 static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src,
diff --git a/init/main.c b/init/main.c
index bb984ed79de0e..70e2c0d6a9c94 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1495,6 +1495,7 @@ static int __ref kernel_init(void *unused)
 	 * Wait until kthreadd is all set-up.
 	 */
 	wait_for_completion(&kthreadd_done);
+	WARN_ON(1);
 
 	kernel_init_freeable();
 	/* need to finish all async __init code before freeing the memory */
@@ -1520,6 +1521,7 @@ static int __ref kernel_init(void *unused)
 	rcu_end_inkernel_boot();
 
 	do_sysctl_args();
+	WARN_ON(1);
 
 	if (ramdisk_execute_command) {
 		ret = run_init_process(ramdisk_execute_command);
@@ -1581,6 +1583,7 @@ static noinline void __init kernel_init_freeable(void)
 {
 	/* Now the scheduler is fully set up and can do blocking allocations */
 	gfp_allowed_mask = __GFP_BITS_MASK;
+	WARN_ON(1);
 
 	/*
 	 * init can allocate pages on any node
@@ -1614,6 +1617,7 @@ static noinline void __init kernel_init_freeable(void)
 	wait_for_initramfs();
 	console_on_rootfs();
 
+	WARN_ON(1);
 	/*
 	 * check if there is an early userspace init.  If yes, let it do all
 	 * the work
diff --git a/kernel/panic.c b/kernel/panic.c
index cefd7d82366fb..c6bd135815b0b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -533,6 +533,20 @@ void oops_enter(void)
 		trigger_all_cpu_backtrace();
 }
 
+#include <linux/jhash.h>
+
+static unsigned int fingerprint_state;
+
+static void fingerprint_state_init(void)
+{
+	fingerprint_state = 0;
+}
+
+void report_fp_add(const char *s, int len)
+{
+	fingerprint_state = jhash(s, len, fingerprint_state);
+}
+
 /*
  * 64-bit random ID for oopses:
  */
@@ -552,7 +566,8 @@ late_initcall(init_oops_id);
 static void print_oops_end_marker(void)
 {
 	init_oops_id();
-	pr_warn("---[ end trace %016llx ]---\n", (unsigned long long)oops_id);
+	pr_warn("---[ end trace %016llx fingerprint: %08x ]---\n",
+		(unsigned long long)oops_id, fingerprint_state);
 }
 
 /*
@@ -574,6 +589,7 @@ struct warn_args {
 void __warn(const char *file, int line, void *caller, unsigned taint,
 	    struct pt_regs *regs, struct warn_args *args)
 {
+	fingerprint_state_init();
 	disable_trace_on_warning();
 
 	if (file)
-- 

produces four backtraces, two with the same fingerprint.

Adding fingerprint states to lockdep might be able to produce the same
fingerprint for the same report, e.g. same two variables reported in a
circular locking dependency.

Sebastian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ