linux-kernel - Re: [PATCH v2] x86,mm: print likely CPU at segfault time

Message-ID: <YuzsJfHi+qV6Z16E@zn.tnic>
Date:   Fri, 5 Aug 2022 12:08:37 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     Rik van Riel <riel@...riel.com>
Cc:     Dave Hansen <dave.hansen@...el.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org, kernel-team@...com,
        Thomas Gleixner <tglx@...utronix.de>, Dave Jones <dsj@...com>,
        Andy Lutomirski <luto@...nel.org>
Subject: Re: [PATCH v2]  x86,mm: print likely CPU at segfault time

On Thu, Aug 04, 2022 at 03:54:50PM -0400, Rik van Riel wrote:
> Add a printk() to show_signal_msg() to print the CPU, core, and socket
> at segfault time. This is not perfect, since the task might get rescheduled
> on another CPU between when the fault hit, and when the message is printed,
> but in practice this has been good enough to help us identify several bad
> CPU cores.
> 
> segfault[1349]: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 error 4 in segfault[401000+1000] on CPU 0 (core 0, socket 0)

And what happens when someone is looking at this and the CPU information
is wrong because we got rescheduled, but...

> 
> Signed-off-by: Rik van Riel <riel@...riel.com>
> CC: Dave Jones <dsj@...com>
> ---
>  arch/x86/mm/fault.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index fad8faa29d04..a9b93a7816f9 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
>  		unsigned long address, struct task_struct *tsk)
>  {
>  	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
> +	/* This is a racy snapshot, but it's better than nothing. */

... someone is missing this important tidbit here that the CPU info
above is unreliable?

Someone is sent on a wild goose chase.

Can't you read out the CPU number before interrupts are enabled and hand
it down for printing?
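
To make the suggestion concrete, here is a minimal userspace sketch of the
"sample early, hand it down" pattern. It is not the patch and not kernel
code: struct fault_info, fault_entry() and format_fault_msg() are
hypothetical names made up for illustration, and sched_getcpu() merely
stands in for whatever the kernel would read at fault entry (presumably
something like smp_processor_id() while interrupts are still off, which is
an assumption here).

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical record of a fault, filled in once at entry. */
struct fault_info {
	unsigned long address;
	int cpu;	/* CPU sampled at entry, not at print time */
};

/*
 * Sample the CPU as early as possible and store it alongside the
 * fault address. sched_getcpu() returns -1 on error.
 */
static struct fault_info fault_entry(unsigned long address)
{
	struct fault_info fi = { .address = address, .cpu = sched_getcpu() };

	return fi;
}

/*
 * Build the message from the snapshot only. The printing side never
 * re-queries the current CPU, so a later migration cannot change
 * what gets reported.
 */
static int format_fault_msg(char *buf, size_t len,
			    const struct fault_info *fi)
{
	return snprintf(buf, len, "segfault at %lx on CPU %d",
			fi->address, fi->cpu);
}
```

A caller would invoke fault_entry() first thing and pass the resulting
struct down to format_fault_msg(). In userspace the thread can of course
still migrate before fault_entry() runs; the sketch only shows the
hand-down, not a way to close that window.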

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette