Message-ID: <20211022083845.08fe5754@gandalf.local.home>
Date:   Fri, 22 Oct 2021 08:38:45 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Gavin Shan <gshan@...hat.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: [BUG] WARNING: CPU: 3 PID: 1 at mm/debug_vm_pgtable.c:493

On Tue, 12 Oct 2021 12:15:40 -0700
Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -706,12 +706,16 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>  
>  	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
>  	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> -	    (mtrr != MTRR_TYPE_WRBACK))
> +	    (mtrr != MTRR_TYPE_WRBACK)) {
> +		pr_debug("mtrr_type_lookup() returned %d (%d)\n", mtrr, uniform);
>  		return 0;
> +	}
>  
>  	/* Bail out if we are we on a populated non-leaf entry: */
> -	if (pud_present(*pud) && !pud_huge(*pud))
> +	if (pud_present(*pud) && !pud_huge(*pud)) {
> +		pr_debug("pud is already present (%lx)\n", (unsigned long)pud_val(*pud));
>  		return 0;
> +	}
>  

It finally triggered again. And this time with this patch applied. But I
don't see the added printks anywhere in the dmesg.

Full dmesg is here:

  https://rostedt.org/private/dmesg-debug_vm_pgtable-20211022

Unfortunately, I lost the config, but I can recreate it when my tests finish.
(I kicked them off again so that I can post these patches to linux-next.)
But I did share a config that triggered this in the past:

  https://lore.kernel.org/all/20211012141131.3c9a2eb1@gandalf.local.home/

The tree I'm testing is:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
  branch: ftrace/core

But this is something that has been triggering since 5.14.

Note, there's a lot of debugging going on.

Here's the first splat:

[  178.714431] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
[  178.723726] ------------[ cut here ]------------
[  178.728389] WARNING: CPU: 2 PID: 1 at mm/debug_vm_pgtable.c:492 pud_huge_tests+0x42/0x68
[  178.736494] Modules linked in:
[  178.739565] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc3-test+ #79
[  178.746452] Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
[  178.754112] RIP: 0010:pud_huge_tests+0x42/0x68
[  178.758567] Code: 48 8b 47 20 48 89 fb 48 c7 00 00 00 00 00 48 8b b7 a0 00 00 00 48 8b 57 60 48 8b 7f 20 48 c1 e6 0c e8 ca 2b d5 fd 85 c0 75 02 <0f> 0b 48 8b 7b 20 e8 92 2d d5 fd 85 c0 75 02 0f 0b 48 8b 43 20 48
[  178.777323] RSP: 0000:ffffaadf40033d70 EFLAGS: 00010246
[  178.782566] RAX: 0000000000000000 RBX: ffffaadf40033d88 RCX: 6fd4a5ea5b1e4400
[  178.789706] RDX: 00ffffff8411f9a3 RSI: ffffaadf40033cf8 RDI: ffffaadf40033cf9
[  178.796846] RBP: ffffaadf40033d78 R08: 00000000dc000000 R09: 0000000000040000
[  178.803992] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  178.811129] R13: ffff947400a48938 R14: ffff9474009afde8 R15: 0000000000000000
[  178.818273] FS:  0000000000000000(0000) GS:ffff947516800000(0000) knlGS:0000000000000000
[  178.826367] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  178.832121] CR2: 0000000000000000 CR3: 00000000cae2a002 CR4: 00000000001706e0
[  178.839261] Call Trace:
[  178.841725]  debug_vm_pgtable+0x3a2/0x50b
[  178.845765]  ? pgd_clear_tests+0x61/0x61
[  178.849700]  do_one_initcall+0xe8/0x25c
[  178.853556]  ? lock_is_held+0xc/0xe
[  178.857055]  ? rcu_read_lock_sched_held+0x3b/0x72
[  178.861775]  do_initcalls+0xcd/0xed
[  178.865282]  kernel_init_freeable+0x183/0x1ba
[  178.869653]  ? rest_init+0x155/0x155
[  178.873248]  kernel_init+0x1a/0x11a
[  178.876754]  ret_from_fork+0x22/0x30
[  178.880360] irq event stamp: 29103539
[  178.884035] hardirqs last  enabled at (29103549): [<ffffffff8412e491>] __up_console_sem+0x4b/0x4f
[  178.892910] hardirqs last disabled at (29103558): [<ffffffff8412e471>] __up_console_sem+0x2b/0x4f
[  178.901782] softirqs last  enabled at (29103506): [<ffffffff85400328>] __do_softirq+0x328/0x363
[  178.910485] softirqs last disabled at (29103501): [<ffffffff840d211e>] __irq_exit_rcu+0x60/0x9c
[  178.919187] ---[ end trace 328fd4bcdb7a033d ]---

It did trigger right after the kprobe test, which could be a hint, as it
does seem to only happen on configs with a lot of debugging enabled. But it
doesn't always happen there. I remember it usually happening around the
time the tracers are being tested.

Maybe it's a race, where the debug_vm_pgtable initcall runs while some
internal tests are going on?

The tests do trigger text_poke(), which will muck with the page tables. Not
sure if that has anything to do with this.

I need to update my tests to save off all failures and configs, so I can
go back to them. It's just that my tests fail so often that I'd fill up my
hard drive ;-)

-- Steve
