linux-kernel - Re: [BUG] WARNING: CPU: 3 PID: 1 at mm/debug_vm


Message-ID: <61262547-b9ad-7041-18e2-75840b5d784d@arm.com>
Date:   Mon, 22 Nov 2021 12:01:47 +0530
From:   Anshuman Khandual <anshuman.khandual@....com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Gavin Shan <gshan@...hat.com>
Cc:     Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
        the arch/x86 maintainers <x86@...nel.org>
Subject: Re: [BUG] WARNING: CPU: 3 PID: 1 at mm/debug_vm_pgtable.c:493



On 11/19/21 12:03 AM, Linus Torvalds wrote:
> On Thu, Nov 18, 2021 at 8:47 AM Steven Rostedt <rostedt@...dmis.org> wrote:
>> Triggered it again with the new update:
>>
>> [   24.751779] IPI shorthand broadcast: enabled
>> [   24.761177] sched_clock: Marking stable (23431856262, 1329270511)->(28163092341, -3401965568)
>> [   24.770495] device: 'cpu_dma_latency': device_add
>> [   24.775232] PM: Adding info for No Bus:cpu_dma_latency
>> [   24.780929] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
>> [   24.799490] mtrr_type_lookup() returned 0 (0)
> Ok, so that's MTRR_TYPE_UNCACHABLE, and "uniform" is 0.
> 
> Anyway, either the mtrr code is confused, or more likely it just does
> the right thing, and pud_set_huge() is simply expected to return 0 in
> this situation, and that WARN_ON() in pud_huge_tests() is simply wrong
> to trigger at all.
> 
> I didn't look at what all the code in debug_vm_pgtable() is trying to
> set up to test. Honestly, it's all very opaque.
> 
> But I do notice that the pfn that the test uses ends up basically
> being something random, where the "fixed" pfn is
> 
>         phys = __pa_symbol(&start_kernel);
>         ...
>         args->fixed_pud_pfn = __phys_to_pfn(phys & PUD_MASK);
> 
> rather than being an allocated real PUD-sized page. That can be a
> problem in itself.
> 
> So I think the problem is that depending on where the kernel is
> allocated, the fixed_pud_pfn ends up being in an area with MTRR
> settings. In fact, I'm surprised it's not *always* in that area, since
> presumably you have the normal fixed MTRR issues with the 640k-1M
> range.
> 
> But I didn't look - probably the MTRR code doesn't actually check the
> special fixed MTRR's.
> 
> Anyway, I think that the end result is simply that the tests in
> mm/debug_vm_pgtable.c are simply buggy, and the WARN_ON() is not a
> sign of anything wrong in the mm, but with the tests themselves.
> 
> So the fixed_pud_pfn is dodgy, but it looks like the non-fixed
> 'pud_pfn' allocation may be dodgy too:
> 
>   #ifdef CONFIG_CONTIG_ALLOC
>         if (order >= MAX_ORDER) {
>                 page = alloc_contig_pages((1 << order), GFP_KERNEL,
>                                           first_online_node, NULL);
> 
> because afaik, alloc_contig_pages() does allocate a contiguous region,
> but it doesn't necessarily allocate an _aligned_ contiguous region.
> 
> So I think _all_ those PUD tests are likely broken, but honestly, I
> don't know the code well enough to be entirely sure, I'm just seeing
> code that looks dodgy to me.
> 
> I don't think the breakage is x86-specific. Quite the reverse. I think
> the x86 code just happens to randomly show it when some MTRR ends up
> being used.
> 
> Maybe pfn_pud() should verify that it's actually given an aligned argument?
> 
> Gavin, Anshuman? Feel free to tell me what I missed.

Hi Linus,

These PUD tests have been subtle on certain platforms, including the
problems seen in this report. I will definitely take a detailed look,
but probably after a week (leave, travel etc). Thank you.

- Anshuman
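
The two alignment concerns raised in the quoted text can be illustrated with
a small userspace sketch. It is not kernel code: the page and PUD sizes
(4 KiB pages, 1 GiB PUD) are assumed x86-64-style values, the helper names
phys_to_pfn(), fixed_pud_pfn() and pfn_is_pud_aligned() are hypothetical
stand-ins, and the addresses are made up. It shows only that masking a
physical address with PUD_MASK can land on a PUD-sized region starting well
below the original address, and how a pfn could be checked for PUD
alignment. It says nothing about what alloc_contig_pages() actually returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only: 4 KiB pages, 1 GiB PUD. */
#define PAGE_SHIFT   12
#define PUD_SHIFT    30
#define PUD_SIZE     (UINT64_C(1) << PUD_SHIFT)
#define PUD_MASK     (~(PUD_SIZE - 1))
#define PFNS_PER_PUD (PUD_SIZE >> PAGE_SHIFT)

/* Hypothetical stand-in for the kernel's __phys_to_pfn(). */
static uint64_t phys_to_pfn(uint64_t phys)
{
	return phys >> PAGE_SHIFT;
}

/* Mirrors the quoted expression: __phys_to_pfn(phys & PUD_MASK). */
static uint64_t fixed_pud_pfn(uint64_t phys)
{
	return phys_to_pfn(phys & PUD_MASK);
}

/* True if pfn is the first page frame of a PUD-sized, PUD-aligned region. */
static bool pfn_is_pud_aligned(uint64_t pfn)
{
	return (pfn & (PFNS_PER_PUD - 1)) == 0;
}

/* Small self-check using made-up addresses. */
static void alignment_self_check(void)
{
	/* A symbol a little above 16 MiB rounds down to physical address 0,
	 * so the PUD-sized region spans 0..1 GiB, including 640k-1M. */
	assert(fixed_pud_pfn(UINT64_C(0x1002340)) == 0);

	/* A contiguous run may begin at an aligned or an unaligned pfn. */
	assert(pfn_is_pud_aligned(UINT64_C(0x40000)));
	assert(!pfn_is_pud_aligned(UINT64_C(0x40100)));
}
```

Calling alignment_self_check() from a main() exercises the three cases. A
check along the lines of pfn_is_pud_aligned() is roughly what the suggestion
that pfn_pud() verify its argument would amount to, under the assumptions
stated above.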