linux-kernel - Re: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aa2d3299-a064-7118-ea10-b05279238b96@microsoft.com>
Date:   Tue, 4 Sep 2018 21:44:47 +0000
From:   Pasha Tatashin <Pavel.Tatashin@...rosoft.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
CC:     linux-mm <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
        "Duyck, Alexander H" <alexander.h.duyck@...el.com>,
        Michal Hocko <mhocko@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM
 to CONFIG_DEBUG_VM_PGFLAGS



On 9/4/18 5:13 PM, Alexander Duyck wrote:
> On Tue, Sep 4, 2018 at 1:07 PM Pasha Tatashin
> <Pavel.Tatashin@...rosoft.com> wrote:
>>
>> Hi Alexander,
>>
>> This is a wrong way to do it. memblock_virt_alloc_try_nid_raw() does not
>> initialize allocated memory, and by setting memory to all ones in debug
>> build we ensure that no callers rely on this function to return zeroed
>> memory just by accident.
> 
> I get that, but setting this to all 1's is still just debugging code
> and that is adding significant overhead.

That's correct debugging code on debugging kernel.

> 
>> And, the accidents are frequent because most of the BIOSes and
>> hypervisors zero memory for us. The exception is kexec reboot.
>>
>> So, the fact that page flags checks this pattern, does not mean that
>> this is the only user. Memory that is returned by
>> memblock_virt_alloc_try_nid_raw() is used for page table as well, and
>> can be used in other places as well that don't want memblock to zero the
>> memory for them for performance reasons.
> 
> The logic behind this statement is confusing. You are saying they
> don't want memblock to zero the memory for performance reasons, yet
> you are setting it to all 1's for debugging reasons? I get that it is
> wrapped, but in my mind just using CONFIG_DEBUG_VM is too broad of a
> brush. Especially with distros like Fedora enabling it by default.

The idea is not to zero memory on production kernel, and ensure that not
zeroing memory does not cause any accidental bugs by having debug code
on debug kernel.

> 
>> I am surprised that CONFIG_DEBUG_VM is used in production kernel, but if
>> so perhaps a new CONFIG should be added: CONFIG_DEBUG_MEMBLOCK
>>
>> Thank you,
>> Pavel
> 
> I don't know about production. I am running a Fedora kernel on my
> development system and it has it enabled. It looks like it has been
> that way for a while based on a FC20 Bugzilla
> (https://bugzilla.redhat.com/show_bug.cgi?id=1074710). A quick look at
> one of my CentOS systems shows that it doesn't have it set. I suspect
> it will vary from distro to distro. I just know it spooked me when I
> was stuck staring at a blank screen for three minutes when I was
> booting a system with 12TB of memory since this delay can hit you
> early in the boot.

I understand, this is the delay that I fixed when I removed memset(0)
from struct page initialization code. However, we still need to keep
this debug code memset(1) in order to catch some bugs. And we do from
time to time.

For far too long linux was expecting that the memory that is returned by
memblock and boot allocator is always zeroed.

> 
> I had considered adding a completely new CONFIG. The only thing is it
> doesn't make much sense to have the logic setting the value to all 1's
> without any logic to test for it. 

When memory is zeroed, page table works by accident as the entries are
empty. However, when entries are all ones, and we accidentally try to
use that memory as page table invalid VA in page table will crash debug
kernel (and it has in the past helped finding some bugs).

So, the testing is not only that uninitialized struct pages are not
accessed, but also that only explicitly initialized page tables are
accessed.


That is why I thought it made more
> sense to just fold it into CONFIG_DEBUG_VM_PGFLAGS. I suppose I could
> look at something like CONFIG_DEBUG_PAGE_INIT if we want to go that
> route. I figure using something like MEMBLOCK probably wouldn't make
> sense since this also impacts sparse section init.

If distros are using CONFIG_DEBUG_VM in production kernels (as you
pointed out above), it makes sense to add CONFIG_DEBUG_MEMBLOCK.

Thank you,
Pavel