lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAGM2reYtCb2_czEt1M8KhFz+YxGq=iSGnTskC=FGNM4kXOiS5g@mail.gmail.com>
Date:   Sat, 7 Apr 2018 10:45:03 -0400
From:   Pavel Tatashin <pasha.tatashin@...cle.com>
To:     Sasha Levin <Alexander.Levin@...rosoft.com>
Cc:     "steven.sistare@...cle.com" <steven.sistare@...cle.com>,
        "daniel.m.jordan@...cle.com" <daniel.m.jordan@...cle.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "mgorman@...hsingularity.net" <mgorman@...hsingularity.net>,
        "mhocko@...e.com" <mhocko@...e.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "vbabka@...e.cz" <vbabka@...e.cz>,
        "bharata@...ux.vnet.ibm.com" <bharata@...ux.vnet.ibm.com>
Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

> Let me study your trace, perhaps I will able to figure out the issue
> without reproducing it.

Hi Sasha,

I've been studying this problem more. The issue happens in this stack:

...subsys_init...
 topology_init()
  register_one_node(nid)
   link_mem_sections(nid, pgdat->node_start_pfn, pgdat->node_spanned_pages)
    register_mem_sect_under_node(mem_blk, nid)
     get_nid_for_pfn(pfn)
      pfn_to_nid(pfn)
       page_to_nid(page)
        PF_POISONED_CHECK(page)

We are trying to get nid from struct page which has not been
initialized.  My patches add this extra scrutiny to make sure that we
never get invalid nid from a "struct page" by adding
PF_POISONED_CHECK() to page_to_nid(). So, the bug already exists in
Linux where incorrect nid is read. The question is why does happen?

First, I thought, that perhaps struct page is not yet initialized.
But, the initcalls are done after deferred pages are initialized, and
thus every struct page must be initialized by now. Also, if deferred
pages were enabled, we would take a slightly different path and avoid
this bug by getting nid from memblock instead of struct page:

get_nid_for_pfn(pfn)
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 if (system_state < SYSTEM_RUNNING)
  return early_pfn_to_nid(pfn);
#endif

I also verified in your config that CONFIG_DEFERRED_STRUCT_PAGE_INIT
is not set. So, one way to fix this issue, is to remove this "#ifdef"
(I have not checked for dependancies), but this is simply addressing
symptom, not fixing the actual issue.

Thus, we have a "struct page" backing memory for this pfn, but we have
not initialized it. For some reason memmap_init_zone() decided to skip
it, and I am not sure why. Looking at the code we skip initializing
if:
!early_pfn_valid(pfn)) aka !pfn_valid(pfn) and if !early_pfn_in_nid(pfn, nid).

I suspect, this has something to do with !pfn_valid(pfn). But, without
having a machine on which I could reproduce this problem, I cannot
study it further to determine exactly why pfn is not valid.

Please replace !pfn_valid_within() with !pfn_valid() in
get_nid_for_pfn() and see if problem still happens. If it does not
happen, lets study the memory map, pgdata's start end, and the value
of this pfn.

Thank you,
Pasha

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ