Message-ID: <57DC1CE0.5070400@linux.intel.com>
Date:   Fri, 16 Sep 2016 09:25:04 -0700
From:   Dave Hansen <dave.hansen@...ux.intel.com>
To:     Rui Teng <rui.teng@...ux.vnet.ibm.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Michal Hocko <mhocko@...e.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Paul Gortmaker <paul.gortmaker@...driver.com>,
        Santhosh G <santhog4@...ibm.com>
Subject: Re: [PATCH] memory-hotplug: Fix bad area access on
 dissolve_free_huge_pages()

On 09/16/2016 06:58 AM, Rui Teng wrote:
> On 9/15/16 12:37 AM, Dave Hansen wrote:
>> On 09/14/2016 09:33 AM, Rui Teng wrote:
>> But, as far as describing the initial problem, can you explain how the
>> tail pages still ended up being PageHuge()?  Seems like dissolving the
>> huge page should have cleared that.
>>
> I use the script tools/testing/selftests/memory-hotplug/mem-on-off-test.sh
> to test and reproduce this bug. And I printed the pfn range
> on dissolve_free_huge_pages(). The sizes of the pfn range are always
> 4096, and the ranges are separated.
> [   72.362427] start_pfn: 204800, end_pfn: 208896
> [   72.371677] start_pfn: 2162688, end_pfn: 2166784
> [   72.373945] start_pfn: 217088, end_pfn: 221184
> [   72.383218] start_pfn: 2170880, end_pfn: 2174976
> [   72.385918] start_pfn: 2306048, end_pfn: 2310144
> [   72.388254] start_pfn: 2326528, end_pfn: 2330624
> 
> Sometimes, it will report a failure:
> [   72.371690] memory offlining [mem 0x2100000000-0x210fffffff] failed
> 
> And sometimes, it will report the following:
> [   72.373956] Offlined Pages 4096
> 
> Could the start_pfn and end_pfn passed to dissolve_free_huge_pages() be
> *random*? If so, the range may not include any head page and may start
> from a tail page, right?

That's an interesting data point, but it still doesn't quite explain
what is going on.

It seems like there might be parts of gigantic pages that have
PageHuge() set on tail pages, while other parts don't.  If that's true,
we have another bug and your patch just papers over the issue.

I think you really need to find the root cause before we apply this patch.
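
As a rough aid for reading the log quoted above, here is a minimal sketch
of the pfn arithmetic. It is not kernel code. It assumes 64 KiB base pages
(inferred from 4096 pfns covering the 256 MiB block in the failure message)
and a hypothetical, naturally aligned 16 GiB gigantic page; neither value
is stated in this thread. The helper names are made up for illustration.

```python
# Assumption: 64 KiB base pages, so PAGE_SHIFT is 16.
PAGE_SHIFT = 16
# Assumption: 16 GiB gigantic pages, naturally aligned (hypothetical size).
GIGANTIC_PAGE_PFNS = (16 << 30) >> PAGE_SHIFT

# pfn ranges copied from the quoted log output.
RANGES = [
    (204800, 208896),
    (2162688, 2166784),
    (217088, 221184),
    (2170880, 2174976),
    (2306048, 2310144),
    (2326528, 2330624),
]


def pfn_to_phys(pfn):
    """Convert a page frame number to a physical address."""
    return pfn << PAGE_SHIFT


def first_head_pfn(start_pfn, end_pfn, nr=GIGANTIC_PAGE_PFNS):
    """Return the first nr-aligned pfn in [start_pfn, end_pfn), or None.

    Under the alignment assumption, only such a pfn can be a head page.
    """
    head = -(-start_pfn // nr) * nr  # round start_pfn up to a multiple of nr
    return head if head < end_pfn else None


if __name__ == "__main__":
    for start, end in RANGES:
        print(
            f"[mem {pfn_to_phys(start):#x}-{pfn_to_phys(end) - 1:#x}] "
            f"first possible head pfn: {first_head_pfn(start, end)}"
        )
```

Under these assumptions the second range maps to
[mem 0x2100000000-0x210fffffff], matching the failure message, and none of
the six ranges contains an aligned head pfn. That only shows the ranges
could lie entirely within tail pages; it says nothing about why PageHuge()
would still be set on them.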
