lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <7e642622-72ee-87f6-ceb0-890ce9c28382@linux.vnet.ibm.com>
Date:   Tue, 20 Sep 2016 22:45:12 +0800
From:   Rui Teng <rui.teng@...ux.vnet.ibm.com>
To:     Dave Hansen <dave.hansen@...ux.intel.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Michal Hocko <mhocko@...e.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Paul Gortmaker <paul.gortmaker@...driver.com>,
        Santhosh G <santhog4@...ibm.com>
Subject: Re: [PATCH] memory-hotplug: Fix bad area access on
 dissolve_free_huge_pages()

On 9/17/16 12:25 AM, Dave Hansen wrote:
>
> That's an interesting data point, but it still doesn't quite explain
> what is going on.
>
> It seems like there might be parts of gigantic pages that have
> PageHuge() set on tail pages, while other parts don't.  If that's true,
> we have another bug and your patch just papers over the issue.
>
> I think you really need to find the root cause before we apply this patch.
>
The root cause is the test scripts(tools/testing/selftests/memory-
hotplug/mem-on-off-test.sh) changes online/offline status on memory
blocks other than page header. It will *randomly* select 10% memory
blocks from /sys/devices/system/memory/memory*, and change their
online/offline status.

On my system, the memory block size is 0x10000000:
	[root@...is-n01-kvm memory]# cat block_size_bytes
	10000000

But the huge page size(16G) is more than this memory block size. So one
huge page is composed by several memory blocks. For example, memory704,
memory705, memory706 and so on. Then memory704 will contain a head
page, but memory705 will *only* contain tail pages. So the problem will
happened on it, if we call:
	#echo offline > memory705/state

That's why we need a PageHead() check now, and why this problem does
not happened on systems with smaller huge page such as 16M.

As far as the PageHuge() set, I think PageHuge() will return true for
all tail pages. Because it will get the compound_head for tail page,
and then get its huge page flag.
	page = compound_head(page);

And as far as the failure message, if one memory block is in use, it
will return failure when offline it.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ