lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20170913083233.GA7659@gmail.com>
Date:   Wed, 13 Sep 2017 10:32:34 +0200
From:   Alexandru Moise <00moses.alexander00@...il.com>
To:     Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "khandual@...ux.vnet.ibm.com" <khandual@...ux.vnet.ibm.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "mhocko@...e.com" <mhocko@...e.com>,
        "aarcange@...hat.com" <aarcange@...hat.com>,
        "minchan@...nel.org" <minchan@...nel.org>,
        "hillf.zj@...baba-inc.com" <hillf.zj@...baba-inc.com>,
        "shli@...com" <shli@...com>,
        "rppt@...ux.vnet.ibm.com" <rppt@...ux.vnet.ibm.com>,
        "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
        "mgorman@...hsingularity.net" <mgorman@...hsingularity.net>,
        "rientjes@...gle.com" <rientjes@...gle.com>,
        "riel@...hat.com" <riel@...hat.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH] mm, hugetlb, soft_offline: save compound page order
 before page migration

On Wed, Sep 13, 2017 at 12:13:09AM +0000, Naoya Horiguchi wrote:
> Hi Alexandru,
> 
> On Tue, Sep 12, 2017 at 10:43:06PM +0200, Alexandru Moise wrote:
> > This fixes a bug in madvise() where if you'd try to soft offline a
> > hugepage via madvise(), while walking the address range you'd end up,
> > using the wrong page offset due to attempting to get the compound
> > order of a former but presently not compound page, due to dissolving
> > the huge page (since c3114a8).
> > 
> > Signed-off-by: Alexandru Moise <00moses.alexander00@...il.com>
> 
> There was a similar discussion in https://marc.info/?l=linux-kernel&m=150354919510631&w=2
> over thp. As I stated there, if we give multi-page range into the parameters
> [start, end), we expect that memory errors are injected to every single page
> within the range. 

At the moment we'll end up offlining the i'th subpage of the newly migrated page with
each itteration. That's why I end up without free pages in hugetlbfs.

With this patch we migrate the hugepage, offline 1 subpage and dissolve the rest,
which is closer to how mcelog should behave, mcelog will usually try to offline random
spots within a hugepage, not offline a whole hugepage at once, which doesn't make
sense as you usually just get 1-2 stuck bits on your DIMM. The whole point of soft
offlining is as a preventive measure against large number of correctable memory
errors on a particular page.

I agree that if we give a range we should expect all the subpages to be offlined
although I don't know what value that would add.

> 
> So I start to feel that we should revert the following patch which introduced
> the multi-page stepping.
> 
>    commit 20cb6cab52a21b46e3c0dc7bd23f004f810fb421
>    Author: Wanpeng Li <liwanp@...ux.vnet.ibm.com>
>    Date:   Mon Sep 30 13:45:21 2013 -0700
>    
>        mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
> 
> In order to suppress the printk flood, we can use ratelimit mechanism, or
> just s/pr_info/pr_debug/ might be ok.

I'd rather keep the printouts, it's not really that much of a hot path, if
they went on forever sure, but if you manually offline 512 pages you should expect
512 printouts. It's nice to see exactly which PFNs get offlined as well.

../Alex

> 
> Thanks,
> Naoya Horiguchi
> 
> > ---
> >  mm/madvise.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 21261ff0466f..25bade36e9ca 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -625,18 +625,26 @@ static int madvise_inject_error(int behavior,
> >  {
> >  	struct page *page;
> >  	struct zone *zone;
> > +	unsigned int order;
> >  
> >  	if (!capable(CAP_SYS_ADMIN))
> >  		return -EPERM;
> >  
> > -	for (; start < end; start += PAGE_SIZE <<
> > -				compound_order(compound_head(page))) {
> > +
> > +	for (; start < end; start += PAGE_SIZE << order) {
> >  		int ret;
> >  
> >  		ret = get_user_pages_fast(start, 1, 0, &page);
> >  		if (ret != 1)
> >  			return ret;
> >  
> > +		/*
> > +		 * When soft offlining hugepages, after migrating the page
> > +		 * we dissolve it, therefore in the second loop "page" will
> > +		 * no longer be a compound page, and order will be 0.
> > +		 */
> > +		order = compound_order(compound_head(page));
> > +
> >  		if (PageHWPoison(page)) {
> >  			put_page(page);
> >  			continue;
> > -- 
> > 2.14.1
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@...ck.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ