lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200811220643.GB39857@lca.pw>
Date:   Tue, 11 Aug 2020 18:06:44 -0400
From:   Qian Cai <cai@....pw>
To:     Naoya Horiguchi <nao.horiguchi@...il.com>
Cc:     HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "mhocko@...nel.org" <mhocko@...nel.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "mike.kravetz@...cle.com" <mike.kravetz@...cle.com>,
        "osalvador@...e.de" <osalvador@...e.de>,
        "tony.luck@...el.com" <tony.luck@...el.com>,
        "david@...hat.com" <david@...hat.com>,
        "aneesh.kumar@...ux.vnet.ibm.com" <aneesh.kumar@...ux.vnet.ibm.com>,
        "zeil@...dex-team.ru" <zeil@...dex-team.ru>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "catalin.marinas@....com" <catalin.marinas@....com>,
        "will@...nel.org" <will@...nel.org>
Subject: Re: [PATCH v6 00/12] HWPOISON: soft offline rework

On Wed, Aug 12, 2020 at 04:32:01AM +0900, Naoya Horiguchi wrote:
> On Tue, Aug 11, 2020 at 01:39:24PM -0400, Qian Cai wrote:
> > On Tue, Aug 11, 2020 at 03:11:40AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > I'm still not sure why the test succeeded by reverting these because
> > > current mainline kernel provides similar mechanism to prevent reuse of
> > > soft offlined page. So this success seems to me something suspicious.
> > > 
> > > To investigate more, I want to have additional info about the page states
> > > of the relevant pages after soft offlining.  Could you collect it by the
> > > following steps?
> > > 
> > >   - modify random.c not to run hotplug_memory() in migrate_huge_hotplug_memory(),
> > >   - compile it and run "./random 1" once,
> > >   - to collect page state with hwpoisoned pages, run "./page-types -Nlr -b hwpoison",
> > >     where page-types is available under tools/vm in kernel source tree.
> > >   - choose a few pfns of soft offlined pages from kernel message
> > >     "Soft offlining pfn ...", and run "./page-types -Nlr -a <pfn>".
> > 
> > # ./page-types -Nlr -b hwpoison
> > offset	len	flags
> > 99a000	1	__________B________X_______________________
> > 99c000	1	__________B________X_______________________
> > 99e000	1	__________B________X_______________________
> > 9a0000	1	__________B________X_______________________
> > ba6000	1	__________B________X_______________________
> > baa000	1	__________B________X_______________________
> 
> Thank you.  It only shows 6 lines of records, which is unexpected to me
> because random.c iterates soft offline 2 hugepages with madvise() 1000 times.
> Somehow (maybe in arch specific way?) other hwpoisoned pages might be cleared?
> If they really are, the success of this test is a fake, and this patchset
> can be considered as a fix.

The test was designed to catch a previous bug (the latest patchset fixed that)
where kernel will be enterting into an endless loop.

https://lore.kernel.org/lkml/1570829564.5937.36.camel@lca.pw/

However, I don't understand why mmap() does not return ENOMEM in the first
place where overcommit_memory == 0 instead of munmap() or/and madvise()
returning ENOMEM. I suppose that is the price to pay with heuristic, and I
can't easily confirm if it is related to this patchset or not.

		addr = mmap(NULL, length, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
		if (addr == MAP_FAILED) {
			if (i == 0 || errno != ENOMEM) {
				perror("mmap");
				return 1;
			}
			usleep(1000);
			continue;
		}
		memset(addr, 0, length);

		code = madvise(addr, length, MADV_SOFT_OFFLINE);
		if(safe_munmap(addr, length))
			return 1;

		/* madvise() could return >= 0 on success. */
		if (code < 0 && errno != EBUSY) {
			perror("madvise");
			return 1;
		}

Otherwise, our test will keep running and ignore ENOMEM correctly. I did also
confirm that this patchset has a higher success rate of soft-offlining
("page-types" shows 400+ lines) which changes the existing assumption (looks
like in a good way in this case).

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ