linux-kernel - Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations

Message-ID: <20150630115353.GB6812@suse.de>
Date:	Tue, 30 Jun 2015 12:53:53 +0100
From:	Mel Gorman <mgorman@...e.de>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Xishi Qiu <qiuxishi@...wei.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	"Luck, Tony" <tony.luck@...el.com>,
	Hanjun Guo <guohanjun@...wei.com>,
	Xiexiuqi <xiexiuqi@...wei.com>, leon@...n.nu,
	Kamezawa Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Dave Hansen <dave.hansen@...el.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Linux MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy
 allocations

On Tue, Jun 30, 2015 at 12:46:54PM +0200, Ingo Molnar wrote:
> 
> * Mel Gorman <mgorman@...e.de> wrote:
> 
> > [...]
> > 
> > Basically, overall I feel this series is the wrong approach but not knowing who 
> > the users are makes it much harder to judge. I strongly suspect that if 
> > mirrored memory is to be properly used then it needs to be available before the 
> > page allocator is even active. Once active, there needs to be controlled access 
> > for allocation requests that are really critical to mirror and not just all 
> > kernel allocations. None of that would use a MIGRATE_TYPE approach. It would be 
> > alterations to the bootmem allocator and access to an explicit reserve that is 
> > not accounted for as "free memory" and accessed via an explicit GFP flag.
> 
> So I think the main goal is to avoid kernel crashes when a #MC memory fault 
> arrives on a piece of memory that is owned by the kernel.
> 

Sounds logical. In that case, bootmem awareness would be crucial.
Enabling support in just the page allocator is too late.
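
As a rough illustration of the explicit reserve described in the quoted
text above (not accounted for as "free memory", reachable only via an
explicit flag), here is a toy userspace model. It is a sketch only:
SKETCH_GFP_MIRROR, struct sketch_zone and the helper names are all
hypothetical and do not exist in the kernel, and the no-fallback policy
is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag for this sketch; not a real kernel GFP flag. */
#define SKETCH_GFP_MIRROR 0x1u

struct sketch_zone {
	size_t free_pages;	/* ordinary pages, reported as free memory */
	size_t mirror_reserve;	/* mirrored pages, NOT reported as free */
};

/* Reported free memory deliberately excludes the mirrored reserve. */
static size_t sketch_nr_free(const struct sketch_zone *z)
{
	return z->free_pages;
}

/* Returns true if one page could be allocated. */
static bool sketch_alloc_page(struct sketch_zone *z, unsigned int flags)
{
	if (flags & SKETCH_GFP_MIRROR) {
		/* Assumed policy: no fallback to non-mirrored memory. */
		if (z->mirror_reserve == 0)
			return false;
		z->mirror_reserve--;
		return true;
	}

	/* Ordinary requests never dip into the reserve. */
	if (z->free_pages == 0)
		return false;
	z->free_pages--;
	return true;
}
```

The point of the model is only that an ordinary request fails once
free_pages hits zero even though mirrored pages remain, so the reserve
cannot be consumed by allocations that did not ask for it.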

> In that sense 'protecting' all kernel allocations is natural: we don't know how to 
> recover from faults that affect kernel memory.
> 

It potentially spends all mirrored memory on allocations that do not need
that sort of guarantee. For example, if there was an MC on memory backing
the inode cache then potentially that is recoverable as long as the inodes
were not dirty. That's a minor detail as the kernel could later protect
only MIGRATE_UNMOVABLE requests instead of all kernel allocations if fatal
MCs in kernel space could be distinguished from non-fatal ones.

Bootmem awareness is much more important either way. If that was addressed
then potentially a MIGRATE_UNMOVABLE_MIRROR type could be created that
is only used for MIGRATE_UNMOVABLE allocations and never for user-space.
That misses MIGRATE_RECLAIMABLE so if that is required then we need
something else that both preserves fragmentation avoidance and avoids
introducing loads of new migratetypes.

Reclaim-related issues could be partially avoided by forbidding use from
userspace and accounting for the size of MIGRATE_UNMOVABLE_MIRROR during
watermark checks.
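
A minimal sketch of what accounting for the mirrored pages in a watermark
check could look like, as a userspace toy rather than kernel code. The
struct, field names and sketch_watermark_ok() are hypothetical; the idea
is similar in spirit to how free CMA pages are discounted for requests
that cannot use them.

```c
#include <stdbool.h>

struct sketch_zone_state {
	unsigned long nr_free;		/* all free pages, mirrored included */
	unsigned long nr_free_mirror;	/* free pages of the hypothetical
					 * MIGRATE_UNMOVABLE_MIRROR type */
	unsigned long watermark;	/* pages that must remain free */
};

/*
 * Requests that may not use mirrored memory (e.g. userspace) must not
 * count mirrored free pages towards meeting the watermark.
 */
static bool sketch_watermark_ok(const struct sketch_zone_state *z,
				bool may_use_mirror)
{
	unsigned long usable = z->nr_free;

	if (!may_use_mirror) {
		/* Clamp so inconsistent counters cannot underflow. */
		if (z->nr_free_mirror >= usable)
			usable = 0;
		else
			usable -= z->nr_free_mirror;
	}

	return usable > z->watermark;
}
```

With 100 free pages of which 60 are mirrored and a watermark of 50, a
kernel request allowed to use the mirror passes the check while a
userspace request does not, so reclaim is triggered for the latter
instead of it seeing free memory it can never use.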

> We do know how to recover from faults that affect user-space memory alone.
> 
> So if a mechanism is in place that prioritizes 3 groups of allocators:
> 
>   - non-recoverable memory (kernel allocations mostly)
> 

So bootmem at the very least, followed by MIGRATE_UNMOVABLE requests,
whether they are accounted for by zones or MIGRATE_TYPES.

>   - high priority user memory (critical apps that must never fail)
> 

This one is problematic with a MIGRATE_TYPE-based approach such as the one
in this series. If a high priority application requires memory and
MIGRATE_MIRROR is full then some of it must be reclaimed. With a
MIGRATE_TYPE approach, the kernel may reclaim a lot of unnecessary memory
trying to free some MIGRATE_MIRROR memory with no guarantee of success.
From userspace it will look like unnecessary thrashing, and it will be
difficult to diagnose as reclaim stats are per-zone.
Dealing with this needs either a zone-based approach or a lot of surgery
to reclaim (similar to what the node-based LRU series does actually when
it skips pages when the caller requires lowmem pages).
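
To make the thrashing concern concrete, here is a toy model, not kernel
code, of reclaim walking a single LRU that mixes mirrored and ordinary
pages. It assumes reclaim cannot select pages by migratetype and simply
evicts from the tail until enough mirrored pages have been freed; the
names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_reclaim_result {
	size_t mirror_freed;	/* pages that actually helped the caller */
	size_t other_freed;	/* pages evicted for no benefit */
};

/*
 * lru_is_mirror[i] says whether the i-th page from the LRU tail sits in
 * a mirrored pageblock. Every page passed on the way is reclaimed.
 */
static struct sketch_reclaim_result
sketch_reclaim(const bool *lru_is_mirror, size_t lru_len, size_t want_mirror)
{
	struct sketch_reclaim_result r = { 0, 0 };
	size_t i;

	for (i = 0; i < lru_len && r.mirror_freed < want_mirror; i++) {
		if (lru_is_mirror[i])
			r.mirror_freed++;
		else
			r.other_freed++;
	}

	return r;
}
```

If only a small fraction of the LRU is mirrored, other_freed dominates,
and the loop can run off the end of the list without reaching
want_mirror at all, which is the "no guarantee of success" case.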

-- 
Mel Gorman
SUSE Labs