linux-kernel - Re: fork on processes with lots of memory




Date:	Fri, 26 Feb 2016 12:41:07 -0500
From:	lwoodman@...hat.com
To:	Hugh Dickins <hughd@...gle.com>,
	Felix von Leitner <felix-linuxkernel@...e.de>
CC:	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: fork on processes with lots of memory

On 01/27/2016 10:09 PM, Hugh Dickins wrote:
> On Tue, 26 Jan 2016, Felix von Leitner wrote:
>>> Dear Linux kernel devs,
>>> I talked to someone who uses large Linux based hardware to run a
>>> process with huge memory requirements (think 4 GB), and he told me that
>>> if they do a fork() syscall on that process, the whole system comes to
>>> standstill. And not just for a second or two. He said they measured a 45
>>> minute (!) delay before the system became responsive again.
>> I'm sorry, I meant 4 TB not 4 GB.
>> I'm not used to working with that kind of memory sizes.
>>
>>> Their working theory is that all the pages need to be marked copy-on-write
>>> in both processes, and if you touch one page, a copy needs to be made,
>>> and that just takes a while if you have a billion pages.
>>> I was wondering if there is any advice for such situations from the
>>> memory management people on this list.
>>> In this case the fork was for an execve afterwards, but I was going to
>>> recommend fork to them for something else that cannot be tricked around
>>> with vfork.
>>> Can anyone comment on whether the 45 minute number sounds like it could
>>> be real? When I heard it, I was flabbergasted. But the other person
>>> swore it was real. Can a fork cause this much of a delay? Is there a way
>>> to work around it?
>>> I was going to recommend the fork to create a boundary between the
>>> processes, so that you can recover from memory corruption in one
>>> process. In fact, after the fork I would want to munmap almost all of
>>> the shared pages anyway, but there is no way to tell fork that.
> You might find madvise(addr, length, MADV_DONTFORK) helpful:
> that tells fork not to duplicate the given range in the child.
>
> Hugh

I don't know exactly what program they are running, but we test RHEL
with up to 24TB of memory and have not seen this problem. I have
mmap()'d 12TB of private memory into a parent process, touched every
page, then forked a child which wrote to every page, thereby incurring
tons of ZFOD (zero-fill-on-demand) and COW faults. It takes a while to
process the 6 billion faults, but the system didn't come to a halt.
Where I do see significant pauses is when we overcommit RAM and swap
space and get into an OOM-kill storm.

Attached is the program:

>
>>> Thanks,
>>> Felix
>>> PS: Please put me on Cc if you reply, I'm not subscribed to this mailing
>>> list.


Attachment: "forkoff.c" (text/plain, 1402 bytes)