Message-Id: <434B6A05-E82A-4AF4-94E2-E1F3DA9A5268@thehive.com>
Date: Tue, 9 Jun 2009 15:14:08 -0400
From: Matthew Von Maszewski <matthew@...hive.com>
To: Matthew Von Maszewski <matthew@...hive.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
linux-kernel@...r.kernel.org,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: huge mem mmap eats all CPU when multiple processes
Re test program:

I have not yet been able to create a simple test program that:

1. matches the huge mem performance problem seen in the "top" sample
below, and
2. runs cleanly when switched to a non-hugemem file/mmap.

Using process-shared pthread_mutex_t objects inside tight loops
produces something similar, but then the huge mem file and the
standard vm file both show the problem. Maybe that weakly supports
Kame's comment about activity being serialized on a system mutex for
huge mem? I am not qualified to judge.
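
For reference, the mutex variant of the test looks roughly like this --
a trimmed-down sketch with a made-up file name and sizes, not the real
tentacle code, just the shape of it:

    /* sketch: several processes hammer a process-shared mutex living in an
     * mmap()ed file; point "path" at /mnt/hugefs/... or at a regular file
     * to compare hugetlb vs. normal pages.  Build with: gcc -pthread ... */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MAP_BYTES (2048 * 1024)          /* one 2M huge page */

    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1] : "/mnt/hugefs/mtxtest";
        long i;

        int fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("open"); return 1; }
        ftruncate(fd, MAP_BYTES);            /* keep the size a 2M multiple on hugetlbfs */

        pthread_mutex_t *mtx = mmap(NULL, MAP_BYTES, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0);
        if (mtx == MAP_FAILED) { perror("mmap"); return 1; }

        /* simplified: every process re-initializes the mutex; the real test
         * lets only the first process do this */
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(mtx, &attr);

        for (i = 0; i < 10 * 1000 * 1000; i++) {   /* tight lock/unlock loop */
            pthread_mutex_lock(mtx);
            pthread_mutex_unlock(mtx);
        }

        munmap(mtx, MAP_BYTES);
        close(fd);
        return 0;
    }

Running several copies of that against the same file produces a
similar-looking load whether the file is on hugetlbfs or not, which is
why it does not isolate the huge mem case.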
Open to any suggestions for tests / measurements.
Matthew
On Jun 9, 2009, at 10:16 AM, Matthew Von Maszewski wrote:
> My apologies for lack of clarity in the original email. I am
> working on a test program to send out later today. Here are my
> responses to the questions asked:
>
>
> On Jun 8, 2009, at 8:41 PM, KAMEZAWA Hiroyuki wrote:
>
>> On Mon, 8 Jun 2009 10:27:49 -0400
>> Matthew Von Maszewski <matthew@...hive.com> wrote:
>>
>>> [note: not on kernel mailing list, please cc author]
>>>
>>> Symptom: 9 processes mmap the same 2 Gig memory section for a shared C
>>> heap (lots of random access). All processes show extreme CPU load in
>>> top.
>>>
>>> - Same code works well when only a single process accesses huge mem.
>> Does this "huge mem" means HugeTLB(2M/4Mbytes) pages ?
>
> Yes. My Debian x86_64 kernel build uses 2M pages. The test with one
> process is really fast. The test with multiple processes against the
> same mmap() file is really slow.
>
>>
>>
>>> - Code works well with standard vm based mmap file and 9 processes.
>>>
>>
>> What is the sys/user ratio in top? Are almost all CPUs used by "sys"?
>
>
> Tasks: 94 total, 3 running, 91 sleeping, 0 stopped, 0 zombie
> Cpu0 :  5.6%us, 86.4%sy,  0.0%ni,  1.3%id,  5.3%wa,  0.0%hi,  1.3%si,  0.0%st
> Cpu1 :  1.0%us, 92.4%sy,  0.0%ni,  0.0%id,  5.6%wa,  0.0%hi,  1.0%si,  0.0%st
> Cpu2 :  1.7%us, 90.4%sy,  0.0%ni,  0.0%id,  7.3%wa,  0.0%hi,  0.7%si,  0.0%st
> Cpu3 :  0.0%us, 70.4%sy,  0.0%ni, 25.1%id,  4.0%wa,  0.0%hi,  0.5%si,  0.0%st
> Mem:   6103960k total,  2650044k used,  3453916k free,     6068k buffers
> Swap:  5871716k total,        0k used,  5871716k free,    84504k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>  3681 proxy     20   0 2638m 1596 1312 S   43  0.0   0:07.87 tentacle.e.prof
>  3687 proxy     20   0 2656m 1592 1312 S   43  0.0   0:07.69 tentacle.e.prof
>  3689 proxy     20   0 2662m 1600 1312 S   42  0.0   0:07.82 tentacle.e.prof
>  3683 proxy     20   0 2652m 1596 1312 S   41  0.0   0:07.75 tentacle.e.prof
>  3684 proxy     20   0 2650m 1596 1312 S   41  0.0   0:07.89 tentacle.e.prof
>  3686 proxy     20   0 2644m 1596 1312 S   40  0.0   0:07.80 tentacle.e.prof
>  3685 proxy     20   0 2664m 1592 1312 S   40  0.0   0:07.82 tentacle.e.prof
>  3682 proxy     20   0 2646m 1616 1328 S   38  0.0   0:07.73 tentacle.e.prof
> 3664 proxy 20 0 2620m 1320 988 R 36 0.0 0:01.08 tentacle.e
> 3678 proxy 20 0 72352 35m 1684 R 11 0.6 0:01.79 squid
>
> tentacle.e and tentacle.e.prof are copies of the same executable
> file, started with different command line options. tentacle.e is
> started by an init.d script. tentacle.e.prof processes are started
> by squid.
>
> I am creating a simplified program to duplicate the scenario. Will
> send it along later today.
>
>>
>>
>>> Environment:
>>>
>>> - Intel x86_64: Dual core Xeon with hyperthreading (4 logical
>>> processors)
>>> - 6 Gig ram, 2.5G allocated to huge mem
>> by boot option ?
>
> Huge mem initialization:
>
> 1. sysctl.conf sets the desired number of 2M pages:
>
> system:/mnt$ tail -n 3 /etc/sysctl.conf
> #huge
> vm.nr_hugepages=1200
>
>
> 2. The init.d script for tentacle.e mounts the file system and
> preallocates space:
>
> (from init.d file starting tentacle.e)
>
> umount /mnt/hugefs
> mount -t hugetlbfs -o uid=proxy,size=2300M none /mnt/hugefs
>
> system:/mnt$ df -kP
> Filesystem     1024-blocks     Used Available Capacity Mounted on
> /dev/sda1        135601864 32634960  96078636      26% /
> tmpfs              3051980        0   3051980       0% /lib/init/rw
> udev                 10240       68     10172       1% /dev
> tmpfs              3051980        0   3051980       0% /dev/shm
> none               2355200  2117632    237568      90% /mnt/hugefs
>
>
>>
>>
>>> - tried with kernels 2.6.29.4 and 2.6.30-rc8
>>> - the following mmap() call uses a NULL base address in the first
>>> process; the returned address is then passed to subsequent processes
>>> (not threads, processes)
>>>
>>> m_MemSize=((m_MemSize/(2048*1024))+1)*2048*1024;
>>> m_BaseAddr=mmap(m_File->GetFixedBase(), m_MemSize,
>>>                 (PROT_READ | PROT_WRITE),
>>>                 MAP_SHARED, m_File->GetFileId(),
>>>                 m_Offset);
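
[To spell out the sharing pattern for anyone reproducing this: the first
process maps with a NULL hint and publishes the address it got back, and
the later processes pass that address in as the mmap() hint, without
MAP_FIXED, exactly as in the call quoted above. A trimmed sketch with
made-up names:]

    /* illustration only (made-up names): the 2G shared heap lives in one
     * hugetlbfs file and every process maps the whole thing MAP_SHARED */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define HUGE_SZ  (2048UL * 1024)
    #define HEAP_SZ  (1024UL * HUGE_SZ)      /* 2G, already a 2M multiple */

    static void *map_shared_heap(const char *path, void *hint)
    {
        int fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("open"); return MAP_FAILED; }

        void *base = mmap(hint, HEAP_SZ, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        close(fd);                           /* the mapping stays valid */
        return base;
    }

    /* process 1:   base = map_shared_heap("/mnt/hugefs/heap", NULL);
     *              ...publish base to the other processes...
     * process 2-9: base = map_shared_heap("/mnt/hugefs/heap", published_base);
     *              (check the returned address -- without MAP_FIXED the
     *              hint is not guaranteed) */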
>>>
>>>
>>> I am not a kernel hacker so I have not attempted to debug. Will be
>>> able to spend time on a sample program for sharing later today or
>>> tomorrow. Sending this note now in case this is already known.
>>>
>>
>> IIUC, all page faults to hugetlb are serialized by a system-wide
>> mutex, so touching pages in parallel does not go any faster.
>> So I wonder whether touching all the necessary mappings from one
>> thread is better, in general.
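
[If I am reading the suggestion right, that would be something like the
sketch below -- one process walks the whole mapping once, touching each
2M page, before the others start using it. Made-up names; I have not
tried this yet.]

    /* sketch: fault every huge page in from a single process up front, so
     * the hugetlb page faults are not taken concurrently by 9 processes */
    #include <stddef.h>

    #define HUGE_SZ (2048UL * 1024)

    static void prefault_huge_mapping(void *base, size_t len)
    {
        volatile char *p = base;
        size_t off;

        /* one read per 2M page is enough to trigger the fault and
         * allocate the page in a MAP_SHARED mapping */
        for (off = 0; off < len; off += HUGE_SZ)
            (void)p[off];
    }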
>>
>>
>>
>>> Don't suppose this is as simple as a Copy-On-Write flag being set
>>> wrong?
>>>
>> I don't think so.
>>
>>> Please let me know what I should capture to better describe
>>> this bug. Happy to do the work.
>>>
>> Add cc to linux-mm.
>>
>> Thanks,
>> -Kame
>>
>>
>>> Thanks,
>>> Matthew