Message-ID: <4612DCC6.7000504@cosmosbay.com>
Date: Wed, 04 Apr 2007 01:01:26 +0200
From: Eric Dumazet <dada1@...mosbay.com>
To: Andrew Morton <akpm@...ux-foundation.org>
CC: Jakub Jelinek <jakub@...hat.com>,
Ulrich Drepper <drepper@...hat.com>,
Andi Kleen <andi@...stfloor.org>,
Rik van Riel <riel@...hat.com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
linux-mm@...ck.org, Hugh Dickins <hugh@...itas.com>
Subject: Re: missing madvise functionality
Andrew Morton wrote:
> On Tue, 3 Apr 2007 16:29:37 -0400
> Jakub Jelinek <jakub@...hat.com> wrote:
>
>> On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote:
>>> Andrew Morton wrote:
>>>> Ulrich, could you suggest a little test app which would demonstrate this
>>>> behaviour?
>>> It's not really reliably possible to demonstrate this with a small
>>> program using malloc. You'd need something like this mysql test case
>>> which Rik said is not hard to run by yourself.
>>>
>>> If somebody adds a kernel interface I can easily produce a glibc patch
>>> so that the test can be run in the new environment.
>>>
>>> But it's of course easy enough to simulate the specific problem in a
>>> micro benchmark. If you want that let me know.
>> I think something like following testcase which simulates what free
>> and malloc do when trimming/growing a non-main arena.
>>
>> My guess is that all the page zeroing is pretty expensive as well and
>> takes significant time, but I haven't profiled it.
>>
>> #include <pthread.h>
>> #include <stdlib.h>
>> #include <sys/mman.h>
>> #include <unistd.h>
>>
>> void *
>> tf (void *arg)
>> {
>>   (void) arg;
>>   size_t ps = sysconf (_SC_PAGE_SIZE);
>>   void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE,
>>                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>   if (p == MAP_FAILED)
>>     exit (1);
>>   int i;
>>   for (i = 0; i < 100000; i++)
>>     {
>>       /* Pretend to use the buffer. */
>>       char *q, *r = (char *) p + 128 * ps;
>>       size_t s;
>>       for (q = (char *) p; q < r; q += ps)
>>         *q = 1;
>>       for (s = 0, q = (char *) p; q < r; q += ps)
>>         s += *q;
>>       /* Free it. Replace this mmap with
>>          madvise (p, 128 * ps, MADV_THROWAWAY) when implemented. */
>>       if (mmap (p, 128 * ps, PROT_NONE,
>>                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != p)
>>         exit (2);
>>       /* And immediately malloc again. This would then be deleted. */
>>       if (mprotect (p, 128 * ps, PROT_READ | PROT_WRITE))
>>         exit (3);
>>     }
>>   return NULL;
>> }
>>
>> int
>> main (void)
>> {
>>   pthread_t th[32];
>>   int i;
>>   for (i = 0; i < 32; i++)
>>     if (pthread_create (&th[i], NULL, tf, NULL))
>>       exit (4);
>>   for (i = 0; i < 32; i++)
>>     pthread_join (th[i], NULL);
>>   return 0;
>> }
>>
>
> whee. 135,000 context switches/sec on a slow 2-way. mmap_sem, most
> likely. That is ungood.
>
> Did anyone monitor the context switch rate with the mysql test?
>
> Interestingly, your test app (with s/100000/1000) runs to completion in 13
> seconds on the slow 2-way. On a fast 8-way, it took 52 seconds and
> sustained 40,000 context switches/sec. That's a bit unexpected.
>
> Both machines show ~8% idle time, too :(
Yes... add some futex work on top of that, and you get the picture.

I do think such workloads might benefit from a vma cache that is private to
each thread rather than shared by all threads. A sequence number could
invalidate the cache(s): i.e. instead of a single mm->mmap_cache, have an
mm->sequence, with each thread keeping its own current->mmap_cache and
current->mm_sequence. Any change to the address space bumps mm->sequence, and
a thread trusts its cached vma only while its saved sequence still matches.
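
To make the idea concrete, here is a rough user-space model of the lookup
path, not kernel code. Everything in it is an assumption made for
illustration: the struct names, find_vma_cached(), mm_layout_changed() and the
slow_lookups counter are hypothetical, the vma "tree" is a plain array, and
locking and memory ordering are not modelled at all. The lookup is also
simplified to "the vma containing addr".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct vma {
    unsigned long start, end;        /* covers [start, end) */
};

struct mm {
    struct vma *vmas;                /* stand-in for the real vma tree */
    size_t nr_vmas;
    unsigned long sequence;          /* bumped on every layout change */
};

struct task {
    struct mm *mm;
    struct vma *mmap_cache;          /* private to this thread */
    unsigned long mm_sequence;       /* mm->sequence when cache was filled */
    unsigned long slow_lookups;      /* illustration only: counts cache misses */
};

/* Slow path: linear scan standing in for the tree walk. */
static struct vma *find_vma_slow(struct mm *mm, unsigned long addr)
{
    for (size_t i = 0; i < mm->nr_vmas; i++)
        if (addr >= mm->vmas[i].start && addr < mm->vmas[i].end)
            return &mm->vmas[i];
    return NULL;
}

/* Fast path: trust the private cache only if the sequence still matches. */
static struct vma *find_vma_cached(struct task *t, unsigned long addr)
{
    struct vma *v = t->mmap_cache;

    assert(t->mm != NULL);
    if (v && t->mm_sequence == t->mm->sequence &&
        addr >= v->start && addr < v->end)
        return v;

    /* Miss or stale cache: do the full lookup and refill. */
    t->slow_lookups++;
    v = find_vma_slow(t->mm, addr);
    t->mmap_cache = v;
    t->mm_sequence = t->mm->sequence;
    return v;
}

/* Called by whoever maps, unmaps or splits a vma: invalidates every
   thread's cache at once without touching the threads themselves. */
static void mm_layout_changed(struct mm *mm)
{
    mm->sequence++;
}
```

The point of the sequence is that a lookup by one thread no longer overwrites
the cached vma of another, while an mmap/munmap still invalidates everybody
with a single increment.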