linux-kernel - Re: missing madvise functionality


Message-ID: <4612DCC6.7000504@cosmosbay.com>
Date:	Wed, 04 Apr 2007 01:01:26 +0200
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	Jakub Jelinek <jakub@...hat.com>,
	Ulrich Drepper <drepper@...hat.com>,
	Andi Kleen <andi@...stfloor.org>,
	Rik van Riel <riel@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	linux-mm@...ck.org, Hugh Dickins <hugh@...itas.com>
Subject: Re: missing madvise functionality

Andrew Morton wrote:
> On Tue, 3 Apr 2007 16:29:37 -0400
> Jakub Jelinek <jakub@...hat.com> wrote:
> 
>> On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote:
>>> Andrew Morton wrote:
>>>> Ulrich, could you suggest a little test app which would demonstrate this
>>>> behaviour?
>>> It's not really reliably possible to demonstrate this with a small
>>> program using malloc.  You'd need something like this mysql test case
>>> which Rik said is not hard to run by yourself.
>>>
>>> If somebody adds a kernel interface I can easily produce a glibc patch
>>> so that the test can be run in the new environment.
>>>
>>> But it's of course easy enough to simulate the specific problem in a
>>> micro benchmark.  If you want that let me know.
>> I think something like following testcase which simulates what free
>> and malloc do when trimming/growing a non-main arena.
>>
>> My guess is that all the page zeroing is pretty expensive as well and
>> takes significant time, but I haven't profiled it.
>>
>> #include <pthread.h>
>> #include <stdlib.h>
>> #include <sys/mman.h>
>> #include <unistd.h>
>>
>> void *
>> tf (void *arg)
>> {
>>   (void) arg;
>>   size_t ps = sysconf (_SC_PAGE_SIZE);
>>   void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE,
>>                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>   if (p == MAP_FAILED)
>>     exit (1);
>>   int i;
>>   for (i = 0; i < 100000; i++)
>>     {
>>       /* Pretend to use the buffer.  */
>>       char *q, *r = (char *) p + 128 * ps;
>>       size_t s;
>>       for (q = (char *) p; q < r; q += ps)
>>         *q = 1;
>>       for (s = 0, q = (char *) p; q < r; q += ps)
>>         s += *q;
>>       /* Free it.  Replace this mmap with
>>          madvise (p, 128 * ps, MADV_THROWAWAY) when implemented.  */
>>       if (mmap (p, 128 * ps, PROT_NONE,
>>                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != p)
>>         exit (2);
>>       /* And immediately malloc again.  This would then be deleted.  */
>>       if (mprotect (p, 128 * ps, PROT_READ | PROT_WRITE))
>>         exit (3);
>>     }
>>   return NULL;
>> }
>>
>> int
>> main (void)
>> {
>>   pthread_t th[32];
>>   int i;
>>   for (i = 0; i < 32; i++)
>>     if (pthread_create (&th[i], NULL, tf, NULL))
>>       exit (4);
>>   for (i = 0; i < 32; i++)
>>     pthread_join (th[i], NULL);
>>   return 0;
>> }
>>
> 
> whee.  135,000 context switches/sec on a slow 2-way.  mmap_sem, most
> likely.  That is ungood.
> 
> Did anyone monitor the context switch rate with the mysql test?
> 
> Interestingly, your test app (with s/100000/1000/) runs to completion in 13
> seconds on the slow 2-way.  On a fast 8-way, it took 52 seconds and
> sustained 40,000 context switches/sec.  That's a bit unexpected.
> 
> Both machines show ~8% idle time, too :(

Yes... then add some futex work on top of this, and you get the picture.

I do think such workloads might benefit from a vma cache that is private to 
each thread instead of shared by all threads. A sequence number could be used 
to invalidate the cache(s).

I.e., instead of a single mm->mmap_cache, have an mm->sequence, with each thread 
keeping its own current->mmap_cache and current->mm_sequence.
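
To make the idea concrete, here is a rough user-space sketch of it, not kernel 
code. The struct layouts and the helpers find_vma_slow(), find_vma_cached() and 
mm_invalidate_caches() are hypothetical names made up for illustration; only 
mmap_cache, sequence and mm_sequence come from the proposal above. The lookup 
is simplified to "the vma containing addr", and locking and memory ordering 
are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a vma: the address range [start, end). */
struct vma {
    uintptr_t start, end;
    struct vma *next;
};

/* Shared by all threads of a process. */
struct mm {
    struct vma *mmap;        /* list of mappings */
    unsigned long sequence;  /* bumped whenever the mappings change */
};

/* Per-thread state, standing in for "current". */
struct task {
    struct mm *mm;
    struct vma *mmap_cache;     /* private cache, no cross-thread bouncing */
    unsigned long mm_sequence;  /* value of mm->sequence when cache was filled */
};

/* Anything that maps, unmaps or resizes a vma would call this. */
static void mm_invalidate_caches(struct mm *mm)
{
    assert(mm != NULL);
    mm->sequence++;
}

/* Full lookup: walk the list for the vma containing addr. */
static struct vma *find_vma_slow(struct mm *mm, uintptr_t addr)
{
    for (struct vma *v = mm->mmap; v != NULL; v = v->next)
        if (v->start <= addr && addr < v->end)
            return v;
    return NULL;
}

/* Try the per-thread cache first; *hit reports whether it was usable. */
static struct vma *find_vma_cached(struct task *t, uintptr_t addr, int *hit)
{
    struct vma *v = t->mmap_cache;

    if (v != NULL && t->mm_sequence == t->mm->sequence &&
        v->start <= addr && addr < v->end) {
        *hit = 1;
        return v;
    }
    *hit = 0;
    v = find_vma_slow(t->mm, addr);
    t->mmap_cache = v;                 /* refill the private cache */
    t->mm_sequence = t->mm->sequence;  /* remember which generation it belongs to */
    return v;
}
```

With two tasks sharing one mm, a lookup by the second task does not evict the 
first task's cached vma, and a single mm_invalidate_caches() call makes every 
thread's next lookup fall back to the slow path. A real implementation would 
also have to decide where the sequence is bumped relative to mmap_sem and how 
a stale cached pointer is kept from being dereferenced.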
