Date: Wed, 04 Apr 2007 12:22:00 +1000
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Eric Dumazet <dada1@...mosbay.com>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Jakub Jelinek <jakub@...hat.com>,
Ulrich Drepper <drepper@...hat.com>,
Andi Kleen <andi@...stfloor.org>,
Rik van Riel <riel@...hat.com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
linux-mm@...ck.org, Hugh Dickins <hugh@...itas.com>
Subject: Re: missing madvise functionality
Eric Dumazet wrote:
> Andrew Morton wrote:
>
>> On Tue, 3 Apr 2007 16:29:37 -0400
>> Jakub Jelinek <jakub@...hat.com> wrote:
>>
>>> On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote:
>>>
>>>> Andrew Morton wrote:
>>>>
>>>>> Ulrich, could you suggest a little test app which would demonstrate
>>>>> this behaviour?
>>>>
>>>> It's not really reliably possible to demonstrate this with a small
>>>> program using malloc. You'd need something like this mysql test case
>>>> which Rik said is not hard to run by yourself.
>>>>
>>>> If somebody adds a kernel interface I can easily produce a glibc patch
>>>> so that the test can be run in the new environment.
>>>>
>>>> But it's of course easy enough to simulate the specific problem in a
>>>> micro benchmark. If you want that let me know.
>>>
>>> I think something like the following testcase, which simulates what free
>>> and malloc do when trimming/growing a non-main arena, should demonstrate it.
>>>
>>> My guess is that all the page zeroing is pretty expensive as well and
>>> takes significant time, but I haven't profiled it.
>>>
>>> #include <pthread.h>
>>> #include <stdlib.h>
>>> #include <sys/mman.h>
>>> #include <unistd.h>
>>>
>>> void *
>>> tf (void *arg)
>>> {
>>>   (void) arg;
>>>   size_t ps = sysconf (_SC_PAGE_SIZE);
>>>   void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE,
>>>                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>   if (p == MAP_FAILED)
>>>     exit (1);
>>>   int i;
>>>   for (i = 0; i < 100000; i++)
>>>     {
>>>       /* Pretend to use the buffer.  */
>>>       char *q, *r = (char *) p + 128 * ps;
>>>       size_t s;
>>>       for (q = (char *) p; q < r; q += ps)
>>>         *q = 1;
>>>       for (s = 0, q = (char *) p; q < r; q += ps)
>>>         s += *q;
>>>       /* Free it.  Replace this mmap with
>>>          madvise (p, 128 * ps, MADV_THROWAWAY) when implemented.  */
>>>       if (mmap (p, 128 * ps, PROT_NONE,
>>>                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != p)
>>>         exit (2);
>>>       /* And immediately malloc again.  This would then be deleted.  */
>>>       if (mprotect (p, 128 * ps, PROT_READ | PROT_WRITE))
>>>         exit (3);
>>>     }
>>>   return NULL;
>>> }
>>>
>>> int
>>> main (void)
>>> {
>>>   pthread_t th[32];
>>>   int i;
>>>   for (i = 0; i < 32; i++)
>>>     if (pthread_create (&th[i], NULL, tf, NULL))
>>>       exit (4);
>>>   for (i = 0; i < 32; i++)
>>>     pthread_join (th[i], NULL);
>>>   return 0;
>>> }
>>>
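For comparison, here is a variant of the testcase above where the "free" step
uses the existing MADV_DONTNEED instead of the mmap(PROT_NONE) + mprotect pair.
This is only a sketch: MADV_DONTNEED discards the pages immediately and
zero-fills on the next touch, which is not the lazy-reclaim behaviour the
proposed MADV_THROWAWAY call is meant to have, but it shows what the
malloc/free path would reduce to with a single madvise call.

#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static void *
tf (void *arg)
{
  (void) arg;
  size_t ps = sysconf (_SC_PAGE_SIZE);
  char *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    exit (1);
  int i;
  for (i = 0; i < 100000; i++)
    {
      size_t off;
      /* Touch every page so there is something to throw away.  */
      for (off = 0; off < 128 * ps; off += ps)
        p[off] = 1;
      /* "Free" it: a single madvise call instead of mmap + mprotect.  */
      if (madvise (p, 128 * ps, MADV_DONTNEED))
        exit (2);
    }
  return NULL;
}

int
main (void)
{
  pthread_t th[32];
  int i;
  for (i = 0; i < 32; i++)
    if (pthread_create (&th[i], NULL, tf, NULL))
      exit (3);
  for (i = 0; i < 32; i++)
    pthread_join (th[i], NULL);
  return 0;
}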
>>
>> whee. 135,000 context switches/sec on a slow 2-way. mmap_sem, most
>> likely. That is ungood.
>>
>> Did anyone monitor the context switch rate with the mysql test?
>>
>> Interestingly, your test app (with s/100000/1000) runs to completion in
>> 13 seconds on the slow 2-way. On a fast 8-way, it took 52 seconds and
>> sustained 40,000 context switches/sec. That's a bit unexpected.
>>
>> Both machines show ~8% idle time, too :(
>
>
> Yes... then add to this some futex work, and you get the picture.
>
> I do think such workloads might benefit from a vma_cache that is not
> shared by all threads but private to each thread. A sequence counter
> could invalidate the per-thread cache(s).
>
> i.e. instead of a mm->mmap_cache, have a mm->sequence, and each thread
> have a current->mmap_cache and a current->mm_sequence
I have a patchset to do exactly this, btw.
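A rough sketch of the idea (the field and function names here are illustrative
only, not taken from the actual patchset): the mm keeps a sequence counter that
is bumped whenever the vma set changes, each task caches the last vma it found
together with the sequence value it saw, and so a writer invalidates every
thread's cache with a single increment.

struct vm_area_struct;

struct mm_struct {
        unsigned long sequence;                 /* bumped on every vma change */
        /* ... rbtree of vmas, mmap_sem, etc. ... */
};

struct task_struct {
        struct mm_struct *mm;
        struct vm_area_struct *mmap_cache;      /* per-thread, not per-mm */
        unsigned long mm_sequence;              /* mm->sequence when cached */
};

/* Stand-ins for the real rbtree walk and the vma bounds check. */
struct vm_area_struct *find_vma_slow(struct mm_struct *mm, unsigned long addr);
int vma_contains(struct vm_area_struct *vma, unsigned long addr);

struct vm_area_struct *find_vma_cached(struct task_struct *tsk,
                                       unsigned long addr)
{
        struct mm_struct *mm = tsk->mm;
        struct vm_area_struct *vma = tsk->mmap_cache;

        /* Trust the per-thread cache only if nobody has changed the vma
           set since it was filled and the address still falls inside. */
        if (vma && tsk->mm_sequence == mm->sequence &&
            vma_contains(vma, addr))
                return vma;

        vma = find_vma_slow(mm, addr);
        if (vma) {
                tsk->mmap_cache = vma;
                tsk->mm_sequence = mm->sequence;
        }
        return vma;
}

/* Called by mmap/munmap/mprotect and friends, with the vma tree already
   locked for writing, so every thread's cached entry goes stale at once. */
void invalidate_vma_caches(struct mm_struct *mm)
{
        mm->sequence++;
}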
Anyway, what is the status of the private futex work? I don't think it
is very intrusive or complicated, so it should get merged ASAP (so that
at least we have the interface there).
--
SUSE Labs, Novell Inc.