Message-ID: <46128051.9000609@redhat.com>
Date:	Tue, 03 Apr 2007 09:26:57 -0700
From:	Ulrich Drepper <drepper@...hat.com>
To:	Rik van Riel <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>
CC:	Jakub Jelinek <jakub@...hat.com>
Subject: missing madvise functionality

People might remember the thread about mysql not scaling and pointing
the finger quite happily at glibc.  Well, the situation is not like that.

The problem is glibc has to work around kernel limitations.  If the
malloc implementation detects that a large chunk of previously allocated
memory is now free and unused it wants to return the memory to the
system.  What we currently have to do is this:

  to free:      mmap(PROT_NONE) over the area
  to reuse:     mprotect(PROT_READ|PROT_WRITE)

Yep, that's expensive: both operations need to take locks, which
prevents other threads from doing the same at the same time.
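
For readers who want to see the two steps above spelled out, here is a
minimal sketch in C.  It is not glibc's actual malloc code; the helper
names (region_alloc, region_release, region_reuse) are made up for
illustration, and it assumes a private anonymous mapping on Linux.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: get a private anonymous read/write region. */
static void *region_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "to free": map a fresh PROT_NONE mapping over the area.  The old
 * pages are dropped but the address range stays reserved. */
static int region_release(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}

/* "to reuse": make the range accessible again. */
static int region_reuse(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}
```

Each release/reuse cycle costs two system calls that modify the
address space, which is where the locking cost comes from.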

Some people were quick to suggest that we simply avoid the freeing in
many situations (that's what the patch submitted by Yanmin Zhang
basically does).  That's no solution.  One of the very good properties
of the current allocator is that it does not use much memory.

A solution for this problem is a madvise() operation with the following
property:

  - the content of the address range can be discarded

  - if an access to a page in the range happens in the future it must
    succeed.  The old page content can be provided or a new, empty page
    can be provided

That's it.  The current MADV_DONTNEED doesn't cut it because it zaps the
pages, causing *all* future reuses to create page faults.  I guess this
is what happens in the mysql test case, where the pages were unused and
freed but then almost immediately reused.  The page faults erased all
the benefits of using one mprotect() call vs a pair of mmap()/mprotect()
calls.
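
The MADV_DONTNEED behaviour described above can be observed with a
small C sketch.  This is an illustration only, assuming Linux and a
private anonymous mapping; the helper names (minor_faults, touch_pages,
discard_dontneed) are hypothetical, and the fault counts it reports
depend on the kernel and its configuration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Write one byte per page; return how many minor faults that cost. */
static long touch_pages(unsigned char *p, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    long before = minor_faults();
    for (size_t off = 0; off < len; off += page)
        p[off] = 1;
    return minor_faults() - before;
}

/* Hypothetical helper: discard the range with today's MADV_DONTNEED. */
static int discard_dontneed(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    return madvise(addr, len, MADV_DONTNEED);
}
```

Calling touch_pages() on a region, then discard_dontneed(), then
touch_pages() again lets one compare the fault count of the second pass
with that of the first.  After the discard the range stays accessible
and reads back as zero-filled, but the pages have to be faulted in
again on reuse.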

So, if all those who were so interested in that micro benchmark could
now please direct their attention to a good madvise solution I'd be much
obliged.  It'll be put to good use right away and it should be quite
easy to provide a glibc patch to test the new kernel code.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

