Message-ID: <462471F6.6020800@cosmosbay.com>
Date:	Tue, 17 Apr 2007 09:06:30 +0200
From:	Eric Dumazet <dada1@...mosbay.com>
To:	"Brian D. McGrew" <brian@...ionpro.com>
CC:	linux-kernel@...r.kernel.org
Subject: Re: Memory Allocation

Brian D. McGrew wrote:
> Good evening gents!
> 
> I need some help in allocating memory and understanding how the system
> allocates memory with physical versus virtual page tables.  Please
> consider the following snippet of code.  Please, no wisecracks about bad
> code; it was written in 30 seconds in haste :-)
> 
> #include <iostream>
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>	// sleep()
> #include <sys/types.h>	// u_long
> #include <pthread.h>
> 
> const static u_long kMaxSize = (2048 * 2048 * 256);
> 
> void *msg(void *ptr);
> static u_long threads_done	= 0;
> 
> int
> main(int argc, char *argv[])
> {
>      pthread_t thread1;
>      pthread_t thread2;
> 
>      char *message1 = "Thread 1";
>      char *message2 = "Thread 2";
> 
>      int iret1;
>      int iret2;
> 
>      iret1 = pthread_create(&thread1, NULL, msg, (void *) message1);
>      iret2 = pthread_create(&thread2, NULL, msg, (void *) message2);
> 
>     //pthread_join(thread1, NULL);
>     //pthread_join(thread2, NULL); 
> 
>     while (threads_done < 2) {
> 	std::cout << "Threads complete: " << threads_done << std::endl;
> 	sleep(3);
>     }
> 
>     exit(0);
> }
> 
> void *
> msg(void *ptr)
> {
>     char *message = (char *) ptr;
> 
>     //
>     // Equal to 1 bank per thread of 256 each 4MP image buffers.  2GB.
>     //
>     char *buffer = new char[kMaxSize];
> 
>     u_long max = kMaxSize;
> 
>     //
>     // Init each buffer to 'something'.
>     //
>     for (u_long inx = 0; inx < max; inx++) {
> 	if (inx % 102400000 == 0) {
> 	    std::cout << message << ": Index: " << inx << std::endl;
> 	}
> 
>     	buffer[inx] = inx;
>     }
> 
>     delete[] buffer;	// allocated with new[], so free() is wrong here
>     threads_done++;
>     return NULL;
> }
> 
> My test machine is a Dell Precision 490 with dual 5140 processors and
> 3GB of RAM.  If I reduce kMaxSize to (2048 * 2048 * 236) it works.
> However, I need to allocate an array of char that is (2048 * 2048 * 256)
> and maybe even as large as (2048 * 2048 * 512).
> 
> Obviously I have enough physical memory in the box to do this.  However,
> I suspect that I'm running out of page table entries.  Please, correct
> me if I'm wrong; but if I allocate (2048 * 2048 * 236) it works.  When I
> increment to 256 or 512 it fails and it is my suspicion that I just
> don't have enough room in kernel memory to allocate this much memory in
> user space.
> 
> Because of a piece of 3rd party hardware, I'm forced to run the kernel
> in the 4GB memory model.  What I need to be able to do is allocate an
> array of char (2048 * 2048 * (up to 512)) in user space *** AND *** I
> need the addresses that I get back to be contiguous, that's just the way
> my 3rd party hardware works.
> 
> I'm inclined to believe that this is not specifically a Linux problem
> but maybe an architecture problem???  But maybe there is some kind of
> work around in the kernel for it???  I'd find it hard to believe that
> I'm the first one that ever needed to use this much memory.
> 
> I ran this same code on two different Macs.  One of them a Powerbook G4
> with 4GB of RAM and it was successful.  The other was a Macbook Pro with
> 4GB of RAM and it failed.  Both running OS 10.4.9.  And of course it
> runs just lovely on my Sun workstation with Solaris.  Thus, I'm thinking
> it's an Intel/X86 issue!
> 
> How the heck do I get past this problem in Linux on the X86 platform???
> 
> Thanks,

Hi Brian

Add these lines at the beginning of your msg() function:

char cmd[128];
sprintf(cmd, "cat /proc/%d/maps", getpid());
system(cmd);

You'll see:

08048000-08049000 r-xp 00000000 08:07 23         /tmp/test1
08049000-0804a000 rw-p 00000000 08:07 23         /tmp/test1
0804a000-0806b000 rw-p 0804a000 00:00 0
40000000-40015000 r-xp 00000000 08:02 31309      /lib/ld-2.3.6.so
40015000-40017000 rw-p 00014000 08:02 31309      /lib/ld-2.3.6.so
40017000-40019000 rw-p 40017000 00:00 0
4001d000-4002b000 r-xp 00000000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002b000-4002d000 rw-p 0000d000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002d000-4002f000 rw-p 4002d000 00:00 0
4002f000-40109000 r-xp 00000000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
40109000-4010c000 r--p 000d9000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010c000-4010e000 rw-p 000dc000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010e000-40114000 rw-p 4010e000 00:00 0
40114000-40137000 r-xp 00000000 08:02 31339      /lib/tls/libm-2.3.6.so
40137000-40139000 rw-p 00022000 08:02 31339      /lib/tls/libm-2.3.6.so
40139000-40143000 r-xp 00000000 08:02 31871      /lib/libgcc_s.so.1
40143000-40144000 rw-p 00009000 08:02 31871      /lib/libgcc_s.so.1
40144000-4026c000 r-xp 00000000 08:02 31335      /lib/tls/libc-2.3.6.so
4026c000-40271000 r--p 00127000 08:02 31335      /lib/tls/libc-2.3.6.so
40271000-40273000 rw-p 0012c000 08:02 31335      /lib/tls/libc-2.3.6.so
40273000-40278000 rw-p 40273000 00:00 0
40278000-40279000 ---p 40278000 00:00 0
40279000-40a78000 rw-p 40279000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
Thread 1: Index: 0
08048000-08049000 r-xp 00000000 08:07 23         /tmp/test1
08049000-0804a000 rw-p 00000000 08:07 23         /tmp/test1
0804a000-0806b000 rw-p 0804a000 00:00 0
40000000-40015000 r-xp 00000000 08:02 31309      /lib/ld-2.3.6.so
40015000-40017000 rw-p 00014000 08:02 31309      /lib/ld-2.3.6.so
40017000-40019000 rw-p 40017000 00:00 0
4001d000-4002b000 r-xp 00000000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002b000-4002d000 rw-p 0000d000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002d000-4002f000 rw-p 4002d000 00:00 0
4002f000-40109000 r-xp 00000000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
40109000-4010c000 r--p 000d9000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010c000-4010e000 rw-p 000dc000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010e000-40114000 rw-p 4010e000 00:00 0
40114000-40137000 r-xp 00000000 08:02 31339      /lib/tls/libm-2.3.6.so
40137000-40139000 rw-p 00022000 08:02 31339      /lib/tls/libm-2.3.6.so
40139000-40143000 r-xp 00000000 08:02 31871      /lib/libgcc_s.so.1
40143000-40144000 rw-p 00009000 08:02 31871      /lib/libgcc_s.so.1
40144000-4026c000 r-xp 00000000 08:02 31335      /lib/tls/libc-2.3.6.so
4026c000-40271000 r--p 00127000 08:02 31335      /lib/tls/libc-2.3.6.so
40271000-40273000 rw-p 0012c000 08:02 31335      /lib/tls/libc-2.3.6.so
40273000-40278000 rw-p 40273000 00:00 0
40278000-40279000 ---p 40278000 00:00 0
40279000-80a7a000 rw-p 40279000 00:00 0
80a7a000-80a7b000 ---p 80a7a000 00:00 0
80a7b000-8127a000 rw-p 80a7b000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
terminate called after throwing an instance of 'std::bad_alloc'
   what():  St9bad_alloc
Aborted



The problem comes from the dynamic libraries and thread stacks, which end up
mapped in the 0x40000000 zone. The address space available to a user program
is 3GB, from 0x00000000 to 0xC0000000, but it is not contiguous: once the
first 1GB buffer is mapped, no free hole of 1GB is left for the second one,
so your program cannot get its 2GB.
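
To make that arithmetic concrete: each buffer is 2048 * 2048 * 256 bytes,
i.e. exactly 1GB (0x40000000), while in the second dump above the largest
free hole below 0xC0000000 is 0x8127a000-0xbffff000, about 1005MB. Here is a
minimal sketch that computes the largest hole from maps-format text. The
names largest_gap() and largest_gap_in_self() are mine, not an existing API,
and the sketch assumes the entries are sorted by start address, as the
kernel prints them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: size in bytes of the largest unmapped hole inside
// [lo, hi), given text in the /proc/<pid>/maps format.
// Assumes the entries are sorted by start address.
std::uint64_t largest_gap(std::istream &maps, std::uint64_t lo, std::uint64_t hi)
{
    assert(lo <= hi);
    std::uint64_t best = 0;
    std::uint64_t cursor = lo;      // end of the mapped area seen so far
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream fields(line);
        std::uint64_t start = 0, end = 0;
        char dash = 0;
        // Every entry begins with "start-end" in hexadecimal.
        if (!(fields >> std::hex >> start >> dash >> end) || dash != '-')
            continue;               // skip anything that is not a maps entry
        if (start >= hi)
            break;                  // past the range we care about
        if (start > cursor)
            best = std::max(best, start - cursor);
        cursor = std::max(cursor, std::min(end, hi));
    }
    if (hi > cursor)                // hole between the last mapping and hi
        best = std::max(best, hi - cursor);
    return best;
}

// Same computation for the calling process. If /proc/self/maps cannot be
// opened, no mapping is seen and the whole range is reported as free.
std::uint64_t largest_gap_in_self(std::uint64_t lo, std::uint64_t hi)
{
    std::ifstream maps("/proc/self/maps");
    return largest_gap(maps, lo, hi);
}
```

In a 32bit process you could call largest_gap_in_self(0, 0xC0000000ULL) just
before each new char[kMaxSize] and compare the result with kMaxSize.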


Now if you compile your program with static libraries, it's a little better:

g++ -o test1 -static test1.c -lpthread
# ./test1
Threads complete: 0
08048000-08137000 r-xp 00000000 08:07 23         /tmp/test1
08137000-08139000 rw-p 000ee000 08:07 23         /tmp/test1
08139000-081a4000 rw-p 08139000 00:00 0
40000000-40001000 ---p 40000000 00:00 0
40001000-40800000 rwxp 40001000 00:00 0
40800000-40801000 ---p 40800000 00:00 0
40801000-41000000 rwxp 40801000 00:00 0
41000000-41001000 rw-p 41000000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
Thread 1: Index: 0
08048000-08137000 r-xp 00000000 08:07 23         /tmp/test1
08137000-08139000 rw-p 000ee000 08:07 23         /tmp/test1
08139000-081a4000 rw-p 08139000 00:00 0
40000000-40001000 ---p 40000000 00:00 0
40001000-40800000 rwxp 40001000 00:00 0
40800000-40801000 ---p 40800000 00:00 0
40801000-41000000 rwxp 40801000 00:00 0
41000000-81002000 rw-p 41000000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
terminate called after throwing an instance of 'std::bad_alloc'
   what():  St9bad_alloc
Killed

Some mappings (the thread stacks) are still biting you.

If you want to use that much memory on a 32bit kernel, you might tune your
program to:

- Avoid dynamic libraries
- Allocate thread stacks yourself, so that they won't sit in the middle of
your address space (for example from the malloc() zone, in the
08139000-08xxxxxx range)
...
- Use a kernel that can lay out mappings the other way, from the top of the
address space downwards (check /proc/sys/vm/legacy_va_layout)
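
The "allocate thread stacks yourself" point can be sketched with
pthread_attr_setstack(). This is only an illustration under assumptions: the
helper names create_thread_on_heap_stack() and record_stack_address() are
mine, and whether the block really comes from the brk heap depends on the
allocator. glibc serves large requests with mmap() (see M_MMAP_THRESHOLD),
so you would want a small stack or a tuned threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <pthread.h>
#include <stdlib.h>     // posix_memalign(), free()

// Demo thread body (hypothetical): stores the address of one of its local
// variables, which necessarily lives on the thread's stack.
void *record_stack_address(void *arg)
{
    int local = 0;
    *static_cast<std::uintptr_t *>(arg) =
        reinterpret_cast<std::uintptr_t>(&local);
    return NULL;
}

// Hypothetical helper: start fn(arg) on a caller-provided stack of
// stack_size bytes obtained from the allocator, instead of letting
// libpthread mmap() one. Returns 0 on success, an error number otherwise.
// On success *stack_out receives the block; free() it only after the
// thread has been joined.
int create_thread_on_heap_stack(pthread_t *tid, void *(*fn)(void *), void *arg,
                                std::size_t stack_size, void **stack_out)
{
    assert(tid != NULL && fn != NULL && stack_out != NULL);

    void *stack = NULL;
    // Page alignment (4096 assumed here) keeps pthread_attr_setstack() happy.
    int rc = posix_memalign(&stack, 4096, stack_size);
    if (rc != 0)
        return rc;

    pthread_attr_t attr;
    rc = pthread_attr_init(&attr);
    if (rc != 0) {
        free(stack);
        return rc;
    }
    rc = pthread_attr_setstack(&attr, stack, stack_size);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);

    if (rc != 0) {
        free(stack);
        return rc;
    }
    *stack_out = stack;
    return 0;
}
```

In your test program, msg() would take the place of record_stack_address(),
and both threads would be created this way before the big buffers are
allocated. The stack size must be large enough for whatever the thread does;
printing through std::cout needs more than the bare minimum.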

Of course, switching to a 64bit kernel makes this problem go away entirely :)