Message-Id: <FA7AA0E9-364B-44F5-AAAD-A054EFB213ED@cambridgesemantics.com>
Date:	Fri, 25 Mar 2016 11:49:02 -0700
From:	David Barto <barto@...bridgesemantics.com>
To:	linux-kernel@...r.kernel.org
Subject: RSS calculation in the 3.X Kernel

Hello,
First, I want to point out that I am not a Linux kernel developer; however, I did kernel development on Berkeley Unix (4.X) in the distant past.

What I'm trying to discover is how RSS is calculated in the 3.X kernels. I know that the current release is in the 4.X series; however, I must work with what our customers want to use, not what I would prefer.

The kernel mailing list has excellent coverage of adding more reporting for HugeTLBPages values to the 4.X kernel, and that is an interesting read. I attempted to use that as a way to discover the RSS calculations in the 3.X kernel; however, it didn't get me far.

The problem:
I have a program that uses lots of data, literally as much as physical RAM. I need to load this data in a way that lets me detect when I'm running out of RAM, so that I know when to push the 'stop loading' button. Likewise, when executing scans of this data, I need to know when I've allocated too much working memory and, again, push the 'stop' button.

The program uses only mmap/mprotect/munmap/madvise to manage memory. It preallocates a very large amount of virtual address space with mmap as unbacked memory and then backs that memory on an as-needed basis. The program traps all calls to malloc/calloc/realloc as well as both kinds of operator new, along with the associated free/delete routines. All memory allocation is redirected into mmap operations.
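
For illustration, here is a minimal sketch of that reserve-then-back pattern. The function names (region_reserve, region_commit, region_decommit, region_release) are hypothetical, and the specific flags (PROT_NONE, MAP_NORESERVE, MADV_DONTNEED) are assumptions about one common way to do this, not a copy of the actual program.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Reserve address space only. With PROT_NONE the range cannot be touched,
 * so no pages become resident at this point. (Flag choice is an assumption.) */
void *region_reserve(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Back" part of the reservation by making it readable and writable.
 * Changing the protection does not by itself make pages resident;
 * anonymous pages are faulted in when they are first touched. */
int region_commit(void *base, size_t off, size_t len)
{
    return mprotect((char *)base + off, len, PROT_READ | PROT_WRITE);
}

/* Hand the pages back to the kernel but keep the address range reserved. */
int region_decommit(void *base, size_t off, size_t len)
{
    if (madvise((char *)base + off, len, MADV_DONTNEED) != 0)
        return -1;
    return mprotect((char *)base + off, len, PROT_NONE);
}

/* Drop the whole reservation. */
int region_release(void *base, size_t len)
{
    return munmap(base, len);
}
```

In this sketch, off and len are expected to be multiples of the page size, and every function reports failure through its return value (NULL or -1, with errno set by the underlying call).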

When running, I can't afford to spend time reading a file (/proc/pid/statm) to see if memory is full; I need to know at the time of allocation that I'm done. As a result, I need to know whether the Linux OOM killer will shoot me down because of oversubscription on the next call to allocate more memory.
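
For reference, the /proc check I would like to avoid on every allocation looks roughly like the sketch below. It reads /proc/self/statm (the calling process's own entry), whose second field is the resident page count; the function name rss_bytes_from_statm is hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Return the kernel's view of this process's RSS in bytes, or -1 on error.
 * /proc/self/statm reports sizes in pages: "size resident shared ...". */
long long rss_bytes_from_statm(void)
{
    unsigned long long size_pages = 0, resident_pages = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;

    int fields = fscanf(f, "%llu %llu", &size_pages, &resident_pages);
    fclose(f);
    if (fields != 2)
        return -1;

    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;

    return (long long)resident_pages * page_size;
}
```

Each call costs an open, a read, and a close, which is why doing it on every allocation is not attractive.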

Since I'm trapping all calls to any memory allocation, including allocation through the C and C++ libraries, I don't understand why the kernel is reporting a higher RSS size than I think I should have. If I think I've allocated 120GB, the kernel will report that my RSS size is over 160GB. This discrepancy grows larger as I load more data. I'm not getting an error from mprotect when I attempt to add more backed memory than the system supports, which would be acceptable as an OOM error to my program. I would expect ENOMEM if I could not map the required memory; instead I get hit by the OOM killer. If anyone would like a program that demonstrates this, I have one. An interesting point about the program is that after mapping 10GB of RAM and subsequently unmapping it, my RSS size has increased from 1.5MB to 2.4MB. I need to understand these kinds of 'behind the scenes' allocations charged to my program.

To this end I'm appealing to the Linux kernel developers for a helping hint (or 3) to understand the accounting of RSS size in the 3.X kernel. I don't need a complete walkthrough, just a 'look here' kind of thing. I've been through the mm/mmap.c and mm/memory.c files and I'm having no luck putting the pieces together.

I know that the reporting is held in the mm_rss_stat structure, is initialized in init_rss_vec, and is updated by inline functions in mm.h.

When I walk through unmap_page_range, I see where (eventually) zap_pte_range is called, which in turn calls add_mm_rss_vec to update the various mm_counters.

When mapping, I can see a call to sys_mmap_pgoff from sys_x86_64.c, but I can't find any definition of sys_mmap_pgoff in the kernel files. I do see a __SYSCALL(192, sys_mmap_pgoff) and a __SYSCALL(80, sys_mmap_pgoff, 6).

What else could modify the RSS of a running process? I'm not creating any new threads and I'm not forking the program. I'm just loading data (read from file, convert to internal format, mmap some space, write to memory) for later use, and that is causing me grief because the kernel's idea of my RSS far exceeds my own.

I'm not on the Linux Developers mailing list, so please CC me in any reply.

Thanks for your time and consideration,

  David Barto
  barto@...brigesemantics.com
