lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 25 Jan 2016 15:53:01 -0800
From:	Colin Cross <ccross@...roid.com>
To:	"Kirill A. Shutemov" <kirill@...temov.name>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>, Shaohua Li <shli@...com>,
	Siddhesh Poyarekar <siddhesh.poyarekar@...il.com>,
	Linux-MM <linux-mm@...ck.org>,
	lkml <linux-kernel@...r.kernel.org>, kernel-team@...com
Subject: Re: [PATCH] proc: revert /proc/<pid>/maps [stack:TID] annotation

On Mon, Jan 25, 2016 at 3:14 PM, Kirill A. Shutemov
<kirill@...temov.name> wrote:
> On Mon, Jan 25, 2016 at 01:30:00PM -0800, Colin Cross wrote:
>> On Tue, Jan 19, 2016 at 3:30 PM, Kirill A. Shutemov
>> <kirill@...temov.name> wrote:
>> > On Tue, Jan 19, 2016 at 02:14:30PM -0800, Andrew Morton wrote:
>> >> On Tue, 19 Jan 2016 13:02:39 -0500 Johannes Weiner <hannes@...xchg.org> wrote:
>> >>
>> >> > b764375 ("procfs: mark thread stack correctly in proc/<pid>/maps")
>> >> > added [stack:TID] annotation to /proc/<pid>/maps. Finding the task of
>> >> > a stack VMA requires walking the entire thread list, turning this into
>> >> > quadratic behavior: a thousand threads means a thousand stacks, so the
>> >> > rendering of /proc/<pid>/maps needs to look at a million threads. The
>> >> > cost is not in proportion to the usefulness as described in the patch.
>> >> >
>> >> > Drop the [stack:TID] annotation to make /proc/<pid>/maps (and
>> >> > /proc/<pid>/numa_maps) usable again for higher thread counts.
>> >> >
>> >> > The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained,
>> >> > as identifying the stack VMA there is an O(1) operation.
>> >>
>> >> Four years ago, ouch.
>> >>
>> >> Any thoughts on the obvious back-compatibility concerns?  ie, why did
>> >> Siddhesh implement this in the first place?  My bad for not ensuring
>> >> that the changelog told us this.
>> >>
>> >> https://lkml.org/lkml/2012/1/14/25 has more info:
>> >>
>> >> : Memory mmaped by glibc for a thread stack currently shows up as a
>> >> : simple anonymous map, which makes it difficult to differentiate between
>> >> : memory usage of the thread on stack and other dynamic allocation.
>> >> : Since glibc already uses MAP_STACK to request this mapping, the
>> >> : attached patch uses this flag to add additional VM_STACK_FLAGS to the
>> >> : resulting vma so that the mapping is treated as a stack and not any
>> >> : regular anonymous mapping.  Also, one may use vm_flags to decide if a
>> >> : vma is a stack.
>> >>
>> >> But even that doesn't really tell us what the actual *value* of the
>> >> patch is to end-users.
>> >
>> > I doubt it can be very useful as it's unreliable: if two stacks are
>> > allocated end-to-end (which is not good idea, but still) it can only
>> > report [stack:XXX] for the first one as they are merged into one VMA.
>> > Any other anon VMA merged with the stack will be also claimed as stack,
>> > which is not always correct.
>> >
>> > I think report the VMA as anon is the best we can know about it,
>> > everything else just rather expensive guesses.
>>
>> An alternative to guessing is the anonymous VMA naming patch used on
>> Android, https://lkml.org/lkml/2013/10/30/518.  It allows userspace to
>> name anonymous memory however it wishes, and prevents vma merging
>> adjacent regions with different names.  Android uses it to label
>> native heap memory, but it would work well for stacks too.
>
> I don't think preventing vma merging is fair price for the feature: you
> would pay extra in every find_vma() (meaning all page faults).
>
> I think it would be nice to have a way to store this kind of sideband info
> without impacting critical code path.
>
> One other use case I see for such sideband info is storing hits from
> MADV_HUGEPAGE/MADV_NOHUGEPAGE: need to split vma just for these hints is
> unfortunate.

In practice we don't see many extra VMAs from naming; alignment
requirements, guard pages, and permissions differences are usually
enough to keep adjacent anonymous VMAs from merging.  Here's an
example from a process on Android:
7f9086c000-7f9086d000 rw-p 00006000 fd:00 1495
  /system/lib64/libhardware_legacy.so
7f9086d000-7f9086e000 rw-p 00000000 00:00 0
7f9086e000-7f9086f000 rw-p 00000000 00:00 0
  [anon:linker_alloc]
7f90875000-7f90876000 r--p 00000000 00:00 0
  [anon:linker_alloc]
7f9087c000-7f9087d000 r--p 00000000 00:00 0
  [anon:linker_alloc]
7f90901000-7f90902000 ---p 00000000 00:00 0
  [anon:thread stack guard page]
7f90902000-7f90a00000 rw-p 00000000 00:00 0
  [stack:410]
7f90a00000-7f90c00000 rw-p 00000000 00:00 0
  [anon:libc_malloc]
7f90c02000-7f90c03000 ---p 00000000 00:00 0
  [anon:thread stack guard page]
7f90c03000-7f90d01000 rw-p 00000000 00:00 0
  [stack:409]
7f90d01000-7f90d02000 ---p 00000000 00:00 0
  [anon:thread stack guard page]
7f90d02000-7f90e00000 rw-p 00000000 00:00 0
  [stack:408]
7f90e00000-7f91200000 rw-p 00000000 00:00 0
  [anon:libc_malloc]
7f91206000-7f91207000 r--p 00000000 00:00 0
  [anon:linker_alloc]
7f91237000-7f91238000 ---p 00000000 00:00 0
  [anon:thread signal stack guard page]
7f91238000-7f9123c000 rw-p 00000000 00:00 0
  [anon:thread signal stack]
7f9123c000-7f9123d000 ---p 00000000 00:00 0
  [anon:thread signal stack guard page]
7f9123d000-7f91241000 rw-p 00000000 00:00 0
  [anon:thread signal stack]
7f91246000-7f91247000 ---p 00000000 00:00 0
  [anon:thread signal stack guard page]
7f91247000-7f9124b000 rw-p 00000000 00:00 0
  [anon:thread signal stack]
7f9124b000-7f9124c000 ---p 00000000 00:00 0
  [anon:thread signal stack guard page]
7f9124c000-7f91250000 rw-p 00000000 00:00 0
  [anon:thread signal stack]

I only see 2 extra VMAs here, the "[stack:410]" and "[stack:408]"
regions would have been merged with the following "[anon:libc_malloc]"
regions.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ