linux-kernel - Re: [PATCH] munmap() don't check sysctl_max

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20091012184654.E4D0.A69D9226@jp.fujitsu.com>
Date:	Mon, 12 Oct 2009 19:13:41 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	Hugh Dickins <hugh.dickins@...cali.co.uk>
Cc:	kosaki.motohiro@...fujitsu.com,
	LKML <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] munmap() don't check sysctl_max_mapcount

Hi Hugh

> > Hi everyone,
> > 
> > Is this good idea?
> 
> Sorry, I overlooked this earlier.
> 
> Correct me if I'm wrong, but from the look of your patch,
> I believe anyone could increase their mapcount arbitrarily far beyond
> sysctl_max_map_count, just by taking little bites out of a large mmap.
> 
> In which case there's not much point in having sysctl_max_map_count
> at all.  Perhaps there isn't much point to it anyway, and the answer
> is just to raise it to where it catches runaways but interferes with
> nobody else?
> 
> If you change your patch so that do_munmap() cannot increase the final
> number vmas beyond sysctl_max_map_count, that would seem reasonable.
> But would that satisfy your testcase?  And does the testcase really
> matter in practice?  It doesn't seem to have upset anyone before.

Very thank you for payed attention to my patch. Yes, this is real issue.
my customer suffer from it.

May I explain why you haven't seen this issue? this issue is architecture
independent problem. however register stack architecture (e.g. ia64, sparc)
dramatically increase the possibility of the heppen this issue.

Why? the stack of the typical architecture have one PROT_READ|PROT_WRITE area and one
PROT_NONE area (it is called as guard page).
if the process multiple thread, stack layout is

fig1)
  +--------+--------+-------+-------+--------------
  | thr-0  | thr-0  | thr-1 | thr-1 | 
  | guard  | stack  | guard | stack | ......
  +--------+--------+-------+-------+-------------

Thus, stack freeing didn't make vma splitting. in the other hand,
the stack of the register stack architecture have two PROT_READ|PROT_WRITE
area and one PROT_NONE area.
then, stack layout is

fig2)
  +-----------+--------+------------------+-------+------------------+-----------
  | thr-0     | thr-0  | thr-0 stack      | thr-1 | thr-1 stack      | thr-2 
  | reg-stack | guard  | + thr-1 regstack | guard | + thr-2 regstack | guard
  +-----------+--------+------------------+-------+------------------+-----------

Then, stack unmapping may cause vma splitting.

However, non-register stack architecture don't free from this issue.
if the program use malloc(), it can makes fig2 layout. please run attached
test program (test_max_mapcount_for_86.c), it should addressed the problem.

And, I doubt I haven't catch your mention. May I ask some question?
Honestly I don't think max_map_count is important knob. it is strange
mutant of limit of virtual address space in the process.
At very long time ago (probably the stone age), linux doesn't have
vma rb_tree handling, then many vma directly cause find_vma slow down.
However current linux have good scalability. it can handle many vma issue.
So, Why do you think max_mapcount sould be strictly keeped?

Honestly, I doubt nobody suffer from removing sysctl_max_mapcount.

And yes, stack unmapping have exceptional charactatics. the guard zone
gurantee it never raise map_count. 
So, I think the attached patch (0001-Don-t...) is the same as you talked about, right?
I can accept it. I haven't test it on ia64. however, at least it works
well on x86.

BUT, I still think kernel souldn't refuse any resource deallocation.
otherwise, people discourage proper resource deallocation and encourage
brutal intentional memory leak programming style. What do you think?

Download attachment "test_max_mapcount_for_86.c" of type "application/octet-stream" (1333 bytes)

Download attachment "0001-Don-t-make-ENOMEM-temporary-mapcount-exceeding-in-mu.patch" of type "application/octet-stream" (5413 bytes)