linux-kernel - Re: [PATCH] mm, oom: allow oom reaper to race with exit


Message-ID: <20170725151754.3txp44a2kbffsxdg@node.shutemov.name>
Date:   Tue, 25 Jul 2017 18:17:54 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        David Rientjes <rientjes@...gle.com>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        Oleg Nesterov <oleg@...hat.com>,
        Hugh Dickins <hughd@...gle.com>,
        Andrea Arcangeli <aarcange@...hat.com>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

On Tue, Jul 25, 2017 at 04:26:26PM +0200, Michal Hocko wrote:
> On Mon 24-07-17 18:11:46, Michal Hocko wrote:
> > On Mon 24-07-17 17:51:42, Kirill A. Shutemov wrote:
> > > On Mon, Jul 24, 2017 at 04:15:26PM +0200, Michal Hocko wrote:
> > [...]
> > > > What kind of scalability implication do you have in mind? There is
> > > > basically zero contention on the mmap_sem that late in the exit path
> > > > so this should be pretty much a fast path of the down_write. I agree it
> > > > is not 0 cost but the cost of the address space freeing should basically
> > > > make it noise.
> > > 
> > > Even in the fast path case, it adds two atomic operations per process. If
> > > the cache line is not exclusive to the core by the time of exit(2) it can
> > > be noticeable.
> > > 
> > > ... but I guess it's not a very hot scenario.
> > > 
> > > I guess I'm just too cautious here. :)
> > 
> > I definitely did not want to handwave your concern. I just think we can
> > rule out the slow path and didn't think about the fast path overhead.
> > 
> > > > > Should we do performance/scalability evaluation of the patch before
> > > > > getting it applied?
> > > > 
> > > > What kind of test(s) would you be interested in?
> > > 
> > > Can we at least check that the number of /bin/true we can spawn per second
> > > wouldn't be harmed by the patch? ;)
> > 
> > OK, so measuring a single /bin/true doesn't tell us anything so I've done
> > root@...t1:~# cat a.sh 
> > #!/bin/sh
> > 
> > NR=$1
> > for i in $(seq $NR)
> > do
> >         /bin/true
> > done
> 
> I wanted to reduce potential shell side effects so I've come up with a
> simple program which forks and saves the timestamp before child exit and
> right after waitpid (see attached) and then measured it 100k times. Sure
> this still measures waitpid overhead and the signal delivery but this
> should be more or less constant on an idle system, right?
> 
> before the patch
> min: 306300.00 max: 6731916.00 avg: 437962.07 std: 92898.30 nr: 100000
> 
> after
> min: 303196.00 max: 5728080.00 avg: 436081.87 std: 96165.98 nr: 100000
> 
> The results are well within noise as I would expect.

I've slightly modified your test case: replaced cpuid + rdtsc with
rdtscp. cpuid overhead is measurable in such a tight loop.
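
For readers without the attachment, the measurement can be sketched roughly
like this. It is not the attached program, only a minimal sketch under a few
assumptions: the child stores its timestamp in a shared anonymous mapping
right before _exit(), the parent takes the second timestamp right after
waitpid(), and the TSC is synchronized across cores. The helper names
(timestamp, alloc_shared_slot, measure_exit_once) are hypothetical, and the
clock_gettime() fallback exists only so the sketch builds on non-x86.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* rdtscp waits for prior instructions to execute, so no cpuid is needed. */
static uint64_t timestamp(void)
{
	unsigned int aux;	/* receives IA32_TSC_AUX, unused here */

	return __rdtscp(&aux);
}
#else
/* Assumed fallback for non-x86 builds: nanoseconds, not cycles. */
static uint64_t timestamp(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Map one slot that stays shared across fork(); NULL on failure. */
static volatile uint64_t *alloc_shared_slot(void)
{
	void *p = mmap(NULL, sizeof(uint64_t), PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Time one child exit: from just before _exit() in the child until
 * waitpid() has returned in the parent. Returns 0 on error.
 */
static uint64_t measure_exit_once(volatile uint64_t *slot)
{
	pid_t pid = fork();
	uint64_t end;

	if (pid < 0)
		return 0;
	if (pid == 0) {
		*slot = timestamp();	/* last thing the child does */
		_exit(0);
	}
	if (waitpid(pid, NULL, 0) != pid)
		return 0;
	end = timestamp();
	return end - *slot;
}
```

Calling measure_exit_once() in a loop and collecting the deltas gives the
kind of per-exit samples summarized in the tables. As noted in the quoted
mail, each sample still includes waitpid and signal delivery overhead.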

3 runs before the patch:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 177200  205000  212900  217800  223700 2377000
 172400  201700  209700  214300  220600 1343000
 175700  203800  212300  217100  223000 1061000

3 runs after the patch:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 175900  204800  213000  216400  223600 1989000
 180300  210900  219600  223600  230200 3184000
 182100  212500  222000  226200  232700 1473000
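
The column layout of these tables looks like the output of R's summary().
In case someone wants the same six numbers straight from the raw samples,
here is a rough sketch in C. It assumes linear interpolation between order
statistics (R's default quantile type 7); the struct and function names are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct summary {
	double min, q1, median, mean, q3, max;
};

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Quantile by linear interpolation; xs must be sorted, n > 0, 0 <= p <= 1. */
static double quantile_sorted(const double *xs, size_t n, double p)
{
	double h = (double)(n - 1) * p;
	size_t lo = (size_t)h;
	size_t hi = lo + 1 < n ? lo + 1 : lo;

	return xs[lo] + (h - (double)lo) * (xs[hi] - xs[lo]);
}

/* Sorts xs in place and fills in the six summary values. */
static struct summary summarize(double *xs, size_t n)
{
	struct summary s;
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	qsort(xs, n, sizeof(*xs), cmp_double);
	for (i = 0; i < n; i++)
		sum += xs[i];

	s.min = xs[0];
	s.q1 = quantile_sorted(xs, n, 0.25);
	s.median = quantile_sorted(xs, n, 0.50);
	s.mean = sum / (double)n;
	s.q3 = quantile_sorted(xs, n, 0.75);
	s.max = xs[n - 1];
	return s;
}
```

The median and quartiles are less sensitive than the mean to the occasional
multi-million outlier visible in the Max. column.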

The difference is still measurable. Around 3%.
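
One way to arrive at a figure like that from the tables is to average the
Median column over the three runs on each side and take the relative change.
This is only a sketch of that arithmetic, not necessarily how the number was
derived; the function and array names are hypothetical, the values are copied
from the tables.

```c
#include <stddef.h>

static double mean_of(const double *xs, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += xs[i];
	return sum / (double)n;
}

/* Relative change of the per-side averages, in percent. */
static double relative_change_pct(const double *before, const double *after,
				  size_t n)
{
	double b = mean_of(before, n);

	return (mean_of(after, n) - b) / b * 100.0;
}

/* Median column of the three runs before and after the patch. */
static const double median_before[] = { 212900, 209700, 212300 };
static const double median_after[]  = { 213000, 219600, 222000 };

/* Mean column of the same runs. */
static const double mean_before[] = { 217800, 214300, 217100 };
static const double mean_after[]  = { 216400, 223600, 226200 };
```

relative_change_pct(median_before, median_after, 3) comes out at roughly
3.1%, and the same calculation on the Mean column gives roughly 2.6%.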

-- 
 Kirill A. Shutemov