Date:   Tue, 5 May 2020 17:26:31 +0100
From:   Al Viro <viro@...iv.linux.org.uk>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     SeongJae Park <sjpark@...zon.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        sj38.park@...il.com, netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        SeongJae Park <sjpark@...zon.de>, snu@...zon.com,
        amit@...nel.org, stable@...r.kernel.org
Subject: Re: Re: [PATCH net v2 0/2] Revert the 'socket_alloc' life cycle
 change

On Tue, May 05, 2020 at 09:00:44AM -0700, Eric Dumazet wrote:

> > Not exactly the 10,000,000, as it is only the possible highest number, but I
> > was able to observe clear exponential increase of the number of the objects
> > using slabtop.  Before the start of the problematic workload, the number of
> > objects of 'kmalloc-64' was 5760, but I was able to observe the number increase
> > to 1,136,576.
> >
> >            OBJS  ACTIVE  USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> > before:    5760    5088  88%    0.06K    90       64       360K kmalloc-64
> > after:  1136576 1136576 100%    0.06K 17759       64     71036K kmalloc-64
> >
> 
> Great, thanks.
> 
> How recent is the kernel you are running for your experiment?
> 
> Let's make sure the bug is not in RCU.
> 
> After Al's changes, RCU got slightly better under stress.

The thing that worries me here is that this is far from being the only
source of RCU-delayed freeing of objects.  If we really see bogus OOM
kills due to that (IRL, not in an artificial microbenchmark), we'd
better do something that would help with all those sources, not just
paper over the contributions from one of those.  Because there's no
chance in hell of getting rid of RCU-delayed freeing in general...

Does the problem extend to kfree_rcu()?  And there are a lot of RCU
callbacks that boil down to kmem_cache_free(); those really look like
they should have the exact same issue - sock_free_inode() is one of those,
after all.
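
To make the concern concrete, here is a userspace toy model of deferred
freeing.  It is only a sketch and not kernel code: defer_free(),
grace_period_end(), churn() and both structs are hypothetical names made up
for illustration, and a plain linked list stands in for the real RCU
callback machinery.  It shows that any path which queues objects for a later
free, whatever the final free function is, holds memory in proportion to the
number of objects released since the last grace period.

```c
/* Userspace toy model, NOT kernel code: every name here is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct deferred {               /* stands in for struct rcu_head */
	struct deferred *next;
};

struct obj {
	struct deferred rcu;    /* first member, so a cast recovers the object */
	char payload[64 - sizeof(struct deferred)];
};

static struct deferred *pending_head;   /* objects awaiting a grace period */
static size_t pending_count;

/* Analogue of call_rcu()/kfree_rcu(): queue the object, do not free it yet. */
static void defer_free(struct obj *o)
{
	o->rcu.next = pending_head;
	pending_head = &o->rcu;
	pending_count++;
}

/* Analogue of a grace period ending: only now is the memory released. */
static size_t grace_period_end(void)
{
	size_t freed = 0;

	while (pending_head) {
		struct deferred *d = pending_head;

		pending_head = d->next;
		free((struct obj *)d);  /* valid: rcu is the first member */
		freed++;
	}
	assert(freed == pending_count);
	pending_count = 0;
	return freed;
}

/*
 * Allocate and "free" n objects with no grace period in between.
 * Returns how many objects are still held, or (size_t)-1 if malloc() fails.
 */
static size_t churn(size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct obj *o = malloc(sizeof(*o));

		if (!o)
			return (size_t)-1;
		defer_free(o);
	}
	return pending_count;
}
```

In this model churn(n) leaves all n objects allocated until
grace_period_end() runs, regardless of which caller queued them.  That is the
same shape as the kmalloc-64 growth quoted above, and it is why a fix aimed
at a single source of delayed frees would leave the others untouched.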
