Date:   Tue, 5 May 2020 17:07:17 +0200
From:   SeongJae Park <sjpark@...zon.com>
To:     Eric Dumazet <edumazet@...gle.com>
CC:     SeongJae Park <sjpark@...zon.com>,
        David Miller <davem@...emloft.net>,
        "Al Viro" <viro@...iv.linux.org.uk>,
        Jakub Kicinski <kuba@...nel.org>,
        "Greg Kroah-Hartman" <gregkh@...uxfoundation.org>,
        <sj38.park@...il.com>, netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        SeongJae Park <sjpark@...zon.de>, <snu@...zon.com>,
        <amit@...nel.org>, <stable@...r.kernel.org>
Subject: Re: Re: [PATCH net v2 0/2] Revert the 'socket_alloc' life cycle change

On Tue, 5 May 2020 07:53:39 -0700 Eric Dumazet <edumazet@...gle.com> wrote:

> On Tue, May 5, 2020 at 4:54 AM SeongJae Park <sjpark@...zon.com> wrote:
> >
> > CC-ing stable@...r.kernel.org and adding some more explanations.
> >
> > On Tue, 5 May 2020 10:10:33 +0200 SeongJae Park <sjpark@...zon.com> wrote:
> >
> > > From: SeongJae Park <sjpark@...zon.de>
> > >
> > > Commit 6d7855c54e1e ("sockfs: switch to ->free_inode()") made the
> > > deallocation of 'socket_alloc' asynchronous using RCU, the same as
> > > for 'sock.wq'.  The following commit 333f7909a857 ("coallocate
> > > socket_wq with socket itself") then gave the two the same life cycle.
> > >
> > > These changes made the code much simpler, but also made 'socket_alloc'
> > > live longer than before.  For this reason, user programs that
> > > intensively repeat allocation and deallocation of sockets could cause
> > > memory pressure on recent kernels.
> >
> > I found this problem on a production virtual machine with 4GB of memory
> > while running lebench[1].  The 'poll big' test of lebench opens 1000
> > sockets, polls them, and closes them.  This test is repeated 10,000 times.
> > Therefore it should hold only 1000 'socket_alloc' objects at once.  As the
> > size of socket_alloc is about 800 bytes, that is only about 800 KiB.
> > However, on recent kernels, it could hold up to 10,000,000 objects
> > (about 8 GiB).  On the test machine, I confirmed that it consumed about
> > 4GB of the system memory and resulted in an OOM.
> >
> > [1] https://github.com/LinuxPerfStudy/LEBench
> 
> To be fair, I have not backported Al's patches to Google production
> kernels, nor have I tried this benchmark.
> 
> Why do we have 10,000,000 objects around? Could this be because of
> some RCU problem?

Mainly because of a long RCU grace period, as you guessed.  I have no idea how
the grace period became so long in this case.

Since my test machine was a virtual machine instance, I suspect that a problem
like RCU reader preemption[1] might have affected this.

[1] https://www.usenix.org/system/files/conference/atc17/atc17-prasad.pdf
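
To make the numbers quoted above concrete, here is a back-of-the-envelope
sketch of the arithmetic.  It assumes the figures from the report (1000
sockets per iteration, 10,000 iterations, roughly 800 bytes per
'socket_alloc') and, for the worst case, that no RCU grace period completes
during the whole run, so none of the freed objects is actually reclaimed.
The constant and function names are made up for illustration.

```python
# Figures taken from the report; the object size is approximate.
SOCKETS_PER_ITERATION = 1000
ITERATIONS = 10_000
SOCKET_ALLOC_BYTES = 800


def footprint_bytes(objects, object_size=SOCKET_ALLOC_BYTES):
    """Memory held by `objects` socket_alloc-sized objects, in bytes."""
    return objects * object_size


# Expected case: objects are reclaimed promptly, so only one
# iteration's worth of sockets is alive at any time.
expected_objects = SOCKETS_PER_ITERATION

# Worst case (assumption): no grace period ends during the test, so
# every object ever allocated is still waiting to be freed.
worst_case_objects = SOCKETS_PER_ITERATION * ITERATIONS

if __name__ == "__main__":
    for label, count in (("expected", expected_objects),
                         ("worst case", worst_case_objects)):
        size = footprint_bytes(count)
        print(f"{label}: {count} objects, {size} bytes "
              f"({size / 2**20:.1f} MiB)")
```

The worst case works out to 8,000,000,000 bytes, which is in line with the
"about 8 GiB" estimate and well beyond the 4GB available on the test machine.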

> 
> Once Al's patches are reverted, do you still have 10,000,000
> socket_alloc objects around?

No, neither the old kernel predating Al's patches nor the recent kernel with
Al's patches reverted reproduced the problem.
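
For anyone who wants to compare kernels, the access pattern of the 'poll big'
test described above can be approximated with a short loop.  This is only a
sketch and not the LEBench code: the function name poll_big, the choice of
AF_INET/SOCK_STREAM sockets, and the zero poll timeout are assumptions made
for illustration.  It relies on select.poll(), so it needs a Unix-like
system, and the default of 1000 sockets needs an open-file limit slightly
above 1000.

```python
import select
import socket


def poll_big(num_sockets=1000, iterations=10_000):
    """Open num_sockets sockets, poll them once, close them; repeat.

    Returns the total number of sockets opened over all iterations.
    """
    total_opened = 0
    for _ in range(iterations):
        socks = []
        try:
            # Allocate a batch of sockets (one socket_alloc each).
            for _ in range(num_sockets):
                socks.append(
                    socket.socket(socket.AF_INET, socket.SOCK_STREAM))

            # Register every socket and poll once without blocking.
            poller = select.poll()
            for s in socks:
                poller.register(s, select.POLLIN)
            poller.poll(0)
        finally:
            # Close the whole batch; on the affected kernels the
            # underlying objects are only freed after a grace period.
            for s in socks:
                s.close()
        total_opened += len(socks)
    return total_opened


if __name__ == "__main__":
    print(poll_big())
```

Watching the slab usage (for example in /proc/slabinfo) while this runs
should show whether the freed objects pile up.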


Thanks,
SeongJae Park

> 
> Thanks.
> 
> >
> > >
> > > To avoid the problem, this commit reverts the changes.
> >
> > I also tried to write a fixup rather than reverts, but I couldn't easily
> > find a simple one.  As commits 6d7855c54e1e and 333f7909a857 were for code
> > refactoring rather than performance optimization, I thought introducing a
> > complex fixup for this problem would make no sense.  Meanwhile, the memory
> > pressure regression could affect real machines.  So I decided to quickly
> > revert the commits first and consider a better refactoring later.
> >
> >
> > Thanks,
> > SeongJae Park
> >
> > >
> > > SeongJae Park (2):
> > >   Revert "coallocate socket_wq with socket itself"
> > >   Revert "sockfs: switch to ->free_inode()"
> > >
> > >  drivers/net/tap.c      |  5 +++--
> > >  drivers/net/tun.c      |  8 +++++---
> > >  include/linux/if_tap.h |  1 +
> > >  include/linux/net.h    |  4 ++--
> > >  include/net/sock.h     |  4 ++--
> > >  net/core/sock.c        |  2 +-
> > >  net/socket.c           | 23 ++++++++++++++++-------
> > >  7 files changed, 30 insertions(+), 17 deletions(-)
> > >
> > > --
> > > 2.17.1
