Message-ID: <20141021151225.5df96645@voldemort.scrye.com>
Date:	Tue, 21 Oct 2014 15:12:25 -0600
From:	Kevin Fenzi <kevin@...ye.com>
To:	netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Mon, 20 Oct 2014 14:53:59 -0600
Kevin Fenzi <kevin@...ye.com> wrote:

> On Mon, 20 Oct 2014 16:43:26 -0400
> Dave Jones <davej@...hat.com> wrote:
> 
> > I've seen similar soft lockup traces from the sys_unshare path when
> > running my fuzz tester.  It seems that if you create enough network
> > namespaces, it can take a huge amount of time for them to be
> > iterated. (Running trinity with '-c unshare' you can see the slow
> > down happen. In some cases, it takes so long that the watchdog
> > process kills it -- though the SIGKILL won't get delivered until
> > the unshare() completes)
> > 
> > Any idea what this machine had been doing prior to this that may
> > have involved creating lots of namespaces ?
> 
> That was right after boot. ;) 
> 
> This is my main rawhide running laptop.
> 
> A 'ip netns list' shows nothing.

Some more information: 

The problem started between: 

v3.17-7872-g5ff0b9e1a1da and v3.17-8307-gf1d0d14120a8

(I can try and do a bisect, but have to head out on a trip tomorrow)
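
For a rough idea of how long that bisect would take, here is a minimal Python sketch. It assumes both names above are `git describe` output relative to the same v3.17 tag and that the first commit is an ancestor of the second, neither of which is verified here. `parse_describe` and `bisect_steps` are hypothetical helper names, not part of git.

```python
import math
import re

def parse_describe(name):
    """Split 'TAG-N-gHASH' git-describe output into (tag, commits_since_tag, abbrev_hash)."""
    m = re.fullmatch(r"(.+)-(\d+)-g([0-9a-f]+)", name)
    if m is None:
        raise ValueError(f"not git-describe output: {name!r}")
    return m.group(1), int(m.group(2)), m.group(3)

def bisect_steps(good, bad):
    """Estimate the commits in good..bad and the bisect steps to narrow them to one."""
    good_tag, good_n, _ = parse_describe(good)
    bad_tag, bad_n, _ = parse_describe(bad)
    if good_tag != bad_tag:
        raise ValueError("names are relative to different tags")
    # Assumption: 'good' is an ancestor of 'bad', so the counts can be subtracted.
    commits = bad_n - good_n
    steps = math.ceil(math.log2(commits)) if commits > 1 else 0
    return commits, steps

if __name__ == "__main__":
    commits, steps = bisect_steps("v3.17-7872-g5ff0b9e1a1da",
                                  "v3.17-8307-gf1d0d14120a8")
    print(f"about {commits} commits, roughly {steps} bisect steps")
```

Under those assumptions the range is a few hundred commits, so the bisect should need on the order of nine builds.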

In all the kernels with the problem, there is a kworker process stuck in D state.
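
For anyone who wants to check for the same symptom, a minimal Python sketch for listing tasks in D state by reading /proc follows. It assumes a Linux /proc layout and only looks at thread-group leaders, which is enough for kernel workers. `parse_stat` and `tasks_in_state` are hypothetical helper names.

```python
import os

def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state

def tasks_in_state(wanted="D", proc="/proc"):
    """List (pid, comm) for every process whose state letter matches 'wanted'."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited while we were reading it
        if state == wanted:
            found.append((pid, comm))
    return sorted(found)

if __name__ == "__main__":
    for pid, comm in tasks_in_state("D"):
        print(pid, comm)
```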

sysrq-t says: 
Showing all locks held in the system:
Oct 21 15:06:31 voldemort.scrye.com kernel: 4 locks held by kworker/u16:0/6:
Oct 21 15:06:31 voldemort.scrye.com kernel:  #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
Oct 21 15:06:31 voldemort.scrye.com kernel:  #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
Oct 21 15:06:31 voldemort.scrye.com kernel:  #2:  (net_mutex){+.+.+.}, at: [<ffffffff817069fc>] cleanup_net+0x8c/0x1f0
Oct 21 15:06:31 voldemort.scrye.com kernel:  #3:  (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a395>] _rcu_barrier+0x35/0x200
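
To compare this output across kernels, a minimal Python sketch that pulls the holder and lock names out of the lines above may help. The regular expressions are assumptions fitted to this one sample (with the wrapped last line rejoined), not a general parser for lockdep output, and `parse_held_locks` is a hypothetical helper name.

```python
import re

# Patterns fitted to the sample below; other kernels may format these lines differently.
HOLDER = re.compile(r"(\d+) locks? held by (.+)/(\d+):")
LOCK = re.compile(r"#(\d+):\s+\((.+)\)\{([^}]*)\}, at: \[<([0-9a-f]+)>\] (\S+)")

SAMPLE = r'''
Oct 21 15:06:31 voldemort.scrye.com kernel: 4 locks held by kworker/u16:0/6:
Oct 21 15:06:31 voldemort.scrye.com kernel:  #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
Oct 21 15:06:31 voldemort.scrye.com kernel:  #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
Oct 21 15:06:31 voldemort.scrye.com kernel:  #2:  (net_mutex){+.+.+.}, at: [<ffffffff817069fc>] cleanup_net+0x8c/0x1f0
Oct 21 15:06:31 voldemort.scrye.com kernel:  #3:  (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a395>] _rcu_barrier+0x35/0x200
'''

def parse_held_locks(text):
    """Group '#N: (lock){...}, at: ...' lines under the preceding 'locks held by' line."""
    tasks = []
    for line in text.splitlines():
        m = HOLDER.search(line)
        if m:
            tasks.append({"comm": m.group(2), "pid": int(m.group(3)), "locks": []})
            continue
        m = LOCK.search(line)
        if m and tasks:
            tasks[-1]["locks"].append({"name": m.group(2), "at": m.group(5)})
    return tasks

if __name__ == "__main__":
    for task in parse_held_locks(SAMPLE):
        print(task["comm"], task["pid"])
        for lock in task["locks"]:
            print("  ", lock["name"], "taken in", lock["at"])
```

Read this way, the worker is holding net_mutex (taken in cleanup_net) and is inside _rcu_barrier with the barrier mutex held.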

On the first run, any of the systemd units that use PrivateNetwork run
OK, but they are also set to time out after a minute. On successive
runs they also hang in D.

kevin
