lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 27 Jun 2019 17:46:29 +0200
From:   Thomas Jarosch <thomas.jarosch@...ra2net.com>
To:     Florian Westphal <fw@...len.de>
Cc:     netdev@...r.kernel.org, netfilter-devel@...r.kernel.org,
        Juliana Rodrigueiro <juliana.rodrigueiro@...ra2net.com>
Subject: Re: 4.19: Traced deadlock during xfrm_user module load

Hi Florian,

You wrote on Tue, Jun 25, 2019 at 06:53:44PM +0200:
> Thanks for this detailed analysis.
> In this specific case I think this is enough:
> 
> diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
> index 92077d459109..61ba92415480 100644
> --- a/net/netfilter/nfnetlink.c
> +++ b/net/netfilter/nfnetlink.c
> @@ -578,7 +578,8 @@ static int nfnetlink_bind(struct net *net, int group)
>         ss = nfnetlink_get_subsys(type << 8);
>         rcu_read_unlock();
>         if (!ss)
> -               request_module("nfnetlink-subsys-%d", type);
> +               request_module_nowait("nfnetlink-subsys-%d", type);
>         return 0;
>  }
>  #endif

thanks for the patch! We finally found an easy way to reproduce the deadlock,
the following commands instantly trigger the problem on our machines:

    rmmod nf_conntrack_netlink
    rmmod xfrm_user
    conntrack -e NEW -E & modprobe -v xfrm_user

Note: the "-e" filter is needed to trigger the problematic
code path in the kernel.

We were worried that using "_nowait" would introduce other race conditions,
since the requested service might not be available by the time it is required.

On the other hand, if we understand correctly, it seems that after
"nfnetlink_bind()", the caller will listen on the socket for messages
regardless whether the needed modules are loaded, loading or unloaded.
To verify this we added a three second sleep during the initialisation of
nf_conntrack_netlink. The events started to appear after
the delayed init was completed.

If this is the case, then using "_nowait" should suffice as a fix
for the problem. Could you please confirm these assumptions
and give us some piece of mind?

Best regards,
Juliana Rodrigueiro and Thomas Jarosch

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ