lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080617084936.GA4202@ff.dom.local>
Date:	Tue, 17 Jun 2008 08:49:36 +0000
From:	Jarek Poplawski <jarkao2@...il.com>
To:	Patrick McHardy <kaber@...sh.net>
Cc:	Marcel Holtmann <marcel@...tmann.org>, netdev@...r.kernel.org,
	Ingo Molnar <mingo@...e.hu>, Thomas Graf <tgraf@...g.ch>
Subject: Re: netlink circular locking dependency

On Mon, Jun 16, 2008 at 11:48:01PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> Marcel Holtmann wrote, On 06/14/2008 02:35 PM:
>> ...
>>   
>>> =======================================================
>>> [ INFO: possible circular locking dependency detected ]
>>> 2.6.26-rc2 #5
>>> -------------------------------------------------------
>>> hcid/4136 is trying to acquire lock:
>>>  (genl_mutex){--..}, at: [<c0000000002ace4c>] .ctrl_dumpfamily+0x74/0x174
>>>
>>> but task is already holding lock:
>>>  (nlk->cb_mutex){--..}, at: [<c0000000002a766c>] .netlink_dump+0x58/0x27c
>>>
>>> which lock already depends on the new lock.
>>>     
>> ...
>>
>> Hi,
>>
>> IMHO it looks like a real lockup threat. Probably it needs something
>> better, but for now here is my simplistic patch proposal for testing.
>>   
> So we have:
>
> genl_rcv()            : take genl_mutex
> genl_rcv_msg()        : call netlink_dump_start() while holding genl_mutex
> netlink_dump_start(),
> netlink_dump()        : take nlk->cb_mutex
> ctrl_dumpfamily()     : try to detect this case and not take genl_mutex a
>                        second time
>
> netlink_rcv()         : call netlink_dump
> netlink_dump          : take nlk->cb_mutex
> ctrl_dumpfamily()     : take genl_mutex
>
> which is a real bug.

Right. Probably there is also another variant:

#1
genl_rcv()            : take genl_mutex
genl_rcv_msg()        : call netlink_dump_start() while holding genl_mutex
netlink_dump_start()  : mutex_unlock nlk->cb_mutex

#2
netlink_rcv()         : call netlink_dump
netlink_dump          : take nlk->cb_mutex
ctrl_dumpfamily()     : 1st run without genl_mutex

#1
netlink_dump()        : take nlk->cb_mutex
ctrl_dumpfamily()     : > 1st run: take genl_mutex 2nd time?!

>
> It seems the best fix is to use genl_mutex for the netlink cb_mutex,
> drop genl_mutex before calling netlink_dump_start and don't take it
> in ctrl_dumpfamily, relying completely on af_netlink.c for dump
> locking. Unfortunately this creates a race since the ops passed to
> netlink_dump_start are also protect by the mutex, so this patch
> is just for testing whether it fixes the warning.
>
> On second though - that race seems to be present already since
> the ops can be unregistered and the module unloaded while a dump
> is in progress.

Yes, very interesting. I guess there will be some followup...

Regards,
Jarek P.


> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index f5aa23c..3e1191c 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -444,8 +444,11 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>  		if (ops->dumpit == NULL)
>  			return -EOPNOTSUPP;
>  
> -		return netlink_dump_start(genl_sock, skb, nlh,
> -					  ops->dumpit, ops->done);
> +		genl_unlock();
> +		err = netlink_dump_start(genl_sock, skb, nlh,
> +					 ops->dumpit, ops->done);
> +		genl_lock();
> +		return err;
>  	}
>  
>  	if (ops->doit == NULL)
> @@ -603,9 +606,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
>  	int chains_to_skip = cb->args[0];
>  	int fams_to_skip = cb->args[1];
>  
> -	if (chains_to_skip != 0)
> -		genl_lock();
> -
>  	for (i = 0; i < GENL_FAM_TAB_SIZE; i++) {
>  		if (i < chains_to_skip)
>  			continue;
> @@ -623,9 +623,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
>  	}
>  
>  errout:
> -	if (chains_to_skip != 0)
> -		genl_unlock();
> -
>  	cb->args[0] = i;
>  	cb->args[1] = n;
>  
> @@ -770,7 +767,7 @@ static int __init genl_init(void)
>  
>  	/* we'll bump the group number right afterwards */
>  	genl_sock = netlink_kernel_create(&init_net, NETLINK_GENERIC, 0,
> -					  genl_rcv, NULL, THIS_MODULE);
> +					  genl_rcv, &genl_mutex, THIS_MODULE);
>  	if (genl_sock == NULL)
>  		panic("GENL: Cannot initialize generic netlink\n");
>  

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ