linux-kernel - Re: [NETPOLL] netconsole: fix soft lockup when removing module

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070701173558.GA207@tv-sign.ru>
Date:	Sun, 1 Jul 2007 21:35:58 +0400
From:	Oleg Nesterov <oleg@...sign.ru>
To:	Jarek Poplawski <jarkao2@...pl>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	"David S. Miller" <davem@...emloft.net>,
	linux-kernel@...r.kernel.org
Subject: Re: [NETPOLL] netconsole: fix soft lockup when removing module

Jarek Poplawski wrote:
>
>    #1
>    Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work()
>    required a work function should always (unconditionally) rearm with
>    delay > 0 - otherwise it would endlessly loop. This patch replaces
>    this function with cancel_delayed_work(). Later kernel versions don't
>    require this, so here it's only for uniformity.

But 2.6.22 doesn't need this change, why it was merged?

In fact, I suspect this change adds a race,

> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -72,7 +72,8 @@ static void queue_process(struct work_struct *work)
>  			netif_tx_unlock(dev);
>  			local_irq_restore(flags);
>  
> -			schedule_delayed_work(&npinfo->tx_work, HZ/10);
> +			if (atomic_read(&npinfo->refcnt))
> +				schedule_delayed_work(&npinfo->tx_work, HZ/10);
>  			return;
>  		}
>  		netif_tx_unlock(dev);
> @@ -785,9 +786,15 @@ void netpoll_cleanup(struct netpoll *np)
>  			if (atomic_dec_and_test(&npinfo->refcnt)) {
>  				skb_queue_purge(&npinfo->arp_tx);
>  				skb_queue_purge(&npinfo->txq);
> -				cancel_rearming_delayed_work(&npinfo->tx_work);
> +				cancel_delayed_work(&npinfo->tx_work);
>  				flush_scheduled_work();

Suppose that ->refcnt == 1, and queue_process() was preempted just after
atomic_read(&npinfo->refcnt).

netpoll_cleanup() comes, cancel_delayed_work() does nothing, flush_scheduled_work()
sleeps.

queue_process() gets CPU, re-schedules ->tx_work, and returns.

flush_scheduled_work() completes, netpoll_cleanup() frees npinfo and returns
while ->tx_work is pending.

No?

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/