linux-kernel - Re: Kernel WARNING: at net/core/dev.c:1330 __netif


Message-Id: <1216890648.7257.258.camel@twins>
Date:	Thu, 24 Jul 2008 11:10:48 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	David Miller <davem@...emloft.net>
Cc:	jarkao2@...il.com, Larry.Finger@...inger.net, kaber@...sh.net,
	torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-wireless@...r.kernel.org, mingo@...hat.com,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Paul E McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: Kernel WARNING: at net/core/dev.c:1330
	__netif_schedule+0x2c/0x98()

On Wed, 2008-07-23 at 13:16 -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@...il.com>
> Date: Wed, 23 Jul 2008 11:49:14 +0000
> 
> > On Wed, Jul 23, 2008 at 11:35:19AM +0000, Jarek Poplawski wrote:
> > > On Wed, Jul 23, 2008 at 12:58:16PM +0200, Peter Zijlstra wrote:
> > ...
> > > > When I look at the mac802.11 code in ieee80211_tx_pending() it looks
> > > > like it can do with just one lock at a time, instead of all - but I
> > > > might be missing some obvious details.
> > > > 
> > > > So I guess my question is, is netif_tx_lock() here to stay, or is the
> > > > right fix to convert all those drivers to use __netif_tx_lock() which
> > > > locks only a single queue?
> > > > 
> > > 
> > > It's a new thing mainly for new hardware/drivers, and just after
> > > conversion (older drivers effectively use __netif_tx_lock()), so it'll
> > > probably stay for some time until something better is found. David,
> > > will tell the rest, I hope.
> > 
> > ...And, of course, these new drivers should also lock a single queue
> > where possible.
> 
> It isn't going away.
> 
> There will always be a need for a "stop all the TX queues" operation.

OK, then how about something like this? The idea is to wrap the per-queue
tx lock with a read lock on the device and let netif_tx_lock() be the
write side, thereby excluding all the per-queue locks, but without
incurring cacheline bouncing on the read side, by using per-cpu counters
like rcu does.

This of course requires that netif_tx_lock() is rare, otherwise stuff
will go bounce anyway...

I probably missed a few details, but I think the below ought to show the
idea...

struct tx_lock {
	int busy;
	spinlock_t lock;
	unsigned long *counters;
};


int tx_lock_init(struct tx_lock *txl)
{
	txl->busy = 0;
	spin_lock_init(&txl->lock);
	txl->counters = alloc_percpu(unsigned long);

	if (!txl->counters)
		return -ENOMEM;

	return 0;
}

void __netif_tx_lock(struct netdev_queue *txq, int cpu)
{
	struct net_device *dev = txq->dev;

	if (rcu_dereference(dev->tx_lock.busy)) {
		spin_lock(&dev->tx_lock.lock);
		(*percpu_ptr(dev->tx_lock.counters, cpu))++;
		spin_unlock(&dev->tx_lock.lock);
	} else
		(*percpu_ptr(dev->tx_lock.counters, cpu))++;

	spin_lock(&txq->_xmit_lock);
	txq->xmit_lock_owner = cpu;
}

void __netif_tx_unlock(struct netdev_queue *txq)
{
	struct net_device *dev = txq->dev;

	(*percpu_ptr(dev->tx_lock.counters, txq->xmit_lock_owner))--;
	txq->xmit_lock_owner = -1;
	spin_unlock(&txq->_xmit_lock);
}

unsigned long tx_lock_read_counters(struct tx_lock *txl)
{
	int i;
	unsigned long counter = 0;

	/* can use online - the inc/dec are matched per cpu */
	for_each_online_cpu(i)
		counter += *percpu_ptr(txl->counters, i);

	return counter;
}

void netif_tx_lock(struct net_device *dev)
{
	spin_lock(&dev->tx_lock.lock);
	rcu_assign_pointer(dev->tx_lock.busy, 1);

	while (tx_lock_read_counters(&dev->tx_lock))
		cpu_relax();
}

void netif_tx_unlock(struct net_device *dev)
{
	rcu_assign_pointer(dev->tx_lock.busy, 0);
	smp_wmb(); /* because rcu_assign_pointer is broken */
	spin_unlock(&dev->tx_lock.lock);
}
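
For anyone who wants to poke at the scheme outside the kernel, here is a
minimal user-space sketch of the same read/write split, using C11 atomics
and a pthread mutex in place of the spinlock. Everything named here
(NSLOTS, struct slot, tx_read_lock() and friends) is made up for
illustration and is not kernel API; a fixed slot array stands in for the
per-cpu counters. Note one deliberate difference from the code above: the
read side bumps its counter first and only then tests busy, backing out if
a writer is present, so that a reader and a writer cannot both slip
through. Treat it as a sketch under those assumptions, not a drop-in
implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NSLOTS    4	/* hypothetical stand-in for the number of CPUs */
#define CACHELINE 64	/* assumed cache line size */

/* One counter per slot, aligned so slots do not share a cache line. */
struct slot {
	_Alignas(CACHELINE) atomic_long count;
};

struct tx_lock {
	atomic_int busy;	/* nonzero while a writer holds the lock */
	pthread_mutex_t lock;	/* serialises writers, parks readers */
	struct slot slots[NSLOTS];
};

void tx_lock_init(struct tx_lock *txl)
{
	atomic_init(&txl->busy, 0);
	pthread_mutex_init(&txl->lock, NULL);
	for (int i = 0; i < NSLOTS; i++)
		atomic_init(&txl->slots[i].count, 0);
}

/* Sum all slots; inc/dec are matched per slot, so 0 means no readers. */
long tx_lock_read_counters(struct tx_lock *txl)
{
	long sum = 0;

	for (int i = 0; i < NSLOTS; i++)
		sum += atomic_load(&txl->slots[i].count);
	return sum;
}

/* Read side: touches only the caller's own slot when no writer is active. */
void tx_read_lock(struct tx_lock *txl, int slot)
{
	assert(slot >= 0 && slot < NSLOTS);

	for (;;) {
		atomic_fetch_add(&txl->slots[slot].count, 1);
		if (!atomic_load(&txl->busy))
			return;
		/* Writer active: back out, wait for it to finish, retry. */
		atomic_fetch_sub(&txl->slots[slot].count, 1);
		pthread_mutex_lock(&txl->lock);
		pthread_mutex_unlock(&txl->lock);
	}
}

void tx_read_unlock(struct tx_lock *txl, int slot)
{
	atomic_fetch_sub(&txl->slots[slot].count, 1);
}

/* Write side: announce ourselves, then wait for all readers to drain. */
void tx_write_lock(struct tx_lock *txl)
{
	pthread_mutex_lock(&txl->lock);
	atomic_store(&txl->busy, 1);
	while (tx_lock_read_counters(txl) != 0)
		sched_yield();
}

void tx_write_unlock(struct tx_lock *txl)
{
	atomic_store(&txl->busy, 0);
	pthread_mutex_unlock(&txl->lock);
}
```

In this sketch a per-queue lock would be taken between tx_read_lock() and
tx_read_unlock(), and the "stop all the TX queues" path would use
tx_write_lock(). The default sequentially consistent atomics are what make
the "increment, then test busy" ordering safe here; weaker orderings would
need explicit barriers.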
