netdev - Re: [PATCH 2/5] net: mvneta: use per_cpu stats to fix an SMP lock up

Message-ID: <20140112220921.GE16576@1wt.eu>
Date:	Sun, 12 Jan 2014 23:09:21 +0100
From:	Willy Tarreau <w@....eu>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	davem@...emloft.net, netdev@...r.kernel.org,
	Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>,
	Gregory CLEMENT <gregory.clement@...e-electrons.com>
Subject: Re: [PATCH 2/5] net: mvneta: use per_cpu stats to fix an SMP lock up

Hi Eric!

On Sun, Jan 12, 2014 at 10:07:36AM -0800, Eric Dumazet wrote:
> On Sun, 2014-01-12 at 10:31 +0100, Willy Tarreau wrote:
> > Stats writers are mvneta_rx() and mvneta_tx(). They don't lock anything
> > when they update the stats, and as a result, it randomly happens that
> > the stats freeze on SMP if two updates happen during stats retrieval.
> 
> Your patch is OK, but I dont understand how this freeze can happen.
> 
> TX and RX uses a separate syncp, and TX is protected by a lock, RX
> is protected by NAPI bit.

But we can have multiple Tx in parallel, one per queue. And I can only
trigger the issue when I explicitly bind two servers to two distinct
CPU cores, which seems to confirm that concurrent Tx is the cause.

> Stats retrieval uses the appropriate BH disable before the fetches...

From the numerous printks I added inside the syncp blocks, it
appears that the stats themselves are not responsible for the issue,
but the concurrent Tx are. Several times I ended up stuck when I had
two Tx on different CPUs right before a stats retrieval. From the
info I found in the syncp docs, the caller is responsible for locking,
and I don't see any lock here since the syncp are global
and not even per Tx queue.

But this stuff is very new to me, so I may have missed something. That
said, I'm quite certain that the lockup happened within the syncp blocks
and only in this case! At least my reading of the relevant includes
seemed to confirm that this hypothesis is valid :-/

Thanks,
Willy
