Message-ID: <20161215220353.GA15619@amd>
Date: Thu, 15 Dec 2016 23:03:53 +0100
From: Pavel Machek <pavel@....cz>
To: Lino Sanfilippo <LinoSanfilippo@....de>
Cc: bh74.an@...sung.com, ks.giri@...sung.com, vipul.pandya@...sung.com,
peppe.cavallaro@...com, alexandre.torgue@...com,
romieu@...zoreil.com, davem@...emloft.net,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH v2 2/2] net: ethernet: stmmac: remove private tx queue lock
Hi!
> >> The driver uses a private lock for synchronization of the xmit function and
> >> the xmit completion handler, but since the NETIF_F_LLTX flag is not set,
> >> the xmit function is also called with the xmit_lock held.
> >>
> >> On the other hand, the completion handler uses the reverse locking order by
> >> first taking the private lock and (in case the tx queue had been
> >> stopped) then the xmit_lock.
> >>
> >> Improve the locking by removing the private lock and using only the
> >> xmit_lock for synchronization instead.
> >
> > Do you have stmmac hardware to test on?
> >
>
> Unfortunately not (I mentioned that the patch I sent was only compile-tested in
> the first version, but I think I forgot to do so in the last
> version).
:-(.
> > I believe something is very wrong with the locking there. In
> > particular... scheduling the stmmac_tx_timer() function to run often
> > should not do anything bad if locking is correct... but it breaks the
> > driver rather quickly. [Example patch below, needs applying to two
> > places in net-next.]
> >
>
> Do you get this result only after the private lock is removed? Or has this problem
> been there before? And what exactly does the failure look like?
I believe I was getting very similar fun even with the private lock. I
re-applied the private lock, and the result is the same.
Also, the locking does seem to work. I added checks to see whether
stmmac_tx_clean() and stmmac_xmit() run at the same time, and they
don't seem to. So my best guess at the moment is a missing cache flush
or mb() somewhere.
The failure looks like this:
root@...abuibui:~# mount /dev/mmcblk0p4 /mnt
o 1000000 > /proc/sys/net/core/wmeroot@...abuibui:~# chroot /mnt /bin/bash
root@...abuibui:/# mount /proc000 100 30
root@...abuibui:/# #echo 1000000 > /proc/sys/net/core/wmem_default
root@...abuibui:/# cd /data/tmp/udpt
root@...abuibui:/data/tmp/udpt# ifconfig eth0 10.0.0.170 up
[ 18.358072] socfpga-dwmac ff702000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
[ 18.366836] socfpga-dwmac ff702000.ethernet eth0: registered PTP clock
root@...abuibui:/data/tmp/udpt# ./udp-test raw 10.0.0.6 1234 1000 100 30
Sending 100 packets (1000b each) at an interval of 30ms, expected data rate:3333333b/s (3373333b/s incl udp overhead)
[ 20.453538] socfpga-dwmac ff702000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 20.581826] Link is Up - 100/Full
Sending UDP packet took >10ms: 5205162us
This would lead to a lost frame!
Sending UDP packet took >10ms: 40010us
This would lead to a lost frame!
Sending UDP packet took >10ms: 6366084us
This would lead to a lost frame!
Sending UDP packet took >10ms: 36971us
This would lead to a lost frame!
[ 42.084940] ------------[ cut here ]------------
[ 42.089577] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x254/0x26c
[ 42.097821] NETDEV WATCHDOG: eth0 (socfpga-dwmac): transmit queue 0 timed out
[ 42.104935] Modules linked in:
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html