netdev - Re: Realtek 8139 problem on 486.


Message-ID: <60B560A8.8000800@gmail.com>
Date:   Tue, 01 Jun 2021 01:18:16 +0300
From:   Nikolai Zhubr <zhubr.2@...il.com>
To:     Arnd Bergmann <arnd@...nel.org>
CC:     netdev <netdev@...r.kernel.org>, Jeff Garzik <jgarzik@...ox.com>
Subject: Re: Realtek 8139 problem on 486.

Hi all,

Some more results follow. I'll report on all suggestions here in one go 
for brevity.

> One possible issue is that the "RTL_W16 (IntrStatus, TxErr)" can
> leak out of the spinlock unless it is changed to RTL_W16_F(), but
> I don't see how that would cause your problem. This is probably
> not the issue here, but it can't hurt to change that. Similarly,
> the "RTL_W16 (IntrStatus, ackstat)" would need the same _F
> to ensure that a  normal TX-only interrupt gets acked before the
> spinlock.

Just tested with "_F" added to all of them; it did not help.

> Another observation I have is that the loop used to be around
> "RTL_R16(IntrStatus); rtl8139_rx(); rtl8139_tx_interrupt()", so
> removing the loop also means that the tx handler is only called
> once when it used to be called for every loop iteration.
> If this is what triggers the problem, you should be able to break
> it the same way by moving the rtl8139_tx_interrupt() ahead of the
> loop, and adjusting the RTL_W16 (IntrStatus, ackstat) accordingly
> so you only Ack the TX before calling rtl8139_tx_interrupt().

I get the idea in general, but I'm not sure how exactly you proposed to
move rtl8139_tx_interrupt() and adjust the RTL_W16 (IntrStatus, ackstat).
But meanwhile, I tried a dumb thing instead, and it worked!
I've put back The Loop:
---------------------------
+       int boguscnt = 20;

         spin_lock (&tp->lock);
+       do {
         status = RTL_R16 (IntrStatus);

         /* shared irq? */
@@ -2181,6 +2183,8 @@
                 if (status & TxErr)
                         RTL_W16 (IntrStatus, TxErr);
         }
+       boguscnt--;
+       } while (boguscnt > 0);
   out:
---------------------------
With this added, connection works fine again. Of course it is silly, but 
hopefully it gives a path for a real fix.
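
To illustrate why re-reading the status in a bounded loop can matter, here
is a minimal user-space sketch. It is not the 8139too code: struct fake_nic,
FAKE_TX_OK, fake_device_tick() and fake_interrupt() are hypothetical names
made up for this model, and the assumption is simply that a new event can be
raised while the previous one is still being handled.

```c
#include <stdint.h>

/* Hypothetical model, not the real driver: status bits are set by the
 * "device" and cleared when the handler acknowledges them. */
struct fake_nic {
    uint16_t intr_status;  /* pending event bits */
    int      pending_tx;   /* events the device will still raise */
    int      handled;      /* events the handler has serviced */
};

#define FAKE_TX_OK 0x0004u  /* assumed bit value, for illustration only */

/* Device side: raise one more TX event if any are still queued. */
static void fake_device_tick(struct fake_nic *nic)
{
    if (nic->pending_tx > 0 && !(nic->intr_status & FAKE_TX_OK)) {
        nic->intr_status |= FAKE_TX_OK;
        nic->pending_tx--;
    }
}

/* Handler: read status, ack, service, and repeat up to max_passes times.
 * Returns the number of passes that found work to do. */
static int fake_interrupt(struct fake_nic *nic, int max_passes)
{
    int passes = 0;

    do {
        uint16_t status = nic->intr_status;

        if (status == 0)
            break;                              /* nothing pending */
        nic->intr_status &= (uint16_t)~status;  /* acknowledge */
        if (status & FAKE_TX_OK)
            nic->handled++;
        fake_device_tick(nic);  /* a new event arrives meanwhile */
        passes++;
    } while (passes < max_passes);

    return passes;
}
```

In this model, fake_interrupt(&nic, 1) services one event and returns with
a status bit still set, while a budget of 20 drains everything. Whether the
real hardware behaves like this on the 486 is exactly the open question.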

> What's your qdisc? Recently there was a bug related to the lockless
> pfifo_fast qdisc

If I understand correctly, this means the packet scheduler type. In more
recent kernels I typically have CONFIG_DEFAULT_NET_SCH="fq_codel"; now
in 2.6.3 no explicit scheduler is enabled, so it must be some fast
FIFO. But as the symptoms were basically identical in e.g. 2.6.3 and
4.14, I suppose it is unlikely to be the cause.

> Issue could be related to rx and tx processing now potentially running in parallel.
> I only have access to the current 8139too source code, hopefully the following
> works on the old version:
>
> In the end of rtl8139_start_xmit() there's
> if ((tp->cur_tx - NUM_TX_DESC) == tp->dirty_tx)
> 		netif_stop_queue (dev);
>
> Try changing this to

Ok, the changes compiled fine, but unfortunately made no noticeable
difference.
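
For reference, the quoted queue-stop condition can be modelled in isolation.
This is only a sketch under assumptions: struct tx_ring, tx_ring_full() and
tx_ring_in_flight() are hypothetical helpers, the counters are taken to be
free-running unsigned values, and the ring size of 4 is assumed for the
example. It shows that the comparison still holds when the counters wrap.

```c
#include <stdint.h>

#define NUM_TX_DESC 4u  /* assumed ring size, for illustration */

/* Hypothetical stand-in for the two TX counters discussed above. */
struct tx_ring {
    uint32_t cur_tx;    /* next slot the transmit path will fill */
    uint32_t dirty_tx;  /* oldest slot not yet reclaimed */
};

/* Same shape as the quoted check: full when cur_tx is exactly
 * NUM_TX_DESC ahead of dirty_tx, using wrapping unsigned arithmetic. */
static int tx_ring_full(const struct tx_ring *r)
{
    return (uint32_t)(r->cur_tx - NUM_TX_DESC) == r->dirty_tx;
}

/* Number of descriptors handed to the device but not yet reclaimed. */
static uint32_t tx_ring_in_flight(const struct tx_ring *r)
{
    return (uint32_t)(r->cur_tx - r->dirty_tx);
}
```

If both counters can be updated concurrently, as suggested in the quoted
text, such a check is only meaningful when the two reads are consistent
with each other, which this sketch does not attempt to model.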


Thank you,

Regards,
Nikolai