netdev - Re: Realtek 8139 problem on 486.


Message-ID: <49f40dd8-da68-f579-b359-7a7e229565e1@gmail.com>
Date:   Tue, 1 Jun 2021 00:30:19 +0200
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Nikolai Zhubr <zhubr.2@...il.com>, Arnd Bergmann <arnd@...nel.org>
Cc:     netdev <netdev@...r.kernel.org>, Jeff Garzik <jgarzik@...ox.com>
Subject: Re: Realtek 8139 problem on 486.

On 01.06.2021 00:18, Nikolai Zhubr wrote:
> Hi all,
> 
> Some more results follow. I'll report on all suggestions here in one go for brevity.
> 
>> One possible issue is that the "RTL_W16 (IntrStatus, TxErr)" can
>> leak out of the spinlock unless it is changed to RTL_W16_F(), but
>> I don't see how that would cause your problem. This is probably
>> not the issue here, but it can't hurt to change that. Similarly,
>> the "RTL_W16 (IntrStatus, ackstat)" would need the same _F
>> to ensure that a normal TX-only interrupt gets acked before the
>> spinlock.
> 
> Just tested with "_F" added to all of them, did not help.
> 
>> Another observation I have is that the loop used to be around
>> "RTL_R16(IntrStatus); rtl8139_rx(); rtl8139_tx_interrupt()", so
>> removing the loop also means that the tx handler is only called
>> once when it used to be called for every loop iteration.
>> If this is what triggers the problem, you should be able to break
>> it the same way by moving the rtl8139_tx_interrupt() ahead of the
>> loop, and adjusting the RTL_W16 (IntrStatus, ackstat) accordingly
>> so you only Ack the TX before calling rtl8139_tx_interrupt().
> 
> I get the idea in general, but I am not sure exactly how you propose to move rtl8139_tx_interrupt() and adjust the RTL_W16 (IntrStatus, ackstat).
> But meanwhile, I tried a dumb thing instead, and it worked!
> I've put back The Loop:
> ---------------------------
> +       int boguscnt = 20;
> 
>         spin_lock (&tp->lock);
> +       do {
>         status = RTL_R16 (IntrStatus);
> 
>         /* shared irq? */
> @@ -2181,6 +2183,8 @@
>                 if (status & TxErr)
>                         RTL_W16 (IntrStatus, TxErr);
>         }
> +       boguscnt--;
> +       } while (boguscnt > 0);
>   out:
> ---------------------------
> With this added, connection works fine again. Of course it is silly, but hopefully it gives a path for a real fix.
> 

What was discussed here 16 years ago should sound familiar to you.
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg92234.html
"It was an option in my BIOS PCI level/edge settings as I posted."
You could check whether your BIOS has the same or a similar option
and experiment with it.
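
To illustrate why a level/edge setting and the removed loop could interact, here is a minimal sketch, not taken from 8139too. It assumes a write-1-to-clear status register and an edge-triggered interrupt line; the names fake_nic, handler_pass, handle_one_edge, EV_RX and EV_TX are hypothetical and exist only in this model. If an event is raised after the status read but before the ack, a single-pass handler leaves its bit pending, the line never drops, and no new edge arrives. A bounded loop in the style of boguscnt picks the event up on the next pass.

```c
#include <assert.h>
#include <stdint.h>

#define EV_RX 0x0001u  /* hypothetical bit values, for this model only */
#define EV_TX 0x0004u

struct fake_nic {
    uint16_t intr_status;  /* pending bits, write-1-to-clear */
    uint16_t late_event;   /* bit raised once, between status read and ack */
    unsigned serviced;     /* OR of all bits the handler has serviced */
};

/* One handler pass: read status, let a late event race in, then ack
 * only the bits that were read. Returns the status that was read. */
static uint16_t handler_pass(struct fake_nic *nic)
{
    uint16_t status = nic->intr_status;     /* models reading IntrStatus */

    nic->intr_status |= nic->late_event;    /* event arrives after the read */
    nic->late_event = 0;

    nic->intr_status &= (uint16_t)~status;  /* ack what we saw, nothing more */
    nic->serviced |= status;
    return status;
}

/* Handle a single interrupt edge with at most max_passes passes.
 * Returns the bits still pending on exit. On an edge-triggered line,
 * a non-zero result means the line stays asserted and no further
 * edge (and so no further handler call) will occur. */
static uint16_t handle_one_edge(struct fake_nic *nic, int max_passes)
{
    int boguscnt = max_passes;

    assert(max_passes > 0);
    do {
        if (handler_pass(nic) == 0)
            break;                          /* nothing pending, done */
        boguscnt--;
    } while (boguscnt > 0);

    return nic->intr_status;
}
```

In this model, calling handle_one_edge() with max_passes set to 1 on a device that has EV_RX pending and EV_TX arriving late leaves EV_TX unserviced, while a limit of 20 drains both. With a level-triggered line the leftover bit would simply raise the interrupt again, which is why the BIOS setting is worth checking.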


>> What's your qdisc? Recently there was a bug related to the lockless
>> pfifo_fast qdisc
> 
> If I understand correctly, this means the packet scheduler type. In more recent kernels I typically have CONFIG_DEFAULT_NET_SCH="fq_codel"; now in 2.6.3 no explicit scheduler is enabled, so it must be some fast FIFO. But as the symptoms were basically identical in e.g. 2.6.3 and 4.14, I suppose it is unlikely to be the cause.
> 
>> Issue could be related to rx and tx processing now potentially running in parallel.
>> I only have access to the current 8139too source code, hopefully the following
>> works on the old version:
>>
>> In the end of rtl8139_start_xmit() there's
>> if ((tp->cur_tx - NUM_TX_DESC) == tp->dirty_tx)
>>         netif_stop_queue (dev);
>>
>> Try changing this to
> 
> Ok, the changes compiled fine, but unfortunately made no noticeable difference.
> 
> 
> Thank you,
> 
> Regards,
> Nikolai
> 
>
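
For reference, the queue-stop check quoted above from rtl8139_start_xmit() relies on cur_tx and dirty_tx being free-running unsigned counters. The sketch below is a standalone model of that arithmetic, not driver code. The struct tx_ring and the helpers tx_ring_full, tx_in_flight and tx_queue_one are hypothetical, and the ring size of 4 is an assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_TX_DESC 4  /* ring size assumed for this sketch */

struct tx_ring {
    unsigned long cur_tx;    /* count of descriptors ever queued */
    unsigned long dirty_tx;  /* count of descriptors ever reclaimed */
};

/* Same comparison as the quoted check: the ring is full when the
 * producer is exactly NUM_TX_DESC entries ahead of the consumer.
 * Unsigned subtraction keeps this correct across counter wraparound. */
static bool tx_ring_full(const struct tx_ring *r)
{
    return (r->cur_tx - NUM_TX_DESC) == r->dirty_tx;
}

/* Number of descriptors queued but not yet reclaimed. */
static unsigned long tx_in_flight(const struct tx_ring *r)
{
    return r->cur_tx - r->dirty_tx;
}

/* Queue one descriptor; the caller must have checked for space,
 * which is what stopping the queue is meant to guarantee. */
static void tx_queue_one(struct tx_ring *r)
{
    assert(!tx_ring_full(r));
    r->cur_tx++;
}
```

The check only protects the ring if dirty_tx cannot change between the comparison and the decision to stop the queue, which is the kind of rx/tx concurrency concern raised in the quoted suggestion.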