Date:   Mon, 31 May 2021 21:05:14 +0200
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Nikolai Zhubr <zhubr.2@...il.com>, netdev <netdev@...r.kernel.org>
Cc:     Jeff Garzik <jgarzik@...ox.com>
Subject: Re: Realtek 8139 problem on 486.

On 31.05.2021 18:53, Nikolai Zhubr wrote:
> Hi all,
> 
> 31.05.2021 2:17, Nikolai Zhubr:
> [...]
>> However indeed, it seems a problem was introduced with a rework of
>> interrupt handling (rtl8139_interrupt) in 2.6.3, because I have already
>> pushed all other differences from 2.6.3 to 2.6.2 and it still keeps
>> working fine.
>> My resulting minimized diff is still ~300 lines, it is too big and
> >> complicated to be useful to post here as is.
> 
> Some more input.
> 
> I was able to minimize the problematic diff to basically one screenful; it is quite comprehensible now, and I'm including it below. It is the change in status/event handling due to the switch to NAPI that introduced the problem.
> Now, in some more detailed tests, I observe that _receiving_ still works fine. It is _sending_ that suffers, and apparently only when trying to send a lot at a time. In that case I see these warnings:
> 
> NETDEV WATCHDOG: eth0: transmit timed out
> eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
> 
> It looks like the queue of tx frames somehow gets messed up.
> 
> The essential diff fragment:
> ================================================
>      dev->open = rtl8139_open;
>      dev->hard_start_xmit = rtl8139_start_xmit;
> +    dev->poll = rtl8139_poll;
>      dev->weight = 64;
>      dev->stop = rtl8139_close;
>      dev->get_stats = rtl8139_get_stats;
> @@ -2015,7 +2010,7 @@
>              tp->stats.rx_bytes += pkt_size;
>              tp->stats.rx_packets++;
> 
> -            netif_rx (skb);
> +            netif_receive_skb (skb);
>          } else {
>              if (net_ratelimit())
>                  printk (KERN_WARNING
> @@ -2138,10 +2133,8 @@
>      u16 status, ackstat;
>      int link_changed = 0; /* avoid bogus "uninit" warning */
>      int handled = 0;
> -    int boguscnt = max_interrupt_work;
> 
>      spin_lock (&tp->lock);
> -    do {
>      status = RTL_R16 (IntrStatus);
> 
>      /* shared irq? */
> @@ -2169,8 +2162,14 @@
>      if (ackstat)
>          RTL_W16 (IntrStatus, ackstat);
> 
> -    if (netif_running (dev) && (status & RxAckBits))
> -        rtl8139_rx (dev, tp, 1000000000);
> +    /* Receive packets are processed by poll routine.
> +       If not running start it now. */
> +    if (status & RxAckBits){
> +        if (netif_rx_schedule_prep(dev)) {
> +            RTL_W16_F (IntrMask, rtl8139_norx_intr_mask);
> +            __netif_rx_schedule (dev);
> +        }
> +    }
> 
>      /* Check uncommon events with one test. */
>      if (unlikely(status & (PCIErr | PCSTimeout | RxUnderrun | RxErr)))
> @@ -2182,16 +2181,6 @@
>          if (status & TxErr)
>              RTL_W16 (IntrStatus, TxErr);
>      }
> -    boguscnt--;
> -    } while (boguscnt > 0);
> -
> ================================================
> 
> 
> Thank you,
> 
> Regards,
> Nikolai
> 
The issue could be related to rx and tx processing now potentially running in parallel.
I only have access to the current 8139too source code; hopefully the following
also applies to the old version:

At the end of rtl8139_start_xmit() there's

	if ((tp->cur_tx - NUM_TX_DESC) == tp->dirty_tx)
		netif_stop_queue (dev);

Try changing this to

if (tp->cur_tx - NUM_TX_DESC == tp->dirty_tx) {
	smp_wmb();
	netif_stop_queue (dev);
	smp_mb__after_atomic();       /* if this doesn't exist yet, use mb() */
	if (tp->cur_tx - NUM_TX_DESC != tp->dirty_tx)
		netif_start_queue(dev);
}
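
For reference, the ring-full test used above can be tried out in user space. This is only a sketch: struct tx_ring, tx_ring_full() and tx_ring_in_flight() are hypothetical names that are not part of 8139too, the ring size of 4 is assumed, and both counters are assumed to be free-running unsigned values. It shows that the comparison stays correct after the counters wrap around.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_TX_DESC 4u  /* assumed ring size for this model */

/* Hypothetical stand-in for the two counters kept in the driver's private data. */
struct tx_ring {
    unsigned long cur_tx;   /* next descriptor to be filled by the xmit path */
    unsigned long dirty_tx; /* oldest descriptor not yet reclaimed */
};

/* Same comparison as in the driver: full when cur_tx is NUM_TX_DESC ahead. */
static bool tx_ring_full(const struct tx_ring *r)
{
    return r->cur_tx - NUM_TX_DESC == r->dirty_tx;
}

/* Number of descriptors currently owned by the hardware. */
static unsigned long tx_ring_in_flight(const struct tx_ring *r)
{
    /* Unsigned subtraction is modular, so this holds across wraparound. */
    unsigned long in_flight = r->cur_tx - r->dirty_tx;

    assert(in_flight <= NUM_TX_DESC);
    return in_flight;
}
```

With cur_tx = 2 and dirty_tx = (unsigned long)-2, i.e. right after cur_tx has wrapped, tx_ring_full() still reports a full ring and tx_ring_in_flight() returns 4.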


And at the end of rtl8139_tx_interrupt() change

	if (tp->dirty_tx != dirty_tx) {
		tp->dirty_tx = dirty_tx;
		mb();
		netif_wake_queue (dev);
	}

to

	if (tp->dirty_tx != dirty_tx) {
		tp->dirty_tx = dirty_tx;
		mb();
		if (netif_queue_stopped(dev) && tp->cur_tx - NUM_TX_DESC != tp->dirty_tx)
			netif_wake_queue (dev);
	}
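
To illustrate what the two changes are meant to guard against, here is a small user-space model of the stop/re-check/wake pattern. It is a sketch under assumptions, not driver code: struct model_queue and the model_*() helpers are hypothetical, C11 atomics and sequentially consistent fences stand in for the kernel's queue state bit and mb(), and the ring size of 4 is arbitrary. The xmit path is split so that a tx completion can be placed between its "ring is full" check and the queue stop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NUM_TX_DESC 4u  /* assumed ring size for this model */

/* Hypothetical model of the state shared by the xmit and tx completion paths. */
struct model_queue {
    atomic_ulong cur_tx;   /* advanced by the xmit path */
    atomic_ulong dirty_tx; /* advanced by the tx completion path */
    atomic_bool stopped;   /* stands in for the netif queue-stopped state */
};

static bool model_full(struct model_queue *q)
{
    return atomic_load(&q->cur_tx) - MODEL_NUM_TX_DESC ==
           atomic_load(&q->dirty_tx);
}

/* Second half of the xmit path; run only after model_full() returned true. */
static void model_stop_and_recheck(struct model_queue *q)
{
    atomic_store(&q->stopped, true);
    atomic_thread_fence(memory_order_seq_cst); /* stand-in for the barrier */
    if (!model_full(q))                        /* a completion slipped in */
        atomic_store(&q->stopped, false);
}

/* Tx completion path: reclaim descriptors, wake only a stopped queue. */
static void model_tx_complete(struct model_queue *q, unsigned long reclaimed)
{
    assert(reclaimed <= atomic_load(&q->cur_tx) - atomic_load(&q->dirty_tx));
    if (reclaimed == 0)
        return;
    atomic_fetch_add(&q->dirty_tx, reclaimed);
    atomic_thread_fence(memory_order_seq_cst); /* stand-in for mb() */
    if (atomic_load(&q->stopped) && !model_full(q))
        atomic_store(&q->stopped, false);
}
```

The interleaving of interest: the ring is full, the xmit path sees model_full() return true, then model_tx_complete() runs before the queue is stopped and so finds nothing to wake. When model_stop_and_recheck() runs afterwards, the re-check notices the freed descriptor and restarts the queue. Without the re-check the queue would remain stopped with nothing left to wake it, which would match the observed transmit timeout.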
