lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <918e1422f872d1277d28cf2a29e4dac6@i4031.net>
Date:	Fri, 06 May 2016 16:45:16 -0400
From:	"David B. Robins" <linux@...idrobins.net>
To:	Dean Jenkins <Dean_Jenkins@...tor.com>
Cc:	John Stultz <john.stultz@...aro.org>,
	lkml <linux-kernel@...r.kernel.org>,
	Mark Craske <Mark_Craske@...tor.com>,
	"David S. Miller" <davem@...emloft.net>,
	YongQin Liu <yongqin.liu@...aro.org>,
	Guodong Xu <guodong.xu@...aro.org>, linux-usb@...r.kernel.org,
	netdev@...r.kernel.org, Ivan Vecera <ivecera@...hat.com>
Subject: Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

On 2016-05-04 03:58, Dean Jenkins wrote:
> On 04/05/16 01:28, David B. Robins wrote:
>> 
>>> 
>>> Here is the code snippet from the patch with my annotations between #
>>> #, I will try to explain my intentions. Feel free to point out any
>>> flaws:
>>> 
>>>     if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
>>>         # Only runs when rx->remaining !=0 and the end of the 
>>> Ethernet
>>> frame + next 32-bit header word is within the URB buffer. #
>>>         # Therefore, this code does not run when the end of an
>>> Ethernet frame has been reached in the previous URB #
>>>         # or when the end of the Ethernet frame + next 32-bit header
>>> word will be in a later URB buffer #
>> 
>> It may well be. I don't have the setup with me now, but I can try 
>> tomorrow to reproduce an environment where I can add some more 
>> detailed logging.
>> 
>> Since the URB length has to be >= than the remaining data plus a u32, 
>> the devices that John Stultz and I are using (AX88772B in my case) may 
>> be adding some additional data/padding after an Ethernet frame, 
>> expecting it to be discarded, and running into this check and its 
>> consequences. This may mean the device is badly behaved, if it is 
>> specified not to send anything extra; in any case, a well-intentioned 
>> error correction has gone badly, but I better understand the intent 
>> now. I am curious to know how often the device you are using benefits 
>> from this block of code.
> 
> The issue is that the driver should be robust to cope with missing
> URBs. Whilst testing with D-Link DUB-E100 C1 AX88772 USB to Ethernet
> adaptor in our ARM embedded system which runs in hostile environments,
> it was noticed that URBs could be lost (probably due to a bug
> elsewhere or low memory issue). Without this patch, a missing URB
> causes bad Ethernet frames to be passed up to the IP stack because
> rx->remaining spans multiple URBs.
> 
> In the good case of an Ethernet frame spanning 2 URBs, the 1st URB is
> processed and copies the 1st part of the Ethernet frame into the
> netdev buffer, for the 2nd URB the remaining part of the Ethernet
> frame is copied into the same netdev buffer to complete the Ethernet
> frame. The netdev buffer is then sent up to the IP stack.
> 
> In the case of a missing URB, a bad Ethernet frame is created as 
> follows:
> The 1st URB is processed and copies the 1st part of the Ethernet frame
> into the netdev buffer, the 2nd URB is lost (somehow),  the 3rd URB is
> processed and blindly copies what it thinks is the remaining part of
> the Ethernet frame in the same netdev buffer which corrupts the
> Ethernet frame. The netdev buffer is then sent up to the IP stack. The
> 3rd URB and subsequent URBs are processed but synchronisation has been
> lost so can misread data as a 32-bit header word. It is likely that
> some good Ethernet frames get discarded whilst trying to
> resynchronise.
> 
> A recovery strategy for regaining lock with the 32-bit header word is
> necessary otherwise the driver will have difficulty in recovering from
> a lost URB.
> 
> In the "olden days", the 32-bit header word was always at the start of
> the URB buffer so previous URBs did not influence the current URB. So
> no recovery strategy was needed at that time. But now we have to
> remember what happened in the previous URB and a lost URB can cause a
> discontinuity in the data stream because the data is not always
> aligned to the start of the URB buffer.
> 
> I agree that your environment may never suffer from lost URBs so
> removal of the patch would work OK.
> 
> I will try to find some time to setup a test environment.

I got the chance to add some more logging (and add back the header 
synchronization code) to a system which is receiving video data on an 
8-port POE device using AX88772Bs - it replaces the "Data header 
synchronization was lost" log and logs a few more local items ("u32" = 
rx->header):

[3105478.073908] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3105478.073918] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xf05b7c5f, offset 4
[3105478.074022] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 
2048
[3105478.074032] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xffaae2a5, offset 4
[3105478.074146] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() skb->len = 
1518
[3105478.074280] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 
2048
[3105478.074291] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xe1478400, offset 4
[3105478.074399] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 916
[3105478.074409] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0x9fa84eb5, offset 4
[3105478.074657] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() skb->len = 
1518

Grepping for just the SYNC_LOSS messages shows a similar pattern to that 
seen elsewhere in the thread:

[3104761.056033] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.058035] asix 1-1.4.1.3:1.0 poe2: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.076795] asix 1-1.4.2.3:1.0 poe4: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.107904] asix 1-1.4.2.3:1.0 poe4: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.114649] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.121152] asix 1-1.4.1.1:1.0 poe1: asix_rx_fixup() SYNC_LOSS 
remain 988 len 1518 offset 992 u32 0xecf95000
[3104761.147784] asix 1-1.4.1.3:1.0 poe2: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.160535] asix 1-1.4.1.1:1.0 poe1: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.168024] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.168403] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.170654] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.180418] asix 1-1.4.1.1:1.0 poe1: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.221783] asix 1-1.4.1.1:1.0 poe1: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.248784] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.252421] asix 1-1.4.1.2:1.0 poe5: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.262903] asix 1-1.4.2.4:1.0 poe8: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.266904] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.267281] asix 1-1.4.2.4:1.0 poe8: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.271665] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.273915] asix 1-1.4.2.4:1.0 poe8: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.280048] asix 1-1.4.2.4:1.0 poe8: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.287152] asix 1-1.4.2.3:1.0 poe4: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.292157] asix 1-1.4.2.3:1.0 poe4: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.295054] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.310915] asix 1-1.4.1.1:1.0 poe1: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2022 offset 992 u32 0xecf95000
[3104761.316918] asix 1-1.4.2.3:1.0 poe4: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.324935] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.326665] asix 1-1.4.2.3:1.0 poe4: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.336420] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000

This is an Ubuntu 14.04.2 LTS i686 (32-bit) system running kernel 
3.13.0-53-generic. I don't see any "low-speed" warnings, and lsusb -t 
shows each device (and their hubs) running at 480M.

ifconfig poe1:

poe1      Link encap:Ethernet  HWaddr 00:50:f9:ec:4f:d8
           inet addr:192.168.201.1  Bcast:192.168.201.255  
Mask:255.255.255.0
           inet6 addr: fe80::250:f9ff:feec:4fd8/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:194441 errors:16649 dropped:15 overruns:0 frame:0
           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:267310738 (267.3 MB)  TX bytes:10300282 (10.3 MB)

Without the "sanity test/synchronization" code:

poe1      Link encap:Ethernet  HWaddr 00:50:f9:ec:4f:d8
           inet addr:192.168.201.1  Bcast:192.168.201.255  
Mask:255.255.255.0
           inet6 addr: fe80::250:f9ff:feec:4fd8/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:398910 errors:1 dropped:14 overruns:0 frame:0
           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:513363997 (513.3 MB)  TX bytes:13154142 (13.1 MB)

> 
> Regards,
> Dean

David

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ