Message-ID: <b11a3198-2d78-c90a-9586-f4752ae4fe6a@badula.org>
Date:   Wed, 30 Sep 2020 14:49:41 -0400
From:   Ion Badulescu <ionut@...ula.org>
To:     stable@...r.kernel.org, netdev@...r.kernel.org
Subject: Network packet reordering with the 5.4 stable tree

[As suggested by Greg K-H, I'm reposting this to both stable and netdev. 
I'm also including a backported patch for the 5.4-stable tree, which is 
now confirmed to be working and fixing the packet reordering issue I was 
seeing.]

Hello,

I ran into some packet reordering using a plain vanilla 5.4.49 kernel 
and the Amazon AWS ena driver. The external symptom was that every now 
and again, one or more larger packets would be delivered to the UDP 
socket after some smaller packets, even though the smaller packets were 
sent after the larger packets. They were also delivered late to a packet 
socket sniffer, which initially made me suspect an RSS bug in the ena 
hardware. Eventually, I modified the ena driver to stamp each packet (by 
overwriting its ethernet source mac field) with the id of the RSS queue 
that delivered it, and with the value of a per-RSS-queue counter that 
incremented for each packet received in that queue. That hack showed RSS 
functioning properly, and also showed packets received earlier (with a 
smaller counter value) being delivered after packets with a larger 
counter value. It established that the reordering was in fact happening 
inside the kernel network core.
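
[Illustration, not part of the original hack: the per-queue check described above can be sketched as follows. It assumes the stamps have already been extracted from the captured packets as (queue_id, counter) pairs in delivery order; the function name and the input format are hypothetical.]

```python
def find_reordered(deliveries):
    """Return the stamps that arrived after a later-stamped packet
    from the same RSS queue.

    `deliveries` is an iterable of (queue_id, counter) pairs listed in
    the order the socket saw them (hypothetical input format).
    """
    highest_seen = {}   # queue_id -> largest counter delivered so far
    late = []
    for queue_id, counter in deliveries:
        if counter < highest_seen.get(queue_id, -1):
            # An older packet showed up behind a newer one on this queue.
            late.append((queue_id, counter))
        else:
            highest_seen[queue_id] = counter
    return late


if __name__ == "__main__":
    sample = [(0, 1), (0, 3), (0, 2), (1, 1), (0, 4)]
    print(find_reordered(sample))
```

Counters are only compared within one queue, since ordering across RSS queues is not guaranteed in the first place.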

The breakthrough came from realizing that the ena driver defaults its 
rx_copybreak to 256, which matched perfectly the boundary between the 
delayed large packets and the smaller packets being delivered first. 
After changing ena's rx_copybreak to zero, the reordering issue disappeared.

After a lot of hair pulling, I think I figured out what the issue is -- 
and it's confined to the 5.4 stable tree. Commit 323ebb61e32b4 (present 
in 5.4) introduced queueing for GRO_NORMAL packets received via 
napi_gro_frags() -> napi_frags_finish(). Commit 6570bc79c0df (NOT 
present in 5.4) extended the same GRO_NORMAL queueing to packets 
received via napi_gro_receive() -> napi_skb_finish(). Without 
6570bc79c0df, packets received via napi_gro_receive() can get delivered 
ahead of the earlier, queued up packets received via napi_gro_frags(). 
And this is precisely what happens in the ena driver with packets 
smaller than rx_copybreak vs packets larger than rx_copybreak.
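
[Illustration: a toy model of a single NAPI poll that shows the ordering effect described above. It is not kernel code. It assumes that packets up to rx_copybreak take the napi_gro_receive() path and larger ones take napi_gro_frags(), that every packet comes out of GRO as GRO_NORMAL, and that the queued list is flushed only at the end of the poll; the function and parameter names are hypothetical.]

```python
RX_COPYBREAK = 256  # ena default mentioned above


def poll(packet_lengths, receive_path_queues):
    """Return packet indices in the order the stack would see them.

    receive_path_queues=False models 5.4 without 6570bc79c0df;
    True models the tree with that commit applied.
    """
    rx_list = []     # GRO_NORMAL packets held for a batched flush
    delivered = []   # order in which packets reach the upper layers
    for index, length in enumerate(packet_lengths):
        # Assumption: exact handling of length == RX_COPYBREAK is a guess.
        via_gro_receive = length <= RX_COPYBREAK
        if via_gro_receive and not receive_path_queues:
            delivered.append(index)   # passed up immediately
        else:
            rx_list.append(index)     # queued behind earlier packets
    delivered.extend(rx_list)         # flush at the end of the poll
    return delivered


if __name__ == "__main__":
    lengths = [1000, 100, 1200, 64]
    print(poll(lengths, receive_path_queues=False))
    print(poll(lengths, receive_path_queues=True))
```

In the model, the small packets overtake the large ones only when the two paths are handled differently; once both paths queue, arrival order is preserved.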

Interestingly, the 5.4 stable tree does contain a backport of the 
upstream c80794323e commit, which fixes packet reordering issues 
introduced by 323ebb61e32b4 and 6570bc79c0df. But 6570bc79c0df itself is 
missing, which creates another avenue for packet reordering.

The patch I'm attaching is a backport of 6570bc79c0df to the 5.4 stable 
tree. It is confirmed to completely eliminate the packet reordering 
previously seen with the ena driver and rx_copybreak=256.

Thanks,
-Ion

View attachment "pkt-reord-fix-5.4.x.diff" of type "text/x-patch" (2433 bytes)
