[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANP3RGc7n2vv6vGh7j0Y=7DNqfXnQxZaTcwdPD15kzoY1in08Q@mail.gmail.com>
Date: Sun, 12 Jan 2025 09:49:47 -0800
From: Maciej Żenczykowski <maze@...gle.com>
To: Junzhong Pan <panjunzhong@...look.com>
Cc: quic_kriskura@...cinc.com, gregkh@...uxfoundation.org,
linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org,
quic_jackp@...cinc.com, quic_ppratap@...cinc.com, quic_wcheng@...cinc.com,
stable@...r.kernel.org
Subject: Re: [PATCH v3] usb: gadget: ncm: Avoid dropping datagrams of properly
parsed NTBs
On Sat, Jan 11, 2025 at 2:31 AM Junzhong Pan <panjunzhong@...look.com> wrote:
>
> Hi Maciej
>
> Thanks for the reply, I am sorry for the unclear description.
>
> +remove hgajjar@...adit-jv.com from CC list due to mail delivery failure.
>
> On 2025/1/11 3:00, Maciej Żenczykowski wrote:
> > On Thu, Jan 9, 2025 at 11:37 PM Junzhong Pan <panjunzhong@...look.com> wrote:
> >> Lecroy shows the wSequence=5765 have 10 Datagram consisting a 31*512
> >> bytes=15872 bytes OUT Transfer but have no ZLP:
> >>
> >> OUT Transfer wSequence=5765
> >> NTH32 Datagrams: 1514B * 8 + 580B NDP32
> >> Transfer length: 512B * 31
> >> NO ZLP
> >> OUT Transfer wSequence=5766
> >> NTH32 Datagrams: 1514B * 8 NDP32
> >> Transfer length: 512B * 29 + 432
> >>
> >> This lead to a result that the first givebacked 16K skb correponding to
> >> a usb_request contains two NTH but not complete:
> >>
> >> USB Request 1 SKB 16384B
> >> (NTH32) (Datagrams) (NDP32) | (NTH32) (Datagrams piece of wSequence=5766)
> >> USB Request 2 SKB 14768B
> >> (Datagrams piece of wSequence=5766) (NDP32)
> >>
> >> From the context, it seems the first report of Wrong NDP SIGN is caused
> >> by out-of-bound accessing, the second report of Wrong NTH SIGN is caused
> >> by a wrong beginning of NTB parsing.
> >
> > Could you clarify which Linux Kernel you're testing against?
>
> I am using linux 6.6.63, but the related codes have not newer updates since.
>
> > Could you provide some pcap of the actual usb frames?
> > Or perhaps describe better the problem, because I'm not quite
> > following from your email.
> > (I'm not sure if the problem is what windows is sending, or how Linux
> > is parsing it)
>
> I think the root cause of the problem is because Windows 10 ncm class
> driver doesn't send ZLP, meanwhile the current parsing is depend on a condition
> that NTB won't spread across usb_request.
>
> This is observed only on Windows 10 but not on Windows 11.
>
> To reproduce the issue, you just need a Windows 10 machine and run iperf3 -c
> from the windows and iperf3 -s on the gadget board, configure the gadget
> with the following os_desc, windows10 will bind its ncm driver automatically:
> echo 0xEF > $GADGET/bDeviceClass
> echo 0x02 > $GADGET/bDeviceSubClass
> echo 0x01 > $GADGET/bDeviceProtocol
> echo 1 > $GADGET/os_desc/use
> echo 0x1 > $GADGET/os_desc/b_vendor_code
> echo "MSFT100" > $GADGET/os_desc/qw_sign
> mkdir -p $FUNCTIONS/ncm.0/os_desc/interface.ncm
> echo WINNCM > $FUNCTIONS/ncm.0/os_desc/interface.ncm/compatible_id
>
> Reproduced on dwc2 and dwc3 controller with linux 6.6.63.
>
> > I *think* what you're saying is that wSequence=5765 & 5766 are being
> > treated as a single ncm message due to their being a multiple of 512
> > in the former, not followed by a ZLP? I thought that was precisely
> > when microsoft ncm added an extra zero byte...
> >
> > What's at the end of 5755? Padding? No padding?
> > Is there an NTH32 header in 5766? Should there be?
> >
>
> Okay, I will explain it more precisely (some numbers are corrected).
> On the USB wire, the transfers on OUT endpoint is like this:
>
> OUT Transfer #1 issued by win10, Transfer length: 512B * 31 = 15872 Bytes
> OUT Transaction 512B for 31 times, no ZLP and other things. Parsed as below:
> [Offset 0x0000] A NTH32 Header with wSequence=5765, dwNdpIndex=0x3D90
> [Offset 0x0012] Datagram blocks
> [Offset 0x3D90] A NDP32 describing 11 Datagram blocks plus one zero length item and padding, len = 112 Bytes.
> (0)index=0x0012 length=1514
> (1)index=0x05FE length=1514
> (2)index=0x0BEA length=1514
> (3)index=0x11D6 length=1514
> (4)index=0x17C2 length=1514
> (5)index=0x1DAE length=1514
> (6)index=0x239A length=1514
> (7)index=0x2986 length=1514
> (8)index=...... length=1514
> (9)index=...... length=1514
> (10)index=..... length=580
> (11)index=0x0 length=0
> ...padding...
>
> OUT Transfer #2 issued by win10, Transfer length: 512B * 29 + 432 = 15280 Bytes
> OUT Transaction 512B for 29 times, and OUT Transaction 432B for one time.
> [Offset 0x0000] A NTH32 with wSequence=5766, dwNdpIndex=0x3B48
> [Offset 0x0012] Datagram block
> [Offset 0x3B48] A NDP32 describing Datagram blocks
>
> In the u_ether driver, we first queued a usb_request(buf=skb.data, length=16384)
> to the udc driver, the udc should give it back when it receive a 16384B data or
> encounter ZLP/Short packet.
>
> Since the OUT Xfer #1 doesn't have ZLP or SP, the udc won't giveback the usb_request
> with req->actual = 15872, instead, it gather some piece of the data from OUT Transfer #2
> to make a req->actual=16384 then return to rx_complete.
>
> For f_ncm, we now have a the first usb_request givebacked from udc driver having
> skb->data organized as the following structure:
>
> usb_request #1
> -----------------------from OUT Transfer #1:-------------------
> [Offset 0x0000] A NTH32 Header with wSequence=5765, dwNdpIndex=0x3D90
> [Offset 0x0012] Datagram blocks
> [Offset 0x3D90] A NDP32 describing 10 Datagram blocks plus one zero length item and padding, len = 112 Bytes.
> (0)index=0x0012 length=1514
> (1)index=0x05FE length=1514
> (2)index=0x0BEA length=1514
> (3)index=0x11D6 length=1514
> (4)index=0x17C2 length=1514
> (5)index=0x1DAE length=1514
> (6)index=0x239A length=1514
> (7)index=0x2986 length=1514
> (8)index=...... length=1514
> (9)index=...... length=1514
> (10)index=..... length=580
> (11)index=0x0 length=0
> ...padding...
> --------------------from OUT Transfer #2:-------------------
> [Offset 0x3E00/15872] A NTH32 Header with wSequence=5766, dwNdpIndex=0x3B48
> [Offset 0x3E00+???] Datagram block piece from next NDP (wSequence=5766)
>
> During the unwrap, we will first report a "Wrong NTH SIGN" when we try to redirect to
> the NDP of wSequence=5765 parsing the second NTB in the usb_request, since theie is no
> bound checking, we read some garbage data from skb->data + [0x3E00 + 0x3B48].
>
> Raising the following message (without modification):
> [ 174] configfs-gadget.0: Wrong NDP SIGN
>
> After this, we go to the next usb_request, at this time, we have a short package(432B) from
> the OUT Transfer #2, thus the udc would giveback next usb_request to us like this:
>
> usb_request #2
> --------------------from OUT Transfer #2:-------------------
> [Offset 0x0000] Datagram block piece from NTB of wSequence=5765
> 3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa
> [Offset 0x????] A NDP32 describing Datagram blocks
>
> At this point, the unwrap function try to read a NTH sign but got unexpected user data,
> then it raise this message:
> [ 174] configfs-gadget.0: Wrong NTH SIGN, skblen 14768
> [ 174] HEAD:00000000b1a72bfc: 3f 98 a6 8e 17 f8 bb 29 07 b8 da 13 7f 20 80 8e 77 ca 32 07 ac 71 b8 8d 84 03 d7 1b 96 9b c4 fa
>
> In short, there is two NTBs across two usb_requests, the beginning of a usb_request is
> not necessary a NTH sign.
> Do this make sense to you?
>
> Best Regards,
> Pan
Ok, this was *very* helpful. I'll dig into this more during the week,
but here's a few quick questions/statements that immediately come to
mind.
(a) I think this looks like a bug on the sending Win10 side, rather
than a parsing bug in Linux,
with there being no ZLP, and no short (<512) frame, there's simply no
way for the receiver to split at the right spot.
Indeed, fixing this on the Linux/parsing side seems non-trivial...
I guess we could try to treat the connection as simply a serial
connection (ie. ignore frame boundaries), but then we might have
issues with other senders...
I guess the most likely 'correct' hack/fix would be to hold on to the
extra 'N*512' bytes (it doesn't even have to be 1, though likely the N
is odd), if it starts with a NTH header...
(b) I notice the '512' not '1024', I think this implies a USB2
connection instead of USB3
-- could you try replicating this with a USB3 capable data cable (and
USB3 ports), this should result in 1024 block size instead of 512.
I'm wondering if the win10 stack is avoiding generating N*1024, but
then hitting N*512 with odd N...
Presumably '512' would be '64' with USB1.0/1.1, but I guess finding a
USB1.x port/host to test against is likely to be near impossible...
I'll try to see if I can find the source of the bug in the Win
driver's sources (though based on it being Win10 only, may need to
search history)
--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Powered by blists - more mailing lists