Message-ID: <c1096b57-a03f-4fa2-b61f-7418f2304618@gmail.com>
Date: Mon, 12 Aug 2024 16:15:35 +0300
From: Tariq Toukan <ttoukan.linux@...il.com>
To: Sagi Grimberg <sagi@...mberg.me>, Christoph Hellwig <hch@....de>,
Anna Schumaker <Anna.Schumaker@...app.com>,
Trond Myklebust <trondmy@...nel.org>, linux-nfs@...r.kernel.org,
Boris Pismenny <borisp@...dia.com>, John Fastabend
<john.fastabend@...il.com>, Jakub Kicinski <kuba@...nel.org>,
Maxim Mikityanskiy <maxtram95@...il.com>, David Howells
<dhowells@...hat.com>, Sabrina Dubroca <sd@...asysnail.net>,
Mina Almasry <almasrymina@...gle.com>
Cc: Saeed Mahameed <saeedm@...dia.com>, Gal Pressman <gal@...dia.com>,
Networking <netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>,
Eric Dumazet <edumazet@...gle.com>, "David S. Miller" <davem@...emloft.net>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Leon Romanovsky <leon@...nel.org>, Tariq Toukan <tariqt@...dia.com>,
drort@...dia.com
Subject: Re: [Bug report] NFS patch breaks TLS device-offloaded TX zerocopy
On 11/08/2024 21:33, Sagi Grimberg wrote:
>
>
>
> On 11/08/2024 14:21, Tariq Toukan wrote:
>>
>>
>> On 06/08/2024 13:07, Tariq Toukan wrote:
>>>
>>>
>>> On 06/08/2024 11:09, Sagi Grimberg wrote:
>>>>
>>>>
>>>>
>>>> On 06/08/2024 7:43, Tariq Toukan wrote:
>>>>>
>>>>>
>>>>> On 05/08/2024 14:43, Sagi Grimberg wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/08/2024 13:40, Tariq Toukan wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> A recent patch [1] to 'fs' broke the TX TLS device-offloaded flow
>>>>>>> starting from v6.11-rc1.
>>>>>>>
>>>>>>> The kernel crashes. Different runs result in different kernel
>>>>>>> traces.
>>>>>>> See below [2].
>>>>>>> All of them disappear once patch [1] is reverted.
>>>>>>>
>>>>>>> The issue appears only with "sendfile on and zerocopy on".
>>>>>>> We couldn't repro with "sendfile off", or with "sendfile on and
>>>>>>> zerocopy off".
>>>>>>>
>>>>>>> The repro test is as simple as a repeated client/server
>>>>>>> communication (wrk/nginx), with sendfile on and zc on, and with
>>>>>>> "tls-hw-tx-offload: on".
>>>>>>>
>>>>>>> $ for i in `seq 10`; do wrk -b::2:2:2:3 -t10 -c100 -d15 --timeout
>>>>>>> 5s https://[::2:2:2:2]:20448/16000b.img; done
>>>>>>>
>>>>>>> We can provide more details if needed, to help with the analysis
>>>>>>> and debugging.
>>>>>>
>>>>>> Does tls sw (i.e. no offload) also break?
>>>>>>
>>>>>
>>>>> No it doesn't.
>>>>> Only the "sendfile with ZC" flow of the TX device-offloaded TLS.
>>>>
>>>
>>> Adding Maxim Mikityanskiy, he might have some insights.
>>>
>>>> Not familiar with the TLS offload code, are there any assumptions on
>>>> PAGE_SIZE contig buffers? Or assumptions on individual
>>>> page references/lifetime?
>>>>
>>>> The sporadic panics you reported look like a result of memory
>>>> corruption or use-after-free conditions.
>>
>> You can find the original patch that implements it here:
>> c1318b39c7d3 tls: Add opt-in zerocopy mode of sendfile()
>>
>> In this flow (sendfile + ZC), the page is shared between the kernel and
>> userspace, and the extra copy is skipped.
>>
>> There were a few code changes in this area since the feature was
>> introduced.
>> Adding relevant ppl, including David Howells <dhowells@...hat.com>,
>> who removed the sendpage() routine and added MSG_SPLICE_PAGES support
>> to tls_device.
>
> Tariq,
>
> Can you explain where NFS is used in your test? Does the nginx server run
> on an NFS mount?
I checked with the team.
The requested file, as well as the wrk and nginx apps, all reside on an
NFS mount.
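
For readers unfamiliar with the "sendfile + ZC" flow discussed above, here is a minimal sketch of how an application might opt in to it. It assumes that "zerocopy on" in the report corresponds to the TLS_TX_ZEROCOPY_RO socket option from the kernel UAPI headers; the helper name enable_tls_tx_zerocopy is hypothetical and is not taken from nginx or from the test setup. A real server would also have to install the TX crypto state (the TLS_TX option) before sending, which is left out here.

```python
import socket

# Numeric values as defined in the Linux UAPI headers
# (linux/tcp.h, linux/socket.h, linux/tls.h).
TCP_ULP = 31             # attach an upper-layer protocol to a TCP socket
SOL_TLS = 282            # socket level for kTLS options
TLS_TX_ZEROCOPY_RO = 3   # opt-in zerocopy mode of sendfile() on TX


def enable_tls_tx_zerocopy(sock: socket.socket) -> bool:
    """Hypothetical helper: attach the kTLS ULP and opt in to zerocopy sendfile().

    Returns False if the kernel refuses either step, for example when the
    socket is not connected or the tls module is not available.
    """
    try:
        # Step 1: attach the "tls" ULP (needs an established TCP connection).
        sock.setsockopt(socket.IPPROTO_TCP, TCP_ULP, b"tls")
        # Step 2: ask for zerocopy sendfile(); the file pages are then handed
        # to the device without the extra copy, so they must stay unchanged
        # until the data has been sent.
        sock.setsockopt(SOL_TLS, TLS_TX_ZEROCOPY_RO, 1)
    except OSError:
        return False
    return True
```

The helper is meant to be called on a connected TCP socket before the TLS keys are handed to the kernel. On an unconnected socket it simply returns False.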