Message-ID: <a11a0502-4174-48d3-a8ad-8584fd304fe1@grimberg.me>
Date: Sun, 11 Aug 2024 21:33:17 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Tariq Toukan <ttoukan.linux@...il.com>, Christoph Hellwig <hch@....de>,
Anna Schumaker <Anna.Schumaker@...app.com>,
Trond Myklebust <trondmy@...nel.org>, linux-nfs@...r.kernel.org,
Boris Pismenny <borisp@...dia.com>, John Fastabend
<john.fastabend@...il.com>, Jakub Kicinski <kuba@...nel.org>,
Maxim Mikityanskiy <maxtram95@...il.com>, David Howells
<dhowells@...hat.com>, Sabrina Dubroca <sd@...asysnail.net>,
Mina Almasry <almasrymina@...gle.com>
Cc: Saeed Mahameed <saeedm@...dia.com>, Gal Pressman <gal@...dia.com>,
Networking <netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>,
Eric Dumazet <edumazet@...gle.com>, "David S. Miller" <davem@...emloft.net>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Leon Romanovsky <leon@...nel.org>, Tariq Toukan <tariqt@...dia.com>
Subject: Re: [Bug report] NFS patch breaks TLS device-offloaded TX zerocopy
On 11/08/2024 14:21, Tariq Toukan wrote:
>
>
> On 06/08/2024 13:07, Tariq Toukan wrote:
>>
>>
>> On 06/08/2024 11:09, Sagi Grimberg wrote:
>>>
>>>
>>>
>>> On 06/08/2024 7:43, Tariq Toukan wrote:
>>>>
>>>>
>>>> On 05/08/2024 14:43, Sagi Grimberg wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 05/08/2024 13:40, Tariq Toukan wrote:
>>>>>> Hi,
>>>>>>
>>>>>> A recent patch [1] to 'fs' broke the TX TLS device-offloaded flow
>>>>>> starting from v6.11-rc1.
>>>>>>
>>>>>> The kernel crashes. Different runs result in different kernel
>>>>>> traces.
>>>>>> See below [2].
>>>>>> All of them disappear once patch [1] is reverted.
>>>>>>
>>>>>> The issue appears only with "sendfile on and zerocopy on".
>>>>>> We couldn't repro with "sendfile off", or with "sendfile on and
>>>>>> zerocopy off".
>>>>>>
>>>>>> The repro test is as simple as a repeated client/server
>>>>>> communication (wrk/nginx), with sendfile on and zc on, and with
>>>>>> "tls-hw-tx-offload: on".
>>>>>>
>>>>>> $ for i in `seq 10`; do wrk -b::2:2:2:3 -t10 -c100 -d15 --timeout 5s https://[::2:2:2:2]:20448/16000b.img; done
>>>>>>
>>>>>> We can provide more details if needed, to help with the analysis
>>>>>> and debug.
>>>>>
>>>>> Does tls sw (i.e. no offload) also break?
>>>>>
>>>>
>>>> No it doesn't.
>>>> Only the "sendfile with ZC" flow of the TX device-offloaded TLS.
>>>
>>
>> Adding Maxim Mikityanskiy, he might have some insights.
>>
>>> I'm not familiar with the TLS offload code. Are there any assumptions about
>>> PAGE_SIZE contiguous buffers? Or assumptions about individual
>>> page references/lifetime?
>>>
>>> The sporadic panics you reported look like a result of memory
>>> corruption or use-after-free conditions.
>
> You can find the original patch that implements it here:
> c1318b39c7d3 tls: Add opt-in zerocopy mode of sendfile()
>
> In this flow (sendfile + ZC), the page is shared between the kernel and
> userspace, and the extra copy is skipped.
>
> There have been a few code changes in this area since the feature was
> introduced.
> Adding relevant people, including David Howells <dhowells@...hat.com>,
> who removed the sendpage() routine and added MSG_SPLICE_PAGES support
> to tls_device.
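For context on the opt-in mode referenced above: as far as I understand the kernel TLS documentation, commit c1318b39c7d3 exposes zerocopy sendfile() as the per-socket option TLS_TX_ZEROCOPY_RO at level SOL_TLS, and that option is documented as applying to device offload only. The C sketch below shows only the opt-in call and is not code from this thread. The helper name enable_tls_tx_zerocopy_ro is hypothetical, the fallback constant values are assumed from the Linux UAPI headers, and the sketch presumes the socket already has the "tls" ULP and TLS_TX crypto state installed. That setup is omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Assumed fallback values from the Linux UAPI headers
 * (<linux/socket.h>, <linux/tls.h>) for older userspace headers. */
#ifndef SOL_TLS
#define SOL_TLS 282
#endif
#ifndef TLS_TX_ZEROCOPY_RO
#define TLS_TX_ZEROCOPY_RO 3
#endif

/*
 * Hypothetical helper: opt a kTLS socket into zerocopy sendfile().
 * Intended for a connected TCP socket that already has the "tls" ULP
 * attached and TLS_TX crypto info set. A socket without the "tls" ULP
 * rejects the option. Returns 0 on success or -errno on failure.
 */
static int enable_tls_tx_zerocopy_ro(int fd)
{
	int one = 1;

	if (setsockopt(fd, SOL_TLS, TLS_TX_ZEROCOPY_RO, &one, sizeof(one)) < 0)
		return -errno;
	return 0;
}
```

The "RO" (read-only) suffix reflects the contract described in the quoted text. File data is handed to the NIC without an in-kernel copy, so the application must not modify it between submission and transmission completing.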
Tariq,
Can you explain where NFS is used in your test? Does the nginx server run
on an NFS mount?