Message-ID: <1cee1c3e-e6b9-485a-a4d4-c336072f14c3@oracle.com>
Date: Thu, 13 Nov 2025 16:23:52 -0500
From: Chuck Lever <chuck.lever@...cle.com>
To: Salvatore Bonaccorso <carnil@...ian.org>
Cc: "Tyler W. Ross" <TWR@...erwross.com>,
"1120598@...s.debian.org" <1120598@...s.debian.org>,
Jeff Layton <jlayton@...nel.org>, NeilBrown <neil@...wn.name>,
Scott Mayhew <smayhew@...hat.com>, Steve Dickson <steved@...hat.com>,
Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
Tom Talpey <tom@...pey.com>, Trond Myklebust <trondmy@...nel.org>,
Anna Schumaker <anna@...nel.org>, linux-nfs@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5
NFSv4 client using SHA2
On 11/13/25 4:21 PM, Salvatore Bonaccorso wrote:
> Hi Chuck,
>
> On Thu, Nov 13, 2025 at 12:47:23PM -0500, Chuck Lever wrote:
>> On 11/13/25 12:16 PM, Tyler W. Ross wrote:
>>> Thanks, Chuck.
>>>
>>> Suggested trace-cmd report from the client follows. Last 3 lines appear salient, but I've included the full report just in case.
>>>
>>> <idle>-0 [001] ..s2. 270.327040: xs_data_ready: peer=[10.108.2.102]:2049
>>> kworker/u16:0-12 [001] ...1. 270.327048: xprt_lookup_rqst: peer=[10.108.2.102]:2049 xid=0x7b569c7a status=0
>>> kworker/u16:0-12 [001] ...2. 270.327050: rpc_task_wakeup: task:00000008@...00005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x6 status=0 timeout=15000 queue=xprt_pending
>>> kworker/u16:0-12 [001] ..... 270.327054: xs_stream_read_request: peer=[10.108.2.102]:2049 xid=0x7b569c7a copied=988 reclen=988 offset=988
>>> kworker/u16:0-12 [001] ..... 270.327055: xs_stream_read_data: peer=[10.108.2.102]:2049 err=-11 total=992
>>> ls-969 [003] ..... 270.327062: rpc_task_sync_wake: task:00000008@...00005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
>>> ls-969 [003] ..... 270.327062: rpc_task_run_action: task:00000008@...00005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=xprt_timer
>>> ls-969 [003] ..... 270.327063: rpc_task_run_action: task:00000008@...00005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
>>> ls-969 [003] ..... 270.327063: rpc_task_run_action: task:00000008@...00005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
>>> ls-969 [003] ..... 270.327063: rpc_xdr_recvfrom: task:00000008@...00005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=988
>>> ls-969 [003] ..... 270.327067: rpc_xdr_overflow: task:00000008@...00005 nfsv4 READDIR requested=8 p=0xffff8895c29fefec end=0xffff8895c29feff0 xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988
>>
>> Here's the problem. This is a sign of an XDR decoding issue. If you
>> capture the traffic with Wireshark, does Wireshark indicate where the
>> XDR is malformed?
>>
>> If it doesn't, then there is some problem with the client code. Since
>> Fedora 43 is working as expected, I would guess there's a misapplied
>> patch on Debian 13's kernel...?
>
> If it is helpful: Debian follows the upstream stable releases (6.12.y
> for trixie/Debian 13, currently 6.17.y for Debian unstable), and we try
> to limit the patches we apply on top. So far I see none that touches
> net/sunrpc/. The applied patches are here:
> https://salsa.debian.org/kernel-team/linux/-/tree/debian/6.17/forky/debian/patches?ref_type=heads
> (in case this helps narrow down the issue further).
>
> But if Tyler is able to do so, we could additionally try the 6.17.7
> upstream version directly, without the Debian patches applied.
A bisect between broken v6.12.y and working v6.17.7 could identify
what is possibly missing from v6.12.y.
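
For reference, the arithmetic behind the rpc_xdr_overflow event in the
trace quoted above can be checked with a short sketch. It assumes that
"p" and "end" are the decoder's current position and the end of the
current buffer segment, as the field names suggest, and that the first
bracketed pair after "xdr=" is the head buffer's base address and
length. The helper name overflow_summary is hypothetical and only
re-reads the numbers already present in the trace line.

```python
import re

# The rpc_xdr_overflow event from the quoted trace, copied verbatim.
EVENT = (
    "rpc_xdr_overflow: task:00000008@...00005 nfsv4 READDIR requested=8 "
    "p=0xffff8895c29fefec end=0xffff8895c29feff0 "
    "xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988"
)


def overflow_summary(event):
    """Hypothetical helper: pull the sizes out of one rpc_xdr_overflow line."""
    requested = int(re.search(r"requested=(\d+)", event).group(1))
    # Assumption: p is the current decode position, end is the segment end.
    p = int(re.search(r"\bp=(0x[0-9a-f]+)", event).group(1), 16)
    end = int(re.search(r"\bend=(0x[0-9a-f]+)", event).group(1), 16)
    # Assumption: the first [base,len] pair after "xdr=" is the head buffer.
    head_base, head_len = re.search(
        r"xdr=\[(0x[0-9a-f]+),(\d+)\]", event
    ).groups()
    head_end = int(head_base, 16) + int(head_len)
    available = end - p
    return {
        "requested": requested,            # bytes the decoder asked for
        "available": available,            # bytes left before "end"
        "shortfall": requested - available,
        "end_is_head_end": end == head_end,
    }


print(overflow_summary(EVENT))
```

Under those assumptions, the decoder asked for 8 bytes with only 4 left
before "end", and "end" coincides with the end of the 140-byte head
buffer. This only restates what the trace reports; it does not show
which XDR field is misdecoded.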
--
Chuck Lever