Message-ID: <aRunktdq8sJ7Eecj@aion>
Date: Mon, 17 Nov 2025 17:54:10 -0500
From: Scott Mayhew <smayhew@...hat.com>
To: Trond Myklebust <trondmy@...nel.org>
Cc: Chuck Lever <chuck.lever@...cle.com>, Anna Schumaker <anna@...nel.org>,
	Salvatore Bonaccorso <carnil@...ian.org>,
	"1120598@...s.debian.org" <1120598@...s.debian.org>,
	Jeff Layton <jlayton@...nel.org>, NeilBrown <neil@...wn.name>,
	Steve Dickson <steved@...hat.com>,
	Olga Kornievskaia <okorniev@...hat.com>,
	Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>,
	linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
	"Tyler W. Ross" <TWR@...erwross.com>
Subject: Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5
 NFSv4 client using SHA2

On Sun, 16 Nov 2025, Trond Myklebust wrote:

> On Sun, 2025-11-16 at 11:29 -0500, Chuck Lever wrote:
> > On 11/15/25 7:38 PM, Tyler W. Ross wrote:
> > > On Friday, November 14th, 2025 at 7:19 AM, Chuck Lever
> > > <chuck.lever@...cle.com> wrote:
> > > > Then I would say further hunting for the broken commit is going
> > > > to be fruitless. Adding the WARNs in net/sunrpc/xdr.c is a good
> > > > next step so we see which XDR data item (assuming it's the same
> > > > one every time) is failing to decode.
> > > 
> > > I added WARNs after each trace_rpc_xdr_overflow() call, and then a
> > > couple of pr_info() calls inside xdr_copy_to_scratch() as a
> > > follow-up.
> > > 
> > > If I'm understanding correctly, it's failing in the
> > > xdr_copy_to_scratch() call inside xdr_inline_decode(), because the
> > > xdr_stream struct has an unset/NULL scratch kvec. I don't
> > > understand the context well enough to speculate on why, though.
> > > 
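For reference, the path Tyler is describing looks roughly like this in
net/sunrpc/xdr.c (paraphrased, not verbatim kernel source; decoders that
expect an item to straddle a page boundary normally install a scratch
buffer first, e.g. with xdr_set_scratch_page()):

__be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes)
{
        __be32 *p;

        if (unlikely(nbytes == 0))
                return xdr->p;
        if (xdr->p == xdr->end && !xdr_set_next_buffer(xdr))
                goto out_overflow;
        /* Fast path: the item is contiguous in the current buffer. */
        p = __xdr_inline_decode(xdr, nbytes);
        if (p != NULL)
                return p;
        /*
         * Slow path: the item straddles a boundary and has to be
         * reassembled in xdr->scratch.  With scratch.iov_len == 0, as
         * in the output below, this can only fail.
         */
        p = xdr_copy_to_scratch(xdr, nbytes);
        if (p != NULL)
                return p;
out_overflow:
        trace_rpc_xdr_overflow(xdr, nbytes);    /* Tyler's WARNs fire here */
        return NULL;
}
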
> > > [   26.844102] Entered xdr_copy_to_scratch()
> > > [   26.844105] xdr->scratch.iov_base: 0000000000000000
> > > [   26.844107] xdr->scratch.iov_len: 0
> > > [   26.844127] ------------[ cut here ]------------
> > > [   26.844128] WARNING: CPU: 1 PID: 886 at net/sunrpc/xdr.c:1490
> > > xdr_inline_decode.cold+0x65/0x141 [sunrpc]
> > > [   26.844153] Modules linked in: rpcsec_gss_krb5 nfsv4
> > > dns_resolver nfs lockd grace netfs binfmt_misc intel_rapl_msr
> > > intel_rapl_common kvm_amd ccp kvm cfg80211 hid_generic usbhid hid
> > > irqbypass rfkill ghash_clmulni_intel aesni_intel pcspkr 8021q garp
> > > stp virtio_balloon llc mrp button evdev joydev sg auth_rpcgss
> > > sunrpc configfs efi_pstore nfnetlink vsock_loopback
> > > vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> > > vmw_vmci qemu_fw_cfg ip_tables x_tables autofs4 ext4 crc16 mbcache
> > > jbd2 crc32c_cryptoapi sr_mod cdrom bochs uhci_hcd drm_client_lib
> > > drm_shmem_helper ehci_pci ata_generic sd_mod drm_kms_helper
> > > ehci_hcd ata_piix libata drm virtio_net usbcore virtio_scsi floppy
> > > psmouse net_failover failover scsi_mod serio_raw i2c_piix4
> > > usb_common scsi_common i2c_smbus
> > > [   26.844217] CPU: 1 UID: 591200003 PID: 886 Comm: ls Not tainted
> > > 6.17.8-debbug1120598hack3 #9 PREEMPT(lazy)  
> > > [   26.844220] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > > 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > > [   26.844222] RIP: 0010:xdr_inline_decode.cold+0x65/0x141 [sunrpc]
> > > [   26.844238] Code: 24 48 c7 c7 e7 eb 8c c0 48 8b 71 28 e8 5a 36
> > > fc d7 48 8b 0c 24 4c 8b 44 24 10 48 8b 54 24 08 4c 39 41 28 73 0c
> > > 0f 1f 44 00 00 <0f> 0b e9 b7 fe fe ff 48 89 d8 48 89 cf 4c 89 44 24
> > > 08 48 29 d0 48
> > > [   26.844240] RSP: 0018:ffffd09e82ce3758 EFLAGS: 00010293
> > > [   26.844242] RAX: 0000000000000017 RBX: ffff8f1e0adcffe8 RCX:
> > > ffffd09e82ce3838
> > > [   26.844244] RDX: ffff8f1e0adcffe4 RSI: 0000000000000001 RDI:
> > > ffff8f1f37c5ce40
> > > [   26.844245] RBP: ffffd09e82ce37b4 R08: 0000000000000008 R09:
> > > ffffd09e82ce3600
> > > [   26.844246] R10: ffffffff9acdb348 R11: 00000000ffffefff R12:
> > > 000000000000001a
> > > [   26.844247] R13: ffff8f1e01151200 R14: 0000000000000000 R15:
> > > 0000000000440000
> > > [   26.844250] FS:  00007fa5d13db240(0000)
> > > GS:ffff8f1f9c44a000(0000) knlGS:0000000000000000
> > > [   26.844252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   26.844253] CR2: 00007fa5d13b9000 CR3: 000000010ab82000 CR4:
> > > 0000000000750ef0
> > > [   26.844255] PKRU: 55555554
> > > [   26.844257] Call Trace:
> > > [   26.844259]  <TASK>
> > > [   26.844263]  __decode_op_hdr+0x20/0x120 [nfsv4]
> > > [   26.844288]  nfs4_xdr_dec_readdir+0xbb/0x120 [nfsv4]
> > > [   26.844305]  gss_unwrap_resp+0x9e/0x150 [auth_rpcgss]
> > > [   26.844311]  call_decode+0x211/0x230 [sunrpc]
> > > [   26.844332]  ? __pfx_call_decode+0x10/0x10 [sunrpc]
> > > [   26.844348]  __rpc_execute+0xb6/0x480 [sunrpc]
> > > [   26.844369]  ? rpc_new_task+0x17a/0x200 [sunrpc]
> > > [   26.844386]  rpc_execute+0x133/0x160 [sunrpc]
> > > [   26.844401]  rpc_run_task+0x103/0x160 [sunrpc]
> > > [   26.844419]  nfs4_call_sync_sequence+0x74/0xb0 [nfsv4]
> > > [   26.844440]  _nfs4_proc_readdir+0x28d/0x310 [nfsv4]
> > > [   26.844459]  nfs4_proc_readdir+0x60/0xf0 [nfsv4]
> > > [   26.844475]  nfs_readdir_xdr_to_array+0x1fb/0x410 [nfs]
> > > [   26.844494]  nfs_readdir+0x2ed/0xf00 [nfs]
> > > [   26.844506]  iterate_dir+0xaa/0x270
> > 
> > Hi Trond, Anna -
> > 
> > NFSv4 READDIR is hitting an XDR overflow because the XDR stream's
> > scratch buffer is missing, and one of the READDIR response's fields
> > crosses a page boundary in the receive buffer.
> > 
> > Shouldn't the client's readdir XDR decoder have a scratch buffer?
> 
> No it shouldn't.
> 
> The READDIR XDR decoder doesn't interpret the contents of the readdir
> buffer. What it is supposed to do is read the op header and the
> readdir verifier, and then align the remaining data into the pages
> that were allocated as a buffer, using a call to xdr_read_pages().
> Essentially, it's the exact same procedure as we follow for a READ
> call.
> 
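For context, the decoder Trond is describing is decode_readdir() in
fs/nfs/nfs4xdr.c, which (condensed from my reading, not verbatim) is
shaped like this:

static int decode_readdir(struct xdr_stream *xdr, struct rpc_rqst *req,
                          struct nfs4_readdir_res *readdir)
{
        int status;

        /* The op header and cookie verifier must fit in the head iovec... */
        status = decode_op_hdr(xdr, OP_READDIR);
        if (!status)
                status = decode_verifier(xdr, readdir->verifier.data);
        if (unlikely(status))
                return status;
        /* ...before the entries are aligned into the preallocated pages. */
        xdr_read_pages(xdr, xdr->buf->page_len);
        return 0;
}
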
> So if we're crossing into the pages before we hit the call to
> xdr_read_pages(), that means we've allocated too small a header
> buffer. Since this only appears to happen with RPCSEC_GSS, my money
> would be on AUTH_GSS not padding the reply buffer sufficiently when
> setting the value of auth->au_cslack.

If replies are the problem, why wouldn't we want to focus on
auth->au_rslack and auth->au_ralign?
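
Unless I'm misreading call_allocate() in net/sunrpc/clnt.c, the receive
buffer is sized directly from au_rslack (paraphrased, not verbatim):

        req->rq_callsize = RPC_CALLHDRSIZE + (auth->au_cslack << 1) +
                           proc->p_arglen;
        req->rq_callsize <<= 2;                 /* XDR words -> bytes */
        req->rq_rcvsize = RPC_REPHDRSIZE + auth->au_rslack +
                          max_t(size_t, proc->p_replen, 2);
        req->rq_rcvsize <<= 2;                  /* XDR words -> bytes */

With RPC_REPHDRSIZE and p_replen fixed for a given procedure, any
difference in recvsize has to come from au_rslack.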

FWIW I have both Debian Trixie and Sid/Forky VMs, and krb5{,i,p} is
working across the board for me.  Normally I just use a plain MIT KDC,
so I tried IPA and that works fine too.  Looking at Tyler's tracepoint
output, these two lines jump out:

              ls-969   [003] .....   270.326933: rpc_buf_alloc:        task:00000008@...00005 callsize=3932 recvsize=176 status=0
                                                                                                                     ^^^
              ls-969   [003] .....   270.326936: rpc_xdr_reply_pages:  task:00000008@...00005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=0
                                                                                                                       ^^^

Contrast that with what I see on my own systems:
              ls-13558   [000] ..... 419637.290876: rpc_buf_alloc: task:00000008@...00007 callsize=3932 recvsize=148 status=0
                                                                                                                 ^^^ 
              ls-13558   [000] ..... 419637.290879: rpc_xdr_reply_pages: task:00000008@...00007 head=[0000000050ca7092,144] page=4008(88) tail=[000000007b84934f,4] len=0
                                                                                                                       ^^^
Those values for the receive size and the head iov length are consistent
across all my VMs (not just my Debian ones).
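
Doing the arithmetic, head.iov_len + tail.iov_len adds up to recvsize in
both traces:

        Tyler's client:  head 140 + tail 36 = 176 bytes
        mine:            head 144 + tail  4 = 148 bytes

That's a 28-byte (7 XDR word) difference in recvsize, with the head one
word *shorter* on the failing client, which again smells like the GSS
slack/alignment computation rather than the page handling.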

> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trondmy@...nel.org, trond.myklebust@...merspace.com
> 

