Message-ID: <902ff4995d8e75ad1cd2196bf7d8da42932fba35.camel@kernel.org>
Date: Sun, 16 Nov 2025 13:21:16 -0500
From: Trond Myklebust <trondmy@...nel.org>
To: Chuck Lever <chuck.lever@...cle.com>, Anna Schumaker <anna@...nel.org>
Cc: Salvatore Bonaccorso <carnil@...ian.org>, "1120598@...s.debian.org"	
 <1120598@...s.debian.org>, Jeff Layton <jlayton@...nel.org>, NeilBrown	
 <neil@...wn.name>, Scott Mayhew <smayhew@...hat.com>, Steve Dickson	
 <steved@...hat.com>, Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo	
 <Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>,
 linux-nfs@...r.kernel.org, 	linux-kernel@...r.kernel.org, "Tyler W. Ross"
 <TWR@...erwross.com>
Subject: Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5
 NFSv4 client using SHA2

On Sun, 2025-11-16 at 11:29 -0500, Chuck Lever wrote:
> On 11/15/25 7:38 PM, Tyler W. Ross wrote:
> > On Friday, November 14th, 2025 at 7:19 AM, Chuck Lever
> > <chuck.lever@...cle.com> wrote:
> > > Then I would say further hunting for the broken commit is going
> > > to be
> > > fruitless. Adding the WARNs in net/sunrpc/xdr.c is a good next
> > > step so
> > > we see which XDR data item (assuming it's the same one every
> > > time) is
> > > failing to decode.
> > 
> > I added WARNs after each trace_rpc_xdr_overflow() call, and then a
> > couple
> > pr_info() inside xdr_copy_to_scratch() as a follow-up.
> > 
> > If I'm understanding correctly, it's failing in the
> > xdr_copy_to_scratch()
> > call inside xdr_inline_decode(), because the xdr_stream struct has
> > an
> > unset/NULL scratch kvec. I don't understand the context enough to
> > speculate on why, though.
> > 
> > [   26.844102] Entered xdr_copy_to_scratch()
> > [   26.844105] xdr->scratch.iov_base: 0000000000000000
> > [   26.844107] xdr->scratch.iov_len: 0
> > [   26.844127] ------------[ cut here ]------------
> > [   26.844128] WARNING: CPU: 1 PID: 886 at net/sunrpc/xdr.c:1490
> > xdr_inline_decode.cold+0x65/0x141 [sunrpc]
> > [   26.844153] Modules linked in: rpcsec_gss_krb5 nfsv4
> > dns_resolver nfs lockd grace netfs binfmt_misc intel_rapl_msr
> > intel_rapl_common kvm_amd ccp kvm cfg80211 hid_generic usbhid hid
> > irqbypass rfkill ghash_clmulni_intel aesni_intel pcspkr 8021q garp
> > stp virtio_balloon llc mrp button evdev joydev sg auth_rpcgss
> > sunrpc configfs efi_pstore nfnetlink vsock_loopback
> > vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> > vmw_vmci qemu_fw_cfg ip_tables x_tables autofs4 ext4 crc16 mbcache
> > jbd2 crc32c_cryptoapi sr_mod cdrom bochs uhci_hcd drm_client_lib
> > drm_shmem_helper ehci_pci ata_generic sd_mod drm_kms_helper
> > ehci_hcd ata_piix libata drm virtio_net usbcore virtio_scsi floppy
> > psmouse net_failover failover scsi_mod serio_raw i2c_piix4
> > usb_common scsi_common i2c_smbus
> > [   26.844217] CPU: 1 UID: 591200003 PID: 886 Comm: ls Not tainted
> > 6.17.8-debbug1120598hack3 #9 PREEMPT(lazy)  
> > [   26.844220] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > [   26.844222] RIP: 0010:xdr_inline_decode.cold+0x65/0x141 [sunrpc]
> > [   26.844238] Code: 24 48 c7 c7 e7 eb 8c c0 48 8b 71 28 e8 5a 36
> > fc d7 48 8b 0c 24 4c 8b 44 24 10 48 8b 54 24 08 4c 39 41 28 73 0c
> > 0f 1f 44 00 00 <0f> 0b e9 b7 fe fe ff 48 89 d8 48 89 cf 4c 89 44 24
> > 08 48 29 d0 48
> > [   26.844240] RSP: 0018:ffffd09e82ce3758 EFLAGS: 00010293
> > [   26.844242] RAX: 0000000000000017 RBX: ffff8f1e0adcffe8 RCX:
> > ffffd09e82ce3838
> > [   26.844244] RDX: ffff8f1e0adcffe4 RSI: 0000000000000001 RDI:
> > ffff8f1f37c5ce40
> > [   26.844245] RBP: ffffd09e82ce37b4 R08: 0000000000000008 R09:
> > ffffd09e82ce3600
> > [   26.844246] R10: ffffffff9acdb348 R11: 00000000ffffefff R12:
> > 000000000000001a
> > [   26.844247] R13: ffff8f1e01151200 R14: 0000000000000000 R15:
> > 0000000000440000
> > [   26.844250] FS:  00007fa5d13db240(0000)
> > GS:ffff8f1f9c44a000(0000) knlGS:0000000000000000
> > [   26.844252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   26.844253] CR2: 00007fa5d13b9000 CR3: 000000010ab82000 CR4:
> > 0000000000750ef0
> > [   26.844255] PKRU: 55555554
> > [   26.844257] Call Trace:
> > [   26.844259]  <TASK>
> > [   26.844263]  __decode_op_hdr+0x20/0x120 [nfsv4]
> > [   26.844288]  nfs4_xdr_dec_readdir+0xbb/0x120 [nfsv4]
> > [   26.844305]  gss_unwrap_resp+0x9e/0x150 [auth_rpcgss]
> > [   26.844311]  call_decode+0x211/0x230 [sunrpc]
> > [   26.844332]  ? __pfx_call_decode+0x10/0x10 [sunrpc]
> > [   26.844348]  __rpc_execute+0xb6/0x480 [sunrpc]
> > [   26.844369]  ? rpc_new_task+0x17a/0x200 [sunrpc]
> > [   26.844386]  rpc_execute+0x133/0x160 [sunrpc]
> > [   26.844401]  rpc_run_task+0x103/0x160 [sunrpc]
> > [   26.844419]  nfs4_call_sync_sequence+0x74/0xb0 [nfsv4]
> > [   26.844440]  _nfs4_proc_readdir+0x28d/0x310 [nfsv4]
> > [   26.844459]  nfs4_proc_readdir+0x60/0xf0 [nfsv4]
> > [   26.844475]  nfs_readdir_xdr_to_array+0x1fb/0x410 [nfs]
> > [   26.844494]  nfs_readdir+0x2ed/0xf00 [nfs]
> > [   26.844506]  iterate_dir+0xaa/0x270
> 
> Hi Trond, Anna -
> 
> NFSv4 READDIR is hitting an XDR overflow because the XDR stream's
> scratch buffer is missing, and one of the READDIR response's fields
> crosses a page boundary in the receive buffer.
> 
> Shouldn't the client's readdir XDR decoder have a scratch buffer?

No, it shouldn't.

The READDIR XDR decoder doesn't interpret the contents of the readdir
buffer. What it is supposed to do is read the op header and the readdir
verifier, and then align the remaining data into the pages that were
allocated as buffer using a call to xdr_read_pages(). Essentially, it's
the exact same procedure as we follow for a READ call.

So if we're crossing into the pages before we hit the call to
xdr_read_pages(), that means we've allocated too small a header
buffer. Since it only appears to happen with RPCSEC_GSS, my money
would be on AUTH_GSS not padding the reply buffer sufficiently when
setting the value of auth->au_cslack.
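
To make the failure mode concrete, here is a rough userspace sketch of the
situation described above. It is a toy model only: struct toy_stream,
toy_inline_decode() and toy_decode_readdir() are hypothetical names, the
8-byte header and verifier sizes and the slack values are made up for
illustration, and none of this is the kernel's struct xdr_stream or its API.
The model reserves a head buffer of (slack + header + verifier) bytes, places
the rest of the reply in "pages", and refuses to decode an item that straddles
the boundary when no scratch buffer is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: hypothetical types, not the kernel's xdr_stream. */
struct toy_stream {
	const uint8_t *head;	/* contiguous header buffer */
	size_t head_len;	/* bytes of the reply that landed in the head */
	const uint8_t *pages;	/* data that spilled past the head */
	size_t page_len;
	size_t pos;		/* decode offset from the start of the reply */
	uint8_t *scratch;	/* NULL when no scratch buffer is set */
	size_t scratch_len;
};

/* Return nbytes of contiguous data, or NULL if they cannot be provided. */
static const uint8_t *toy_inline_decode(struct toy_stream *s, size_t nbytes)
{
	const uint8_t *p;
	size_t first, rest;

	if (s->pos + nbytes <= s->head_len) {
		/* Item lies entirely in the head. */
		p = s->head + s->pos;
	} else if (s->pos >= s->head_len) {
		/* Item lies entirely in the page data. */
		if (s->pos - s->head_len + nbytes > s->page_len)
			return NULL;
		p = s->pages + (s->pos - s->head_len);
	} else {
		/* Item straddles the boundary: needs a scratch copy. */
		first = s->head_len - s->pos;
		rest = nbytes - first;
		if (!s->scratch || s->scratch_len < nbytes || rest > s->page_len)
			return NULL;
		memcpy(s->scratch, s->head + s->pos, first);
		memcpy(s->scratch + first, s->pages, rest);
		p = s->scratch;
	}
	s->pos += nbytes;
	return p;
}

/*
 * slack:    bytes the client reserved in the head for security overhead
 * overhead: bytes of security overhead actually present in the reply
 * Returns 0 if the op header and verifier decode, -1 otherwise.
 */
static int toy_decode_readdir(size_t slack, size_t overhead)
{
	enum { HDR = 8, VERF = 8, DATA = 32 };	/* illustrative sizes */
	static uint8_t reply[256];
	size_t total = overhead + HDR + VERF + DATA;
	size_t head_len = slack + HDR + VERF;
	struct toy_stream s;

	assert(total <= sizeof(reply));
	if (head_len > total)
		head_len = total;

	s.head = reply;
	s.head_len = head_len;
	s.pages = reply + head_len;
	s.page_len = total - head_len;
	s.pos = overhead;	/* decoding starts after the overhead */
	s.scratch = NULL;	/* as in the report: no scratch buffer */
	s.scratch_len = 0;

	if (!toy_inline_decode(&s, HDR))
		return -1;
	if (!toy_inline_decode(&s, VERF))
		return -1;
	/* The real decoder would now hand the remainder to the pages. */
	return 0;
}
```

In this model, toy_decode_readdir(16, 16) returns 0 because the reserved
slack matches the actual overhead, while toy_decode_readdir(16, 20) returns -1:
the verifier straddles the head/page boundary and there is no scratch buffer
to copy it into. Raising the slack to 20 makes the second case succeed,
which is the kind of fix the reasoning above points at.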

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@...nel.org, trond.myklebust@...merspace.com
