lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87o7di8o50.fsf@mailhost.krisman.be>
Date: Thu, 18 Jan 2024 15:58:19 -0300
From: Gabriel Krisman Bertazi <krisman@...e.de>
To: kernel test robot <oliver.sang@...el.com>
Cc: <oe-lkp@...ts.linux.dev>,  <lkp@...el.com>,
  <linux-fscrypt@...r.kernel.org>,  <viro@...iv.linux.org.uk>,
  <ebiggers@...nel.org>,  <jaegeuk@...nel.org>,  <tytso@....edu>,
  <linux-f2fs-devel@...ts.sourceforge.net>,  <linux-ext4@...r.kernel.org>,
  <linux-fsdevel@...r.kernel.org>,  <amir73il@...il.com>
Subject: Re: [PATCH v3 04/10] fscrypt: Drop d_revalidate once the key is added

kernel test robot <oliver.sang@...el.com> writes:

> Hello,
>
> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
>
> commit: 1cfe4ba685d9eb6123648a0d9bef2d3d57b078ef ("[PATCH v3 04/10] fscrypt: Drop d_revalidate once the key is added")
> url: https://github.com/intel-lab-lkp/linux/commits/Gabriel-Krisman-Bertazi/ovl-Reject-mounting-case-insensitive-filesystems/20240112-070113
> base: https://git.kernel.org/cgit/linux/kernel/git/jaegeuk/f2fs.git dev-test
> patch link: https://lore.kernel.org/all/20240111225816.18117-5-krisman@suse.de/
> patch subject: [PATCH v3 04/10] fscrypt: Drop d_revalidate once the key is added
>
> in testcase: fxmark
> version: fxmark-x86_64-0ce9491-1_20220601
> with following parameters:
>
> 	disk: 1SSD
> 	media: ssd
> 	test: MWRL
> 	fstype: xfs
> 	directio: bufferedio
> 	cpufreq_governor: performance
>
>
>
> compiler: gcc-12
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202401181648.4192e541-oliver.sang@intel.com
>
>
> [   73.173380][ T6828] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [   73.181338][ T6828] #PF: supervisor read access in kernel mode
> [   73.187453][ T6828] #PF: error_code(0x0000) - not-present page
> [   73.193566][ T6828] PGD 11cc47067 P4D 0
> [   73.197762][ T6828] Oops: 0000 [#1] SMP NOPTI
> [   73.202383][ T6828] CPU: 16 PID: 6828 Comm: fxmark Tainted: G S                 6.7.0-rc1-00176-g1cfe4ba685d9 #1
> [   73.212818][ T6828] Hardware name: Intel Corporation M50CYP2SB1U/M50CYP2SB1U, BIOS SE5C620.86B.01.01.0003.2104260124 04/26/2021
> [ 73.224837][ T6828] RIP: 0010:__d_move (include/linux/fscrypt.h:241 fs/dcache.c:3003) 
> [ 73.229912][ T6828] Code: c1 00 00 00 08 0f 84 ed 00 00 00 81 e1 3f 10 07 00 0f 84 e1 00 00 00 80 cc 40 89 c1 81 e1 ff ff ff fd 41 89 4d 00 49 8b 4d 60 <48> 81 39 10 21 4e 81 0f 84 66 01 00 00 83 43 04 01 41 83 45 04 01
> All code
> ========
>    0:	c1 00 00             	roll   $0x0,(%rax)
>    3:	00 08                	add    %cl,(%rax)
>    5:	0f 84 ed 00 00 00    	je     0xf8
>    b:	81 e1 3f 10 07 00    	and    $0x7103f,%ecx
>   11:	0f 84 e1 00 00 00    	je     0xf8
>   17:	80 cc 40             	or     $0x40,%ah
>   1a:	89 c1                	mov    %eax,%ecx
>   1c:	81 e1 ff ff ff fd    	and    $0xfdffffff,%ecx
>   22:	41 89 4d 00          	mov    %ecx,0x0(%r13)
>   26:	49 8b 4d 60          	mov    0x60(%r13),%rcx
>   2a:*	48 81 39 10 21 4e 81 	cmpq   $0xffffffff814e2110,(%rcx)		<-- trapping instruction
>   31:	0f 84 66 01 00 00    	je     0x19d
>   37:	83 43 04 01          	addl   $0x1,0x4(%rbx)
>   3b:	41 83 45 04 01       	addl   $0x1,0x4(%r13)
>
> Code starting with the faulting instruction
> ===========================================
>    0:	48 81 39 10 21 4e 81 	cmpq   $0xffffffff814e2110,(%rcx)
>    7:	0f 84 66 01 00 00    	je     0x173
>    d:	83 43 04 01          	addl   $0x1,0x4(%rbx)
>   11:	41 83 45 04 01       	addl   $0x1,0x4(%r13)
> [   73.249920][ T6828] RSP: 0018:ffa000000a99bce8 EFLAGS: 00010206
> [   73.256134][ T6828] RAX: 0000000000480000 RBX: ff1100012cab5380 RCX: 0000000000000000
> [   73.264248][ T6828] RDX: ff1100012cab4609 RSI: 0000000000000000 RDI: ff1100012cab4600
> [   73.272366][ T6828] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000020c
> [   73.280473][ T6828] R10: ff110001622ddde0 R11: 0000000000010000 R12: 0000000000000000
> [   73.288584][ T6828] R13: ff1100012cab4600 R14: 0000000000000000 R15: ff1100012cab5200
> [   73.296699][ T6828] FS:  00007f1073011600(0000) GS:ff1100103f600000(0000) knlGS:0000000000000000
> [   73.305766][ T6828] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   73.312488][ T6828] CR2: 0000000000000000 CR3: 000000012af2a006 CR4: 0000000000771ef0
> [   73.320596][ T6828] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   73.328699][ T6828] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   73.336803][ T6828] PKRU: 55555554
> [   73.340485][ T6828] Call Trace:
> [   73.343900][ T6828]  <TASK>
> [ 73.346960][ T6828] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) 
> [ 73.350983][ T6828] ? page_fault_oops (arch/x86/mm/fault.c:707) 
> [ 73.355957][ T6828] ? exc_page_fault (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:72 arch/x86/mm/fault.c:1513 arch/x86/mm/fault.c:1561) 
> [ 73.360837][ T6828] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570) 
> [ 73.365974][ T6828] ? __d_move (include/linux/fscrypt.h:241 fs/dcache.c:3003) 
> [ 73.370410][ T6828] ? __d_move (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/asm-generic/qspinlock.h:57 include/linux/fsnotify_backend.h:580 fs/dcache.c:3002) 
> [ 73.374846][ T6828] d_move (include/linux/seqlock.h:500 include/linux/seqlock.h:572 include/linux/seqlock.h:910 fs/dcache.c:3032) 
> [ 73.378757][ T6828] vfs_rename (include/linux/fs.h:807 fs/namei.c:4864) 
> [ 73.383189][ T6828] ? do_renameat2 (fs/namei.c:4996) 
> [ 73.387963][ T6828] do_renameat2 (fs/namei.c:4996) 
> [ 73.392568][ T6828] __x64_sys_rename (fs/namei.c:5040) 
> [ 73.397336][ T6828] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) 
> [ 73.401835][ T6828] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) 
> [   73.407817][ T6828] RIP: 0033:0x7f1072e83ed7
> [ 73.412325][ T6828] Code: e8 6e 82 09 00 85 c0 0f 95 c0 0f b6 c0 f7 d8 5d c3 66 90 b8 ff ff ff ff 5d c3 66 0f 1f 84 00 00 00 00 00 b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 89 8f 17 00 f7 d8 64 89 02 b8
> All code
> ========
>    0:	e8 6e 82 09 00       	callq  0x98273
>    5:	85 c0                	test   %eax,%eax
>    7:	0f 95 c0             	setne  %al
>    a:	0f b6 c0             	movzbl %al,%eax
>    d:	f7 d8                	neg    %eax
>    f:	5d                   	pop    %rbp
>   10:	c3                   	retq   
>   11:	66 90                	xchg   %ax,%ax
>   13:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
>   18:	5d                   	pop    %rbp
>   19:	c3                   	retq   
>   1a:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
>   21:	00 00 
>   23:	b8 52 00 00 00       	mov    $0x52,%eax
>   28:	0f 05                	syscall 
>   2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
>   30:	77 01                	ja     0x33
>   32:	c3                   	retq
>   33:	48 8b 15 89 8f 17 00 	mov    0x178f89(%rip),%rdx        # 0x178fc3
>   3a:	f7 d8                	neg    %eax
>   3c:	64 89 02             	mov    %eax,%fs:(%rdx)
>   3f:	b8                   	.byte 0xb8
>

Hm. So, the thing I missed here is that fscrypt_handle_d_move will be
called even by filesystems that don't support fscrypt.  While we know
fscrypt filesystem dentries must have ->d_op, others might not have
it, and the dereferencing of dentry->d_op causes the oops at:

  if (dentry->d_op->d_revalidate == fscrypt_d_revalidate)

causes the Oops.

One fix would be to prevent non-fscrypt filesystems from calling this
function. But since __d_move only touches the dentries, I think I'll
leave it as-is and just do:

  if (dentry->d_op && dentry->d_op->d_revalidate)

sorry for the noise.

-- 
Gabriel Krisman Bertazi

Powered by blists - more mailing lists