linux-kernel - Re: WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365 dentry

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <7d53692b-6ac8-e1bd-4d0d-7e97aa01b18d@bell.net>
Date:   Tue, 19 Jul 2022 16:59:21 -0400
From:   John David Anglin <dave.anglin@...l.net>
To:     Helge Deller <deller@....de>, Hillf Danton <hdanton@...a.com>
Cc:     linux-kernel@...r.kernel.org, linux-parisc@...r.kernel.org
Subject: Re: WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365
 dentry_free+0x100/0x128

Hi Helge,

I hit this warning while building ghc on mx3210 with the patch below applied:

mx3210 login: ------------[ cut here ]------------
WARNING: CPU: 2 PID: 32654 at fs/dcache.c:365 dentry_free+0xfc/0x108
Modules linked in: binfmt_misc ext2 ext4 crc16 mbcache jbd2 ipmi_watchdog sg ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse nfsd 
ip_tables x_tables ipv6 autofs4 xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c 
crc32c_generic raid1 raid0 multipath linear md_mod sd_mod t10_pi ses enclosure scsi_transport_sas crc64_rocksoft crc64 uas usb_storage sr_mod 
cdrom ohci_pci sym53c8xx pata_cmd64x ehci_pci ohci_hcd libata scsi_transport_spi ehci_hcd tg3 scsi_mod usbcore scsi_common usb_common
CPU: 2 PID: 32654 Comm: cc1 Not tainted 5.18.12+ #2
Hardware name: 9000/800/rp3440

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001000110100000001111 Not tainted
r00-03  000000000804680f 00000040ce7fc880 00000000404f2b74 00000040ce7fc920
r04-07  0000000040be4940 000000410f6cd630 00000001413e4068 000000410f6cd688
r08-11  0000000040fd2e60 0000000040bc5020 0000000040c2c940 00000000000800e0
r12-15  0000000040c2c940 0000000000000001 0000000040c2c940 000000410f6cd688
r16-19  00000001f9fe105d 00000040ce7fc1f8 000000000000002f 000000000a0c1000
r20-23  000000000800000f 000000000800000f 000000410f6cd639 000000000800000f
r24-27  0000000000000000 0000000000000385 000000410f6cd630 0000000040be4940
r28-31  0000000041104530 00000040ce7fc8f0 00000040ce7fc9a0 0000000000000000
sr00-03  0000000000a03800 0000000000000000 0000000000000000 0000000000a03800
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000404f18bc 00000000404f18c0
  IIR: 03ffe01f    ISR: 0000000010350000  IOR: 00000239ff3fc928
  CPU:        2   CR30: 00000040cadd1380 CR31: ffffffffffffffff
  ORIG_R28: 00000040ce7fcb70
  IAOQ[0]: dentry_free+0xfc/0x108
  IAOQ[1]: dentry_free+0x100/0x108
  RP(r2): __dentry_kill+0x2bc/0x338
Backtrace:
  [<00000000404f2b74>] __dentry_kill+0x2bc/0x338
  [<00000000404f37b8>] dentry_kill+0xb0/0x318
  [<00000000404f3d08>] dput+0x2e8/0x328
  [<00000000404dd7dc>] step_into+0x344/0x390
  [<00000000404dda4c>] walk_component+0xa4/0x310
  [<00000000404df234>] link_path_walk.part.0+0x2ec/0x4b0
  [<00000000404e0000>] path_openat+0xe8/0x348
  [<00000000404e2c58>] do_filp_open+0x98/0x178
  [<00000000404babe8>] do_sys_openat2+0x148/0x288
  [<00000000404bb41c>] compat_sys_openat+0x54/0x98
  [<0000000040203e30>] syscall_exit+0x0/0x10

---[ end trace 0000000000000000 ]---
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [cc1:32657]

Regards,
Dave

On 2022-07-19 12:32 p.m., Helge Deller wrote:
> Hello Hillf,
>
> On 7/17/22 13:36, Hillf Danton wrote:
>> On Sun, 17 Jul 2022 11:42:48 +0200
>>> I used WARN_ON() instead of BUG_ON().
>>> With that, both triggered, first the first one, then the second one.
>>> Full log is here:
>>> http://dellerweb.de/testcases/minicom.dcache.crash.6-warn
>> Given that the first BUG_ON triggered, and that the dentry at this point is
>> not supposed to be an alias, check whether it is still in lookup while d_lock
>> is held. That is the step before de-unioning d_alias from d_in_lookup_hash.
>>
>> On the other hand, if only the second one triggered, we should track
>> DCACHE_DENTRY_KILLED instead, on the assumption that the killed dentry was
>> used again after d_lock was released around the first one.
> The machine has now been up for 2 days without any issues, while it had pretty
> much the same load as when it was crashing earlier.
> So, in summary I'd assume that your patch below fixes the issue.
>
> I'm now rebooting the machine with a new kernel, where I just changed
> 	if (unlikely(d_in_lookup(dentry)))
> to
> 	if (WARN_ON_ONCE(d_in_lookup(dentry)))
> in order to see if this really triggered.
>
> Anyway, I think your patch is good so far.
> Would that be the final patch, or should I test some others?
>
> Thanks!
> Helge
>
>> --- a/fs/dcache.c
>> +++ b/fs/dcache.c
>> @@ -605,8 +605,12 @@ static void __dentry_kill(struct dentry
>>   		spin_unlock(&parent->d_lock);
>>   	if (dentry->d_inode)
>>   		dentry_unlink_inode(dentry);
>> -	else
>> +	else {
>> +		if (unlikely(d_in_lookup(dentry))) {
>> +			__d_lookup_done(dentry);
>> +		}
>>   		spin_unlock(&dentry->d_lock);
>> +	}
>>   	this_cpu_dec(nr_dentry);
>>   	if (dentry->d_op && dentry->d_op->d_release)
>>   		dentry->d_op->d_release(dentry);


-- 
John David Anglin  dave.anglin@...l.net