linux-kernel - Re: [bisected] ext4 corruption on parisc since 6.12

Message-ID: <31c884b9-77c8-48dc-b84c-20e52cdc4d44@bell.net>
Date: Sun, 1 Dec 2024 20:47:50 -0500
From: John David Anglin <dave.anglin@...l.net>
To: matoro <matoro_mailinglist_kernel@...oro.tk>,
 Linux Parisc <linux-parisc@...r.kernel.org>, deller@...nel.org,
 Deller <deller@....de>, linmag7@...il.com, Sam James <sam@...too.org>,
 Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [bisected] ext4 corruption on parisc since 6.12

I haven't seen any file system corruption on my rp3440 after several weeks of running with clock events,
although I only started running 6.12.1 today.

I have the following timer config:

# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

There was some concern about this change on systems where the CPU timers aren't synchronized.  What
systems do you see this on?

Dave

On 2024-12-01 7:26 p.m., matoro wrote:
> Hi Helge, when booting 6.12, another user (CC'd) and I both observed our ext4 filesystems being immediately corrupted in the same
> manner.
>
> Every file that is read or written will have its access/modify times set to 2446-05-10 18:38:55.0000, which is the maximum ext4 timestamp.
> The 32-bit userspace doesn't seem to be able to handle this at all, as every further stat() call will error with "Value too large for defined
> data type".  Unfortunately, simply rolling back to kernel 6.11 is insufficient to recover, as the filesystem corruption is persistent, and the
> errors come from userspace attempting to read the modified files.  I was able to recover with a command like:
>
>   find / -newermt 2446-01-01 -o -newerct 2446-01-01 -o -newerat 2446-01-01 | xargs touch -h
>
> Luckily, lindholm was able to bisect this and identified the culprit commit:  b5ff52be891347f8847872c49d7a5c2fa29400a7 ("parisc: Convert to
> generic clockevents").  Some other comments from the discussion:
>
> 17:20:37 <awilfox> would be curious if keeping that patch + CONFIG_SMP=n fixes it
> 17:20:44 <awilfox> this doesn't look necessarily correct on MP machines
> 17:23:56 <awilfox> time_keeper_id is now unused; the old code specifically marked the clocksource as unstable on MP machines despite having per_cpu before
> 17:24:11 <awilfox> and now it seems to imply CLOCK_EVT_FEAT_PERCPU is enough to work around it
> 17:24:13 <awilfox> maybe it isn't
>
> Thanks!
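
For reference, the timestamp quoted above can be reproduced with a little arithmetic. This is a minimal sketch, assuming the on-disk layout described in the kernel's ext4 documentation: a signed 32-bit seconds field extended by two extra epoch bits. The constant and function names are illustrative only. The result is in UTC; the 18:38:55 in the report would be consistent with a local timezone four hours behind UTC, which is an inference and not something stated in the thread.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


# Largest value a signed 32-bit time_t can hold.
TIME32_MAX = 2**31 - 1

# Assumed ext4 maximum: the largest signed 32-bit seconds value, plus the
# two extra epoch bits set to 0b11 (each step adds 2**32 seconds).
EXT4_MAX = (2**31 - 1) + 3 * 2**32

print(to_utc(TIME32_MAX))  # 2038-01-19 03:14:07+00:00
print(to_utc(EXT4_MAX))    # 2446-05-10 22:38:55+00:00
```

Under these assumptions the stored value is far beyond what a signed 32-bit time_t can represent, which would explain the "Value too large for defined data type" (EOVERFLOW) errors from stat() described above.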


-- 
John David Anglin  dave.anglin@...l.net