Message-ID: <71fae3d3a9bd816ea268eb73c152b564@matoro.tk>
Date: Sun, 01 Dec 2024 23:55:29 -0500
From: matoro <matoro_mailinglist_kernel@...oro.tk>
To: John David Anglin <dave.anglin@...l.net>
Cc: Linux Parisc <linux-parisc@...r.kernel.org>, deller@...nel.org, Deller
<deller@....de>, linmag7@...il.com, Sam James <sam@...too.org>, Linux Kernel
Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [bisected] ext4 corruption on parisc since 6.12
Hmm, this is my config, also on an rp3440:
#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
# end of Timers subsystem
lindholm can confirm the details of their own hardware and config. Could you
try the config above and see whether you can reproduce? I will try your
config as well.
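
To make the difference between the two timer configs in this thread easier to see, here is a minimal sketch, assuming Python 3 is at hand. The helper names `parse_kconfig` and `diff_configs` are hypothetical (they are not kernel tooling), and the two fragments are copied from the messages in this thread. An option that does not appear in a fragment is reported as None rather than guessed.

```python
import re

def parse_kconfig(text):
    """Map CONFIG_* names to values; '# CONFIG_X is not set' maps to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if m:
            opts[m.group(1)] = "n"
            continue
        m = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if m:
            opts[m.group(1)] = m.group(2)
    return opts

def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    Options missing from one fragment show up as None on that side.
    """
    names = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in names if a.get(k) != b.get(k)}

# Fragment from this message (periodic tick, no high-resolution timers).
MATORO = """
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
"""

# Fragment from the quoted message (tickless idle, high-resolution timers).
DAVE = """
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
"""

for name, (mine, other) in diff_configs(parse_kconfig(MATORO),
                                        parse_kconfig(DAVE)).items():
    print(f"{name}: matoro={mine} dave={other}")
```

Only CONFIG_NO_HZ matches between the two fragments; every other listed option differs, so either the periodic tick or the lack of high-resolution timers could be what makes the problem visible here.
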
On 2024-12-01 20:47, John David Anglin wrote:
> I haven't seen any file system corruption on rp3440 with several weeks of
> running with clock events. I just started running 6.12.1 today though.
>
> I have the following timer config:
>
> # Timers subsystem
> #
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ is not set
> CONFIG_HIGH_RES_TIMERS=y
> # end of Timers subsystem
>
> There was some concern about this change on systems where the CPU timers
> aren't synchronized. What systems do you see this on?
>
> Dave
>
> On 2024-12-01 7:26 p.m., matoro wrote:
>> Hi Helge, when booting 6.12, another user (CC'd) and I both observed our
>> ext4 filesystems to be immediately corrupted in the same manner.
>>
>> Every file that is read or written will have its access/modify times set to
>> 2446-05-10 18:38:55.0000, which is the maximum ext4 timestamp. The 32-bit
>> userspace doesn't seem to be able to handle this at all, as every further
>> stat() call will error with "Value too large for defined data type".
>> Unfortunately, simply rolling back to kernel 6.11 is insufficient to
>> recover, as the filesystem corruption is persistent, and the errors come
>> from userspace attempting to read the modified files. I was able to
>> recover with a command like:
>>
>>   find / -newermt 2446-01-01 -o -newerct 2446-01-01 -o -newerat 2446-01-01 | xargs touch -h
>>
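
As a sanity check on the timestamp quoted above, here is a minimal sketch of the arithmetic, assuming inodes large enough to carry ext4's extra timestamp field (a signed 32-bit seconds value extended by 2 epoch bits). The UTC-4 offset is an assumption about the timezone the time was printed in, chosen because it reproduces the 18:38:55 shown; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Largest second ext4 can store with the extra epoch bits:
# 34 bits of range on top of a signed 32-bit base, i.e. 2**34 - 1 - 2**31.
EXT4_MAX_SECONDS = (1 << 34) - 1 - (1 << 31)

# Largest value a signed 32-bit time_t can hold (2038-01-19 03:14:07 UTC).
TIME_T32_MAX = (1 << 31) - 1

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_utc(seconds):
    """Convert seconds since the epoch to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)

max_utc = to_utc(EXT4_MAX_SECONDS)
# Assumption: the reported time was printed in a UTC-4 local timezone.
max_local = max_utc.astimezone(timezone(timedelta(hours=-4)))

print(EXT4_MAX_SECONDS)                          # 15032385535
print(max_utc.strftime("%Y-%m-%d %H:%M:%S"))     # 2446-05-10 22:38:55
print(max_local.strftime("%Y-%m-%d %H:%M:%S"))   # 2446-05-10 18:38:55
print(EXT4_MAX_SECONDS > TIME_T32_MAX)           # True
```

Because the stored value is far above what a 32-bit time_t can represent, a stat() call in a 32-bit time_t userspace cannot return it and fails with EOVERFLOW, which is the "Value too large for defined data type" error.
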
>> Luckily, lindholm was able to bisect and identified the culprit as commit
>> b5ff52be891347f8847872c49d7a5c2fa29400a7 ("parisc: Convert to generic
>> clockevents"). Some other comments from the discussion:
>>
>> 17:20:37 <awilfox> would be curious if keeping that patch + CONFIG_SMP=n
>> fixes it
>> 17:20:44 <awilfox> this doesn't look necessarily correct on MP machines
>> 17:23:56 <awilfox> time_keeper_id is now unused; the old code specifically
>> marked the clocksource as unstable on MP machines despite having per_cpu
>> before
>> 17:24:11 <awilfox> and now it seems to imply CLOCK_EVT_FEAT_PERCPU is
>> enough to work around it
>> 17:24:13 <awilfox> maybe it isn't
>>
>> Thanks!
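
For anyone who needs to repeat the recovery quoted above, here is a rough sketch of the same idea in Python, not the command that was actually run. It assumes Python 3 on Linux; `reset_future_timestamps` and `needs_reset` are hypothetical names. Unlike the find/xargs pipeline it copes with whitespace in file names, and it treats an EOVERFLOW from lstat() as a sign of a bogus timestamp, on the assumption that a 32-bit time_t userspace is why the stat failed.

```python
import errno
import os
import time

def needs_reset(path, cutoff):
    """True if any of atime/mtime/ctime is later than `cutoff` (epoch seconds)."""
    try:
        st = os.lstat(path)
    except OSError as exc:
        # Assumption: EOVERFLOW means the timestamp does not fit in time_t.
        return exc.errno == errno.EOVERFLOW
    return max(st.st_atime, st.st_mtime, st.st_ctime) > cutoff

def reset_future_timestamps(root, cutoff, now=None):
    """Set atime/mtime to `now` on entries under `root` dated after `cutoff`.

    Symlinks are updated themselves rather than followed, like `touch -h`.
    Returns the list of paths that were touched.
    """
    now = time.time() if now is None else now
    touched = []
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    for path in candidates:
        if needs_reset(path, cutoff):
            # Updating atime/mtime also refreshes ctime as a side effect.
            os.utime(path, (now, now), follow_symlinks=False)
            touched.append(path)
    return touched
```

A call would look like reset_future_timestamps("/", cutoff), with cutoff set to the epoch seconds for 2446-01-01 to match the command above. Like that command it crosses mount points, so it is worth trying on a small directory first.
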