Message-ID: <31c884b9-77c8-48dc-b84c-20e52cdc4d44@bell.net>
Date: Sun, 1 Dec 2024 20:47:50 -0500
From: John David Anglin <dave.anglin@...l.net>
To: matoro <matoro_mailinglist_kernel@...oro.tk>,
Linux Parisc <linux-parisc@...r.kernel.org>, deller@...nel.org,
Deller <deller@....de>, linmag7@...il.com, Sam James <sam@...too.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [bisected] ext4 corruption on parisc since 6.12
I haven't seen any file system corruption on an rp3440 after several weeks of running with clock events,
though I only started running 6.12.1 today.
I have the following timer config:
#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem
There was some concern about this change on systems where the CPU timers aren't synchronized. What
systems do you see this on?
Dave
On 2024-12-01 7:26 p.m., matoro wrote:
> Hi Helge, when booting 6.12 here myself and another user (CC'd) both observed our ext4 filesystems to be immediately corrupted in the same
> manner.
>
> Every file that is read or written will have its access/modify times set to 2446-05-10 18:38:55.0000, which is the maximum ext4 timestamp.
> The 32-bit userspace doesn't seem to be able to handle this at all, as every further stat() call will error with "Value too large for defined
> data type". Unfortunately, simply rolling back to kernel 6.11 is insufficient to recover, as the filesystem corruption is persistent, and the
> errors come from userspace attempting to read the modified files. I was able to recover with a command like:
>
>   find / -newermt 2446-01-01 -o -newerct 2446-01-01 -o -newerat 2446-01-01 | xargs touch -h
>
> Luckily, lindholm was able to bisect and identified as the culprit commit: b5ff52be891347f8847872c49d7a5c2fa29400a7 ("parisc: Convert to
> generic clockevents"). Some other comments from the discussion:
>
> 17:20:37 <awilfox> would be curious if keeping that patch + CONFIG_SMP=n fixes it
> 17:20:44 <awilfox> this doesn't look necessarily correct on MP machines
> 17:23:56 <awilfox> time_keeper_id is now unused; the old code specifically marked the clocksource as unstable on MP machines despite having
> per_cpu before
> 17:24:11 <awilfox> and now it seems to imply CLOCK_EVT_FEAT_PERCPU is enough to work around it
> 17:24:13 <awilfox> maybe it isn't
>
> Thanks!
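
For reference, the 2446 date in the report above matches the largest value ext4 can store for a timestamp. The sketch below reproduces that arithmetic. It assumes the extended-inode encoding (a signed 32-bit seconds field plus a 2-bit epoch extension), and the helper names are made up for illustration. It also shows why a userspace with a 32-bit time_t cannot represent the value, which is consistent with stat() returning "Value too large for defined data type".

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ext4_max_seconds() -> int:
    """Largest seconds value ext4 can store.

    Assumption: extended inodes, i.e. a signed 32-bit base field plus two
    extra epoch bits, each epoch step adding 2**32 seconds.
    """
    return (2**31 - 1) + 3 * 2**32


def fits_time32(seconds: int) -> bool:
    """True if the value is representable in a signed 32-bit time_t."""
    return -(2**31) <= seconds <= 2**31 - 1


max_secs = ext4_max_seconds()
max_utc = EPOCH + timedelta(seconds=max_secs)

print(max_secs)               # seconds since the Unix epoch
print(max_utc.isoformat())    # the same instant as a UTC date
print(fits_time32(max_secs))  # whether a 32-bit time_t can hold it
```

The result is in UTC. The 18:38:55 quoted in the report would correspond to a local time zone four hours behind UTC, which is a guess about the reporter's setup rather than something stated in the thread.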
--
John David Anglin dave.anglin@...l.net