Message-ID: <7e3682f8-ec11-40b0-898f-f3729d6f110b@bell.net>
Date: Mon, 2 Dec 2024 14:45:48 -0500
From: John David Anglin <dave.anglin@...l.net>
To: matoro <matoro_mailinglist_kernel@...oro.tk>
Cc: Magnus Lindholm <linmag7@...il.com>,
Linux Parisc <linux-parisc@...r.kernel.org>, deller@...nel.org,
Deller <deller@....de>, Sam James <sam@...too.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [bisected] ext4 corruption on parisc since 6.12
On 2024-12-02 10:31 a.m., matoro wrote:
> On 2024-12-02 09:54, John David Anglin wrote:
>> On 2024-12-02 1:30 a.m., Magnus Lindholm wrote:
>>> On Mon, Dec 2, 2024 at 5:55 AM matoro
>>> <matoro_mailinglist_kernel@...oro.tk> wrote:
>>>> Hmm, this is my config, also on an rp3440:
>>>>
>>>> #
>>>> # Timers subsystem
>>>> #
>>>> CONFIG_HZ_PERIODIC=y
>>>> # CONFIG_NO_HZ_IDLE is not set
>>>> # CONFIG_NO_HZ is not set
>>>> # CONFIG_HIGH_RES_TIMERS is not set
>>>> # end of Timers subsystem
>>>>
>>>> lindholm can confirm on their hardware/config. Maybe you can try that and
>>>> see if you can reproduce? I will try your config as well.
>>> Hi, I'm on a HPC8000 "parisc64 PA8800 (Mako) 9000/785/C8000". I can confirm
>>> that building a kernel CONFIG_SMP=n will mitigate this problem.
>>> I haven't messed around with the config in the Timer subsystem so in my case the
>>> parameters suggested are unset. (my config looks like matoros)
>> The clockevent driver was tested on both rp3440 and c8000, and some other SMP machines.
>> Helge knows details. I have used it on rp3440 and c8000.
>>
>> I would try my settings. The primary reason for switching to the clockevent drivers was to
>> improve clock resolution. The best resolution with the old drivers was 1 ms at 1000 HZ.
>> This caused problems with various package tests. If config is the issue, probably
>> CONFIG_HIGH_RES_TIMERS needs to be forced when clockevent drivers are used.
>>
>> Almost every other system uses the clockevent drivers. So, there was a risk that parisc would
>> become unsupported.
>>
>> I wonder if this could be caused by a dead RTC battery. Did you check the output of the date command?
>> Maybe a dead RTC battery interacts badly with clockevent drivers.
>>
>> I run ntp on all my machines.
>>
>> Which files have bad dates? Is this really an ext4 file system issue, or is it just that the system has
>> a bad clock?
>>
>> Dave
>
> The files that have bad dates seem to be the ones /init on this system touches at early boot. See the output here:
> https://paste.matoro.tk/8cq8omg
>
> When booted into the bad kernel, date(1) works and displays the correct time. I'm using chrony for time syncing as well.
>
> After switching to the config specified above, boot hangs before even getting to userspace with the following output:
>
> [ 12.473410] 0000:e0:01.1: ttyS2 at MMIO 0xfffffffff4050038 (irq = 73, base_baud = 115200) is a 16550A
> [ 12.757386] sym0: <1010-66> rev 0x1 at pci 0000:20:01.0 irq 70
> [ 12.761419] sym0: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
> [ 12.885367] sym0: SCSI BUS has been reset.
> [ 12.889389] scsi host0: sym-2.2.3
> [ 13.053380] sym1: <1010-66> rev 0x1 at pci 0000:20:01.1 irq 71
> [ 13.055515] sym1: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
> [ 13.165367] sym1: SCSI BUS has been reset.
> [ 13.169388] scsi host1: sym-2.2.3
> [ 13.208927] rtc-generic rtc-generic: registered as rtc0
> [ 13.281367] rtc-generic rtc-generic: setting system clock to 2024-12-02T07:17:02 UTC (1733123822)
> [ 13.281367] NET: Registered PF_INET6 protocol family
> [ 13.281367] Segment Routing with IPv6
> [ 13.281367] In-situ OAM (IOAM) with IPv6
> [ 13.281367] registered taskstats version 1
> [ 13.281367] Unstable clock detected, switching default tracing clock to "global"
> [ 13.281367] If you want to keep using the local clock, then add:
> [ 13.281367] "trace_clock=local"
> [ 13.281367] on the kernel command line
>
> At the end there the clock seems to stop progressing forward, as there are several real-time seconds that elapse in between messages with the
> same timestamp. So I'm completely unable to boot with this config at all.
I don't see the "Unstable clock detected" message.
I also have these in my config:
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GENERIC_SCHED_CLOCK=y
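
To compare configs between machines, something like the following could be used. It is only a rough sketch: the option list is taken from this thread and is not meant to be complete, and the helper names (parse_config, report) are made up for illustration.

```python
import re

# Options mentioned in this thread; an illustrative list, not an exhaustive one.
OPTIONS = [
    "CONFIG_HZ_PERIODIC",
    "CONFIG_NO_HZ_IDLE",
    "CONFIG_NO_HZ",
    "CONFIG_HIGH_RES_TIMERS",
    "CONFIG_HAVE_UNSTABLE_SCHED_CLOCK",
    "CONFIG_GENERIC_SCHED_CLOCK",
    "CONFIG_SMP",
]

def parse_config(text):
    """Map each option in a kernel .config text to its value.

    Lines of the form '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            values[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            values[m.group(1)] = "n"
    return values

def report(text, options=OPTIONS):
    """Return {option: value}, using 'absent' for options not in the file."""
    values = parse_config(text)
    return {opt: values.get(opt, "absent") for opt in options}

if __name__ == "__main__":
    import sys
    # Usage: python3 check_config.py /path/to/.config
    with open(sys.argv[1]) as f:
        for opt, val in report(f.read()).items():
            print(f"{opt}={val}")
```

Running it against the .config of a working and a failing build and diffing the two outputs should show whether the timer options actually differ.
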
The clock seems to get stuck here:
[ 13.281367] rtc-generic rtc-generic: setting system clock to 2024-12-02T07:17:02 UTC (1733123822)
On mx3210, the clock continues to increment:
[ 1.995462] rtc-generic rtc-generic: registered as rtc0
[ 2.003158] rtc-generic rtc-generic: setting system clock to 2024-12-01T15:23:25 UTC (1733066605)
[ 2.003719] IR JVC protocol handler initialized
[ 2.004109] IR MCE Keyboard/mouse protocol handler initialized
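
A stuck clock like the one in the quoted log can be spotted mechanically by looking for runs of identical printk timestamps. The following is a minimal sketch, not a definitive tool: the function name find_stalls and the threshold of three repeats are arbitrary assumptions, since a couple of messages sharing a timestamp can be normal.

```python
import re

# Matches the leading "[ 13.281367]" printk timestamp on a log line.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")

def find_stalls(lines, min_repeat=3):
    """Return (timestamp, count) for each run of at least min_repeat
    consecutive log lines that carry the same timestamp."""
    stalls = []
    prev = None
    count = 0
    for line in lines:
        # Drop mail quote markers ("> ") and leading spaces before matching.
        m = STAMP.match(line.lstrip("> "))
        if not m:
            continue
        ts = m.group(1)
        if ts == prev:
            count += 1
        else:
            if prev is not None and count >= min_repeat:
                stalls.append((prev, count))
            prev, count = ts, 1
    # Flush the final run.
    if prev is not None and count >= min_repeat:
        stalls.append((prev, count))
    return stalls

if __name__ == "__main__":
    import sys
    # Usage: dmesg | python3 find_stalls.py
    for ts, n in find_stalls(sys.stdin):
        print(f"{n} consecutive lines at timestamp {ts}")
```

Fed the two excerpts in this message, it would be expected to flag the run at 13.281367 and nothing in the mx3210 output.
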
Dave
--
John David Anglin dave.anglin@...l.net