Message-ID: <mhng-57a8ff91-b9b1-4667-96be-3f8fed25dcc6@palmer-ri-x1c9a>
Date: Thu, 27 Oct 2022 16:07:23 -0700 (PDT)
From: Palmer Dabbelt <palmer@...belt.com>
To: Conor Dooley <conor@...nel.org>
CC: daniel.lezcano@...aro.org, tglx@...utronix.de,
Conor Dooley <conor.dooley@...rochip.com>, samuel@...lland.org,
aou@...s.berkeley.edu, atishp@...shpatra.org, dmitriy@...-tech.org,
linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org,
Paul Walmsley <paul.walmsley@...ive.com>
Subject: Re: [PATCH] Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"

On Sun, 23 Oct 2022 11:54:44 PDT (-0700), Conor Dooley wrote:
> From: Conor Dooley <conor.dooley@...rochip.com>
>
> This reverts commit 232ccac1bd9b5bfe73895f527c08623e7fa0752d.
> If an AXI read to the PCIe controller on PolarFire SoC times out, the
> system will stall, with an expected:
> io scheduler mq-deadline registered
> io scheduler kyber registered
> microchip-pcie 2000000000.pcie: host bridge /soc/pcie@...0000000 ranges:
> microchip-pcie 2000000000.pcie: MEM 0x2008000000..0x2087ffffff -> 0x0008000000
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: axi read request error
> microchip-pcie 2000000000.pcie: axi read timeout
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> Freeing initrd memory: 7336K
> mc_event_handler: 667402 callbacks suppressed
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> mc_event_handler: 666588 callbacks suppressed
> <truncated>
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> mc_event_handler: 666748 callbacks suppressed
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: 0-...0: (1 GPs behind) idle=19f/1/0x4000000000000002 softirq=34/36 fqs=2626
> (detected by 1, t=5256 jiffies, g=-1151, q=1143 ncpus=4)
> Task dump for CPU 0:
> task:swapper/0 state:R running task stack: 0 pid: 1 ppid: 0 flags:0x00000008
> Call Trace:
> mc_event_handler: 666648 callbacks suppressed
>
> With this patch applied, the system just locks up without RCU stalling:
> io scheduler mq-deadline registered
> io scheduler kyber registered
> microchip-pcie 2000000000.pcie: host bridge /soc/pcie@...0000000 ranges:
> microchip-pcie 2000000000.pcie: MEM 0x2008000000..0x2087ffffff -> 0x0008000000
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: axi read request error
> microchip-pcie 2000000000.pcie: axi read timeout
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> Freeing initrd memory: 7332K
>
> Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
> Fixes: 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
> Signed-off-by: Conor Dooley <conor.dooley@...rochip.com>
> ---
> I don't really want to post a revert, but it's been nearly a month since
> I posted about my issue initially & 2 weeks without a reply to Palmer's
> comments.
> CC: samuel@...lland.org
> CC: aou@...s.berkeley.edu
> CC: atishp@...shpatra.org
> CC: daniel.lezcano@...aro.org
> CC: dmitriy@...-tech.org
> CC: linux-kernel@...r.kernel.org
> CC: linux-riscv@...ts.infradead.org
> CC: palmer@...belt.com
> CC: paul.walmsley@...ive.com
> CC: tglx@...utronix.de
> ---
> drivers/clocksource/timer-riscv.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
> index 969a552da8d2..a0d66fabf073 100644
> --- a/drivers/clocksource/timer-riscv.c
> +++ b/drivers/clocksource/timer-riscv.c
> @@ -51,7 +51,7 @@ static int riscv_clock_next_event(unsigned long delta,
> static unsigned int riscv_clock_event_irq;
> static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
> .name = "riscv_timer_clockevent",
> - .features = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
> + .features = CLOCK_EVT_FEAT_ONESHOT,
> .rating = 100,
> .set_next_event = riscv_clock_next_event,
> };

There's some discussion on that linked patch and we don't really have a
fix yet, but IMO we're better off reverting this as it breaks the common
case and it's not clear this is even a sane way to fix the bug.

Reviewed-by: Palmer Dabbelt <palmer@...osinc.com>
Acked-by: Palmer Dabbelt <palmer@...osinc.com>