[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJ9a7Vgre_3mkXB_xeVx5N9BqPTa2Ai4_8E+daDZ-yAUv44A9g@mail.gmail.com>
Date: Tue, 11 Feb 2025 17:24:28 +0000
From: Mike Leach <mike.leach@...aro.org>
To: Yuanfang Zhang <quic_yuanfang@...cinc.com>
Cc: Leo Yan <leo.yan@....com>, Suzuki K Poulose <suzuki.poulose@....com>,
James Clark <james.clark@...aro.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, kernel@...cinc.com,
coresight@...ts.linaro.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] coresight-tmc: stop disabling tmc when flush timeout
Hi,
We are not seeing these problems on other systems with multiple ETMs.
How are you using the system?
Perf will strictly disable sources before sinks - which would mean
that the ETMs will flush before the TMC requests a manual flush on
stop.
For sysfs use, we recommend the same, but even halting the TMC first
should result in all sources being requested to flush at the same time
(subject to propagation through connected trace funnels).
On Fri, 7 Feb 2025 at 07:25, Yuanfang Zhang <quic_yuanfang@...cinc.com> wrote:
>
>
>
> On 2/6/2025 6:32 PM, Leo Yan wrote:
> > Hi Yuanfang,
> >
> > On Fri, Jan 03, 2025 at 04:01:54PM +0800, Yuanfang Zhang wrote:
> >>
> >> When multiple ETMs are enabled simultaneously, the time required
> >> to complete a flush in the process of reading the TMC device node
> >> may exceed the default wait time of 100us.
> >
How do you know that it is the ETM flush process that is the issue?
TMC ready can be delayed by issues in the memory system that prevent
write to the memory buffer.
The TMC flush complete will indicate completion of flush for all
connected devices - so if flush complete is indicated by the TMC then
a flush request cannot be the issue.
With multiple ETMs, then the connected trace funnels arbitration will
ensure flush is indicated complete on all connected inputs before
transmitting to the output.
The driver code to stop the TMC will first wait on flush complete
before waiting on TMC ready.
> > Have you tried how long time would be safe to wait for the
> > TMCREADY_BIT bit to be set on your platform?
> yes, 400000us.
This is a huge amount of latency for a coresight system. This
suggests some sort of contention issue.
> >
> >> If the TMC capture is stopped while any ETM has not completed its
> >> flush, it can cause the corresponding CPU to hang.
> >>
There is nothing in the ETMv4 hardware specification /design that
should cause an CPU to hang. There is an implementation dependent
stall mechanism that if implemented can stall a PE when the ETM fifo
approaches overflow, but this is switch off on the drivers, if
implemented on the hardware.
The flush process for the ETM is a purely internal process of clearing
out all trace currently in the ETM, and is not dependent on the PE.
> >> Fix the by checking the TMCReady bit after the flush. If TMCReady
> >> bit is set, TraceCaptEn bit can be clear; otherwise, return directly
> >> and stop the TMC read.
> >
> > When a timeout for TMC flush and stop is detected, if the driver
> > arbitrarily bails out without disabling TMC, this would leave the
> > hardware always on.
> >
I agree that this is not an acceptable solution - we cannot leave the
TMC in an indeterminate state.
Where clients such as perf are using the hardware there is no
mechanism for re-trying if halt fails.
The functions you have changed seem to relate to reading the buffer
via sysfs. Do you see similar problems using perf to collect trace
data?
> > I would like first know if a longer timeout can fix the issue.
> >
> > Thanks,
> > Leo
> >
> yes, longer timeout can fix this issue.
>
I think that a longer timeout will mask some underlying contention
issue, but it should be possible to make the coresight driver timeout
configurable, and set the value appropriately for your system.
There is a new TMC logging patchset here:
https://lists.linaro.org/archives/list/coresight@lists.linaro.org/thread/7Y6NPTNKKIPERUUBHG5UEUNVR7ZMFUGM/
which improves the error reporting around timeouts. This might help
establish if the error is in waiting for flush complete, or TMC ready.
This API could be extended to add a timeout value parameter, that can
be configured from the TMC driver, set via sysfs for the TMC in the
initial case to further debugging of this issue on your system
Regards
Mike
> >> Signed-off-by: Yuanfang Zhang <quic_yuanfang@...cinc.com>
> >> ---
> >> drivers/hwtracing/coresight/coresight-tmc-etf.c | 17 +++++++++++++++--
> >> drivers/hwtracing/coresight/coresight-tmc-etr.c | 22 +++++++++++++++++-----
> >> 2 files changed, 32 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> >> index d4f641cd9de69488fe3d1c1dc9b5a9eafb55ed59..bded290c42891d782344d9a6e63ebdbed6719133 100644
> >> --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
> >> +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> >> @@ -80,11 +80,21 @@ static void tmc_etb_dump_hw(struct tmc_drvdata *drvdata)
> >> return;
> >> }
> >>
> >> -static void __tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
> >> +static int __tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
> >> {
> >> + int rc;
> >> +
> >> CS_UNLOCK(drvdata->base);
> >>
> >> tmc_flush_and_stop(drvdata);
> >> +
> >> + rc = tmc_wait_for_tmcready(drvdata);
> >> + if (rc) {
> >> + dev_err(&drvdata->csdev->dev,
> >> + "Failed to disable : TMC is not ready\n");
> >> + CS_LOCK(drvdata->base);
> >> + return rc;
> >> + }
> >> /*
> >> * When operating in sysFS mode the content of the buffer needs to be
> >> * read before the TMC is disabled.
> >> @@ -94,6 +104,7 @@ static void __tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
> >> tmc_disable_hw(drvdata);
> >>
> >> CS_LOCK(drvdata->base);
> >> + return 0;
> >> }
> >>
> >> static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
> >> @@ -650,7 +661,9 @@ int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)
> >> ret = -EINVAL;
> >> goto out;
> >> }
> >> - __tmc_etb_disable_hw(drvdata);
> >> + ret = __tmc_etb_disable_hw(drvdata);
> >> + if (ret)
> >> + goto out;
> >> }
> >>
> >> drvdata->reading = true;
> >> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >> index a48bb85d0e7f44a25b813f3c828cc3d705d16012..63a1f7501562fa0b5c2fe6ea53dce4d82842bec3 100644
> >> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >> @@ -1135,11 +1135,21 @@ static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
> >> }
> >> }
> >>
> >> -static void __tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
> >> +static int __tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
> >> {
> >> + int rc;
> >> +
> >> CS_UNLOCK(drvdata->base);
> >>
> >> tmc_flush_and_stop(drvdata);
> >> +
> >> + rc = tmc_wait_for_tmcready(drvdata);
> >> + if (rc) {
> >> + dev_err(&drvdata->csdev->dev,
> >> + "Failed to disable : TMC is not ready\n");
> >> + CS_LOCK(drvdata->base);
> >> + return rc;
> >> + }
> >> /*
> >> * When operating in sysFS mode the content of the buffer needs to be
> >> * read before the TMC is disabled.
> >> @@ -1150,7 +1160,7 @@ static void __tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
> >> tmc_disable_hw(drvdata);
> >>
> >> CS_LOCK(drvdata->base);
> >> -
> >> + return 0;
> >> }
> >>
> >> void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
> >> @@ -1779,9 +1789,11 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
> >> }
> >>
> >> /* Disable the TMC if we are trying to read from a running session. */
> >> - if (coresight_get_mode(drvdata->csdev) == CS_MODE_SYSFS)
> >> - __tmc_etr_disable_hw(drvdata);
> >> -
> >> + if (coresight_get_mode(drvdata->csdev) == CS_MODE_SYSFS) {
> >> + ret = __tmc_etr_disable_hw(drvdata);
> >> + if (ret)
> >> + goto out;
> >> + }
> >> drvdata->reading = true;
> >> out:
> >> spin_unlock_irqrestore(&drvdata->spinlock, flags);
> >>
> >> ---
> >> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
> >> change-id: 20250103-fix_cpu_hung-b5a95179ada4
> >>
> >> Best regards,
> >> --
> >> Yuanfang Zhang <quic_yuanfang@...cinc.com>
> >>
> >>
>
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Powered by blists - more mailing lists