[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f59ba318-8e99-c486-fa4d-1ee28a7b203d@gmail.com>
Date: Tue, 21 Apr 2020 01:11:51 +0300
From: Dmitry Osipenko <digetx@...il.com>
To: Jon Hunter <jonathanh@...dia.com>,
Thierry Reding <thierry.reding@...il.com>,
Laxman Dewangan <ldewangan@...dia.com>,
Wolfram Sang <wsa@...-dreams.de>
Cc: linux-i2c@...r.kernel.org, linux-tegra@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] i2c: tegra: Better handle case where CPU0 is busy
for a long time
Hello Jon,
20.04.2020 22:53, Jon Hunter пишет:
> Hi Dmitry,
>
> On 24/03/2020 19:12, Dmitry Osipenko wrote:
>> Boot CPU0 always handle I2C interrupt and under some rare circumstances
>> (like running KASAN + NFS root) it may stuck in uninterruptible state for
>> a significant time. In this case we will get timeout if I2C transfer is
>> running on a sibling CPU, despite of IRQ being raised. In order to handle
>> this rare condition, the IRQ status needs to be checked after completion
>> timeout.
>>
>> Signed-off-by: Dmitry Osipenko <digetx@...il.com>
>> ---
>> drivers/i2c/busses/i2c-tegra.c | 27 +++++++++++++++------------
>> 1 file changed, 15 insertions(+), 12 deletions(-)
>
>
> I have noticed a regression on tegra30-cardhu-a04 when testing system
> suspend. Git bisect is pointing to this commit and reverting it fixes
> the problem. In the below console log I2C fails to resume ...
>
...
> [ 60.690035] PM: Device 3000.pcie failed to resume noirq: error -16
...
> [ 60.696859] tegra-mc 7000f000.memory-controller: fdcdwr2: write @0x877e8400: EMEM address decode error (SMMU translation error [--S])
>
> [ 60.708876] tegra-mc 7000f000.memory-controller: fdcdwr2: write @0x877e8400: Page fault (SMMU translation error [--S])
This looks very wrong, the error tells that 3d hardware is active and
doing something odd. Are you running some 3d tests?
...
> Have you seen this?
No, I haven't seen that. I'm not using PCIE and it looks like it's the
problem.
Looking at the PCIE driver code, seems it's not syncing the RPM state on
suspend/resume.
Please try this change:
--- >8 ---
diff --git a/drivers/pci/controller/pci-tegra.c
b/drivers/pci/controller/pci-tegra.c
index 3e64ba6a36a8..b1fcbae4109c 100644
--- a/drivers/pci/controller/pci-tegra.c
+++ b/drivers/pci/controller/pci-tegra.c
@@ -2870,8 +2870,8 @@ static int __maybe_unused
tegra_pcie_pm_resume(struct device *dev)
static const struct dev_pm_ops tegra_pcie_pm_ops = {
SET_RUNTIME_PM_OPS(tegra_pcie_pm_suspend, tegra_pcie_pm_resume, NULL)
- SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(tegra_pcie_pm_suspend,
- tegra_pcie_pm_resume)
+ SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+ pm_runtime_force_resume)
};
static struct platform_driver tegra_pcie_driver = {
--- >8 ---
Secondly, I2C driver suspends on NOIRQ level, while APBDMA driver
suspends on default level. This is also wrong, please try to apply this
hunk as well:
--- >8 ---
diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
index f6a2f42ffc51..e682ac86bd27 100644
--- a/drivers/dma/tegra20-apb-dma.c
+++ b/drivers/dma/tegra20-apb-dma.c
@@ -1653,7 +1653,7 @@ static int __maybe_unused
tegra_dma_dev_resume(struct device *dev)
static const struct dev_pm_ops tegra_dma_dev_pm_ops = {
SET_RUNTIME_PM_OPS(tegra_dma_runtime_suspend, tegra_dma_runtime_resume,
NULL)
- SET_SYSTEM_SLEEP_PM_OPS(tegra_dma_dev_suspend, tegra_dma_dev_resume)
+ SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(tegra_dma_dev_suspend, tegra_dma_dev_resume)
};
static const struct of_device_id tegra_dma_of_match[] = {
--- >8 ---
Powered by blists - more mailing lists