lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b01cec76-bb39-9fb5-8f6e-4023c075e6b3@gmail.com>
Date:   Tue, 21 Apr 2020 03:32:21 +0300
From:   Dmitry Osipenko <digetx@...il.com>
To:     Jon Hunter <jonathanh@...dia.com>,
        Thierry Reding <thierry.reding@...il.com>,
        Laxman Dewangan <ldewangan@...dia.com>,
        Wolfram Sang <wsa@...-dreams.de>
Cc:     linux-i2c@...r.kernel.org, linux-tegra@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] i2c: tegra: Better handle case where CPU0 is busy
 for a long time

21.04.2020 01:11, Dmitry Osipenko пишет:
> Hello Jon,
> 
> 20.04.2020 22:53, Jon Hunter пишет:
>> Hi Dmitry,
>>
>> On 24/03/2020 19:12, Dmitry Osipenko wrote:
>>> Boot CPU0 always handle I2C interrupt and under some rare circumstances
>>> (like running KASAN + NFS root) it may stuck in uninterruptible state for
>>> a significant time. In this case we will get timeout if I2C transfer is
>>> running on a sibling CPU, despite of IRQ being raised. In order to handle
>>> this rare condition, the IRQ status needs to be checked after completion
>>> timeout.
>>>
>>> Signed-off-by: Dmitry Osipenko <digetx@...il.com>
>>> ---
>>>  drivers/i2c/busses/i2c-tegra.c | 27 +++++++++++++++------------
>>>  1 file changed, 15 insertions(+), 12 deletions(-)
>>
>>
>> I have noticed a regression on tegra30-cardhu-a04 when testing system
>> suspend. Git bisect is pointing to this commit and reverting it fixes
>> the problem. In the below console log I2C fails to resume ...
>>
> ...
>> [   60.690035] PM: Device 3000.pcie failed to resume noirq: error -16
> 
> ...
>> [   60.696859] tegra-mc 7000f000.memory-controller: fdcdwr2: write @0x877e8400: EMEM address decode error (SMMU translation error [--S])
>>
>> [   60.708876] tegra-mc 7000f000.memory-controller: fdcdwr2: write @0x877e8400: Page fault (SMMU translation error [--S])
> 
> This looks very wrong, the error tells that 3d hardware is active and
> doing something odd. Are you running some 3d tests?
> 
> ...
>> Have you seen this?
> 
> No, I haven't seen that. I'm not using PCIE and it looks like it's the
> problem.
> 
> Looking at the PCIE driver code, seems it's not syncing the RPM state on
> suspend/resume.
> 
> Please try this change:
> 
> --- >8 ---
> diff --git a/drivers/pci/controller/pci-tegra.c
> b/drivers/pci/controller/pci-tegra.c
> index 3e64ba6a36a8..b1fcbae4109c 100644
> --- a/drivers/pci/controller/pci-tegra.c
> +++ b/drivers/pci/controller/pci-tegra.c
> @@ -2870,8 +2870,8 @@ static int __maybe_unused
> tegra_pcie_pm_resume(struct device *dev)
> 
>  static const struct dev_pm_ops tegra_pcie_pm_ops = {
>  	SET_RUNTIME_PM_OPS(tegra_pcie_pm_suspend, tegra_pcie_pm_resume, NULL)
> -	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(tegra_pcie_pm_suspend,
> -				      tegra_pcie_pm_resume)
> +	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
> +				      pm_runtime_force_resume)
>  };
> 
> 
>  static struct platform_driver tegra_pcie_driver = {
> --- >8 ---
> 
> Secondly, I2C driver suspends on NOIRQ level, while APBDMA driver
> suspends on default level. This is also wrong, please try to apply this
> hunk as well:
> 
> --- >8 ---
> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
> index f6a2f42ffc51..e682ac86bd27 100644
> --- a/drivers/dma/tegra20-apb-dma.c
> +++ b/drivers/dma/tegra20-apb-dma.c
> @@ -1653,7 +1653,7 @@ static int __maybe_unused
> tegra_dma_dev_resume(struct device *dev)
>  static const struct dev_pm_ops tegra_dma_dev_pm_ops = {
>  	SET_RUNTIME_PM_OPS(tegra_dma_runtime_suspend, tegra_dma_runtime_resume,
>  			   NULL)
> -	SET_SYSTEM_SLEEP_PM_OPS(tegra_dma_dev_suspend, tegra_dma_dev_resume)
> +	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(tegra_dma_dev_suspend, tegra_dma_dev_resume)
>  };
> 
>  static const struct of_device_id tegra_dma_of_match[] = {
> --- >8 ---
> 

Although, I'm now having a second though about the APBDMA change... I'm
recalling that there are some complications in regards to PCIE driver
suspending, requiring it to be at NOIRQ level, but this should be wrong
because PCIE driver uses voltage regulator driver at NOIRQ level, while
regulator drivers suspend on default level. The current behavior of the
PCIE driver should be wrong, I think it needs to be moved to the default
suspend-resume level somehow.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ