lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fd1ca178-1ea3-851f-20a6-10bf00453ce3@nvidia.com>
Date:   Fri, 24 Apr 2020 08:10:26 +0100
From:   Jon Hunter <jonathanh@...dia.com>
To:     Dmitry Osipenko <digetx@...il.com>,
        Thierry Reding <thierry.reding@...il.com>,
        Laxman Dewangan <ldewangan@...dia.com>,
        "Wolfram Sang" <wsa@...-dreams.de>,
        Manikanta Maddireddy <mmaddireddy@...dia.com>,
        Vidya Sagar <vidyas@...dia.com>
CC:     <linux-i2c@...r.kernel.org>, <linux-tegra@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 1/2] i2c: tegra: Better handle case where CPU0 is busy
 for a long time


On 23/04/2020 17:33, Dmitry Osipenko wrote:
> 23.04.2020 13:56, Jon Hunter пишет:
>>
>> On 22/04/2020 15:07, Dmitry Osipenko wrote:
>>
>> ...
>>
>>>> So I think that part of the problem already existed prior to these
>>>> patches. Without your patches I see ...
>>>>
>>>> [   59.543528] tegra-i2c 7000d000.i2c: i2c transfer timed out
>>>
>>> Does this I2C timeout happen with my patches? Could you please post full
>>> logs of an older and the recent kernel versions?
>>
>> I believe that it does, but I need to check.
>>
>>>> [   59.549036] vdd_sata,avdd_plle: failed to disable
>>>>
>>>> [   59.553778] Failed to disable avdd-plle: -110
>>>>
>>>> [   59.558150] tegra-pcie 3000.pcie: failed to disable regulators: -110
>>>>
>>>>
>>>> However, now with your patches the i2c is failing to resume and this
>>>> breaks system suspend completely for Tegra30. So far this is the only
>>>> board that appears to break.
>>>>
>>>> So the above issue needs to be fixed and I will chat to Thierry about this.
>>>
>>> Okay
>>
>> So I have been looking at this some more and this starting to all look a
>> bit of a mess :-(
>>
>> Firstly, the Tegra PCI driver suspends late (noirq) as we know. The PCI
>> driver will warn if it cannot disable the regulators when suspending but
>> does not actually fail suspend. So this warning is just indicating that
>> we were unable to disable the regulators.
>>
>> Now I don't see that we can ever disable the PCI regulators currently
>> when entering suspend because ...
>>
>> 1. We are in the noirq phase and so we will not get the completion
>>    interrupt for the I2C transfer. I know that you recently added the
>>    atomic I2C transfer support, but we can get the regulator framework
>>    to use this (I have not looked in much detail so far).
> 
> That's not good :) I didn't realize that *all* interrupts of every
> device are disabled before .noirq is invoked. It appeared to me that the
> IRQs disabling and .noirq invocation is performed for the drivers one
> after another, but now I see that it's not true.
> 
> https://elixir.bootlin.com/linux/v5.7-rc2/source/drivers/base/power/main.c#L1446
> 
>> 2. Even if the regulator framework supported atomic I2C transfers, we
>>    also have the problem that the I2C controller could be runtime-
>>    suspended and pm_runtime_get_sync() will not work during during this
>>    phase to resume it correctly. This is a problem that needs to be
>>    fixed indeed!
> 
> Could you please clarify why pm_runtime_get_sync() can't be used by the
> I2C driver's in NOIRQ phase?

Yes take a look at commit 1e2ef05bb8cf ("PM: Limit race conditions
between runtime PM and system sleep (v2)").

>> 3. Then we also have the possible dependency on the DMA driver that is
>>    suspended during the noirq phase.
> 
> Yes, this is correct.
> 
> Again, some regulator drivers may do something on suspend too, although
> looks like the current upstream Tegra devices are not affected by this
> potential problem.
> 
>> It could be argued that if the PCI regulators are never turned off
>> (currently) then we should not even bother and this will likely resolve
>> this for now. However, really we should try to fix that correctly.
> 
> Yes, keeping PCI regulators always-enabled should be a good immediate
> solution.

I was thinking about that, and I am not sure it is. I don't think that
the failure to send the I2C command should break suspend.

> Also, the RPM's system suspend/resume needs to fixed for the pci-tegra
> driver, like I already suggested before.

Yes but I don't think that is the cause of the issue in this particular
case.

>> What I still don't understand is why your patch breaks resume. Even if
>> the I2C transfer fails, and is deemed harmless by the client driver, we
>> should still be able to suspend and resume correctly.
> 
> If DMA is getting synchronized after DMA driver being suspended, then it
> could be a problem.

So I confirmed that DMA is not the issue in this case. I tested this by
ensuring that DMA is never used. However, it is a potential problem
indeed.

> Could you please try to apply this hunk and see if it makes any
> difference (I'll probably make it as proper patch):

Per my tests, I don't believe that it will as disabling DMA does not
resolve the problem.

> It also could be that there is more than the suspend ordering problem,
> but for now it is hard to tell without having a detailed log which
> includes I2C/DMA/RPM traces.

I have taken a look and I don't see any issues with ordering. I2C is
suspended after PCI. This did not change.

Cheers
Jon

-- 
nvpublic

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ