Date:   Fri, 10 Dec 2021 15:01:38 +0100
From:   Thorsten Leemhuis <regressions@...mhuis.info>
To:     Stefan Dietrich <roots@....de>,
        Vinicius Costa Gomes <vinicius.gomes@...el.com>
Cc:     kuba@...nel.org, greg@...ah.com, netdev@...r.kernel.org,
        intel-wired-lan@...ts.osuosl.org, regressions@...ts.linux.dev
Subject: Re: [PATCH] igc: Avoid possible deadlock during suspend/resume

On 10.12.21 14:45, Stefan Dietrich wrote:
> 
> thanks for keeping an eye on the issue. I've sent the files in private
> because I did not want to spam the mailing lists with them. Please let
> me know if this is the correct procedure.

It's likely okay in this case, but FWIW: most of the time it's the wrong
thing to do as outlined here:

https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html#general-advice-for-further-interactions

One reason for this: others who might want to look into the issue now
or in a year or two might be unable to if crucial data was only sent
in private.

Ciao, Thorsten

> On Fri, 2021-12-10 at 10:40 +0100, Thorsten Leemhuis wrote:
>> Hi, this is your Linux kernel regression tracker speaking.
>>
>> On 02.12.21 23:34, Vinicius Costa Gomes wrote:
>>> Hi Stefan,
>>>
>>> Stefan Dietrich <roots@....de> writes:
>>>
>>>> Hi Vinicius,
>>>>
>>>> thanks for the patch - unfortunately it did not solve the issue and I
>>>> am still getting reboots/lockups.
>>>>
>>>
>>> Thanks for the test. We learned something, not a lot, but something:
>>> the problem you are facing is PTM related and it's not the same bug
>>> as that PM deadlock.
>>>
>>> I am still trying to understand what's going on.
>>>
>>> Are you able to send me the 'dmesg' output for the two kernel configs
>>> (CONFIG_PCIE_PTM enabled and disabled)? (no need to bring the network
>>> interface up or down). Your kernel .config would be useful as well.
>>
>> Stefan, could you provide the data Vinicius asked for? Or did you do
>> that in private already? Or was progress made somewhere else and I
>> simply missed this?
>>
>> Ciao, Thorsten, your Linux kernel regression tracker.
>>
>> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
>> on my table. I can only look briefly into most of them. Unfortunately
>> I will therefore sometimes get things wrong or miss something important.
>> I hope that's not the case here; if you think it is, don't hesitate to
>> tell me about it in a public reply. That's in everyone's interest, as
>> what I wrote above might be misleading to everyone reading this; any
>> suggestion I gave might thus send someone reading this down the
>> wrong rabbit hole, which none of us wants.
>>
>> BTW, I have no personal interest in this issue, which is tracked using
>> regzbot, my Linux kernel regression tracking bot
>> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
>> this mail to get things rolling again and hence don't need to be CCed
>> on all further activities wrt this regression.
>>
>> #regzbot poke
>>
>>>> On Wed, 2021-12-01 at 10:57 -0800, Vinicius Costa Gomes wrote:
>>>>> Inspired by:
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=215129
>>>>>
>>>>> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@...el.com>
>>>>> ---
>>>>> Just to see if it's indeed the same problem as the bug report above.
>>>>>
>>>>>  drivers/net/ethernet/intel/igc/igc_main.c | 19 +++++++++++++------
>>>>>  1 file changed, 13 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>>>>> index 0e19b4d02e62..c58bf557a2a1 100644
>>>>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>>>>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>>>>> @@ -6619,7 +6619,7 @@ static void igc_deliver_wake_packet(struct net_device *netdev)
>>>>>  	netif_rx(skb);
>>>>>  }
>>>>>
>>>>> -static int __maybe_unused igc_resume(struct device *dev)
>>>>> +static int __maybe_unused __igc_resume(struct device *dev, bool rpm)
>>>>>  {
>>>>>  	struct pci_dev *pdev = to_pci_dev(dev);
>>>>>  	struct net_device *netdev = pci_get_drvdata(pdev);
>>>>> @@ -6661,20 +6661,27 @@ static int __maybe_unused igc_resume(struct device *dev)
>>>>>
>>>>>  	wr32(IGC_WUS, ~0);
>>>>>
>>>>> -	rtnl_lock();
>>>>> +	if (!rpm)
>>>>> +		rtnl_lock();
>>>>>  	if (!err && netif_running(netdev))
>>>>>  		err = __igc_open(netdev, true);
>>>>>
>>>>>  	if (!err)
>>>>>  		netif_device_attach(netdev);
>>>>> -	rtnl_unlock();
>>>>> +	if (!rpm)
>>>>> +		rtnl_unlock();
>>>>>
>>>>>  	return err;
>>>>>  }
>>>>>
>>>>>  static int __maybe_unused igc_runtime_resume(struct device *dev)
>>>>>  {
>>>>> -	return igc_resume(dev);
>>>>> +	return __igc_resume(dev, true);
>>>>> +}
>>>>> +
>>>>> +static int __maybe_unused igc_resume(struct device *dev)
>>>>> +{
>>>>> +	return __igc_resume(dev, false);
>>>>>  }
>>>>>
>>>>>  static int __maybe_unused igc_suspend(struct device *dev)
>>>>> @@ -6738,7 +6745,7 @@ static pci_ers_result_t igc_io_error_detected(struct pci_dev *pdev,
>>>>>   *  @pdev: Pointer to PCI device
>>>>>   *
>>>>>   *  Restart the card from scratch, as if from a cold-boot. Implementation
>>>>> - *  resembles the first-half of the igc_resume routine.
>>>>> + *  resembles the first-half of the __igc_resume routine.
>>>>>   **/
>>>>>  static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev)
>>>>>  {
>>>>> @@ -6777,7 +6784,7 @@ static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev)
>>>>>   *
>>>>>   *  This callback is called when the error recovery driver tells us that
>>>>>   *  its OK to resume normal operation. Implementation resembles the
>>>>> - *  second-half of the igc_resume routine.
>>>>> + *  second-half of the __igc_resume routine.
>>>>>   */
>>>>>  static void igc_io_resume(struct pci_dev *pdev)
>>>>>  {
>>>
>>> Cheers,
>>>
> 
> 
> 
