Message-ID: <CAFJ_xbqhQndqfOAbPizQ3Cuyi1WthbNN3DQuObvWYR3ky4C6DA@mail.gmail.com>
Date:   Fri, 2 Sep 2022 07:49:10 +0200
From:   Lukasz Majczak <lma@...ihalf.com>
To:     Vidya Sagar <vidyas@...dia.com>
Cc:     Kai-Heng Feng <kai.heng.feng@...onical.com>,
        Bjorn Helgaas <helgaas@...nel.org>,
        Rajat Jain <rajatja@...gle.com>,
        Ben Chuang <benchuanggli@...il.com>, bhelgaas@...gle.com,
        lorenzo.pieralisi@....com, refactormyself@...il.com, kw@...ux.com,
        kenny@...ix.com, treding@...dia.com, jonathanh@...dia.com,
        abhsahu@...dia.com, sagupta@...dia.com, linux-pci@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        kthota@...dia.com, mmaddireddy@...dia.com, sagar.tv@...il.com
Subject: Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume

On Tue, 30 Aug 2022 at 16:02, Vidya Sagar <vidyas@...dia.com> wrote:
>
>
>
> On 8/30/2022 4:45 PM, Lukasz Majczak wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Fri, 26 Aug 2022 at 15:00, Vidya Sagar <vidyas@...dia.com> wrote:
> >>
> >>
> >>
> >> On 8/23/2022 8:25 PM, Kai-Heng Feng wrote:
> >>> Hi Vidya,
> >>>
> >>> On Tue, Aug 9, 2022 at 12:17 AM Vidya Sagar <vidyas@...dia.com> wrote:
> >>>>
> >>>> Thanks Lukasz for the update.
> >>>> I think this confirms that there is no issue with the patch as such.
> >>>> Bjorn, could you please define the next step for this patch?
> >>>
> >>> I think the L1SS cap went away _after_ L1SS registers are restored,
> >>> since your patch already checks the cap before doing any write:
> >>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>> +       if (!aspm_l1ss)
> >>> +               return;
> >>>
> >>> That means it's more likely to be caused by the following change:
> >>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> >>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >>>
> >>> So is it possible to clear PCI_L1SS_CTL1 before setting PCI_L1SS_CTL2,
> >>> like what aspm_calc_l1ss_info() does?
> >>
> >> I posted a new patch
> >> https://patchwork.kernel.org/project/linux-pci/patch/20220826125526.28859-1-vidyas@nvidia.com/
> >> keeping L1.2 disabled while restoring the rest of the fields in
> >> Control-1 register and restoring the L1.2 enable bits later. Could you
> >> please try this new patch on your setup and update your observations?
> >>
> >> Thanks & Regards,
> >> Vidya Sagar
> >>
> >>>
> >>> Kai-Heng
> >>>
> >>>>
> >>>> Thanks,
> >>>> Vidya Sagar
> >>>>
> >>>> On 8/8/2022 7:37 PM, Lukasz Majczak wrote:
> >>>>> On Wed, 3 Aug 2022 at 14:55, Vidya Sagar <vidyas@...dia.com> wrote:
> >>>>>>
> >>>>>> Thanks Lukasz for the logs.
> >>>>>> I still see that the L1SS capability in the root port (00:14.0) disappeared
> >>>>>> after resume.
> >>>>>> I still don't understand how this patch can make the capability register
> >>>>>> itself disappear. Honestly, I still see this as a HW issue.
> >>>>>> Bjorn, could you please throw some light on this?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vidya Sagar
> >>>>>>
> >>>>>> On 8/3/2022 5:34 PM, Lukasz Majczak wrote:
> >>>>>>> On Fri, 29 Jul 2022 at 16:36, Vidya Sagar <vidyas@...dia.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi Lukasz,
> >>>>>>>> Thanks for sharing your observations.
> >>>>>>>>
> >>>>>>>> Could you please also share the output of 'sudo lspci -vvvv' before and
> >>>>>>>> after suspend-resume cycle with the latest linux-next?
> >>>>>>>> Do we still see the L1SS capabilities disappearing post resume?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Vidya Sagar
> >>>>>>>>
> >>>>>>>> On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
> >>>>>>>>> On Tue, 26 Jul 2022 at 09:20, Lukasz Majczak <lma@...ihalf.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Tue, 26 Jul 2022 at 00:51, Rajat Jain <rajatja@...gle.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hello,
> >>>>>>>>>>>
> >>>>>>>>>>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@...dia.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Agree with Bjorn's observations.
> >>>>>>>>>>>> The fact that the L1SS capability registers themselves disappeared in
> >>>>>>>>>>>> the root port post resume indicates that there seems to be something
> >>>>>>>>>>>> wrong with the BIOS itself.
> >>>>>>>>>>>> Could you please check from that perspective?
> >>>>>>>>>>>
> >>>>>>>>>>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
> >>>>>>>>>>> is a shallower sleep state that preserves more state than, e.g., S3
> >>>>>>>>>>> (suspend-to-RAM). When we use S0ix, the BIOS does not come into the
> >>>>>>>>>>> picture at all, i.e. after the kernel runs its suspend routines, it just puts
> >>>>>>>>>>> the CPU into S0ix state. So I do not think there is a BIOS angle to
> >>>>>>>>>>> this.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Vidya Sagar
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> >>>>>>>>>>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> >>>>>>>>>>>>>> On Fri, 22 Jul 2022 at 09:31, Kai-Heng Feng <kai.heng.feng@...onical.com> wrote:
> >>>>>>>>>>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@...il.com> wrote:
> >>>>>>>>>>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@...dia.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> >>>>>>>>>>>>>>>>> saved and restored during suspend/resume leading to L1 Substates
> >>>>>>>>>>>>>>>>> configuration being lost post-resume.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Save the L1 Substates control registers so that the configuration is
> >>>>>>>>>>>>>>>>> retained post-resume.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@...dia.com>
> >>>>>>>>>>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@...dia.com>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Vidya,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I tested this patch on kernel v5.19-rc6.
> >>>>>>>>>>>>>>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> >>>>>>>>>>>>>>>> This patch can restore L1SS after suspend/resume.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The test results are as follows:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> After Boot:
> >>>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>>>>>               Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>>>>>                       L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>>>>                                 PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> >>>>>>>>>>>>>>>>                       L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>>>>>                                  T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>>>>>>>                       L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> After suspend/resume without this patch.
> >>>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>>>>>               Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>>>>>                       L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>>>>                                 PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> >>>>>>>>>>>>>>>>                       L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>>>>>>>>>>                                  T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>>>>>>>>>>                       L1SubCtl2: T_PwrOn=10us
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> After suspend/resume with this patch.
> >>>>>>>>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>>>>>>>               Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>>>>>>>                       L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>>>>                                 PortCommonModeRestoreTime=255us PortTPowerOnTime=3100us
> >>>>>>>>>>>>>>>>                       L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>>>>>                                  T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>>>>>>>                       L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Tested-by: Ben Chuang <benchuanggli@...il.com>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Forgot to add mine:
> >>>>>>>>>>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@...onical.com>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>> Ben Chuang
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>> Kenneth R. Crudup <kenny@...ix.com>, Could you please verify this patch
> >>>>>>>>>>>>>>>>> on your laptop (Dell XPS 13) one last time?
> >>>>>>>>>>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
> >>>>>>>>>>>>>>>>> could be due to a buggy old version BIOS in the laptop.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>> Vidya Sagar
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>        drivers/pci/pci.c       |  7 +++++++
> >>>>>>>>>>>>>>>>>        drivers/pci/pci.h       |  4 ++++
> >>>>>>>>>>>>>>>>>        drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>>>>>>>>>>        3 files changed, 55 insertions(+)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >>>>>>>>>>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
> >>>>>>>>>>>>>>>>> --- a/drivers/pci/pci.c
> >>>>>>>>>>>>>>>>> +++ b/drivers/pci/pci.c
> >>>>>>>>>>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>>>>                       return i;
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>               pci_save_ltr_state(dev);
> >>>>>>>>>>>>>>>>> +       pci_save_aspm_l1ss_state(dev);
> >>>>>>>>>>>>>>>>>               pci_save_dpc_state(dev);
> >>>>>>>>>>>>>>>>>               pci_save_aer_state(dev);
> >>>>>>>>>>>>>>>>>               pci_save_ptm_state(dev);
> >>>>>>>>>>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>>>>                * LTR itself (in the PCIe capability).
> >>>>>>>>>>>>>>>>>                */
> >>>>>>>>>>>>>>>>>               pci_restore_ltr_state(dev);
> >>>>>>>>>>>>>>>>> +       pci_restore_aspm_l1ss_state(dev);
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>               pci_restore_pcie_state(dev);
> >>>>>>>>>>>>>>>>>               pci_restore_pasid_state(dev);
> >>>>>>>>>>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> >>>>>>>>>>>>>>>>>               if (error)
> >>>>>>>>>>>>>>>>>                       pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> +       error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> >>>>>>>>>>>>>>>>> +                                           2 * sizeof(u32));
> >>>>>>>>>>>>>>>>> +       if (error)
> >>>>>>>>>>>>>>>>> +               pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>>               pci_allocate_vc_save_buffers(dev);
> >>>>>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> >>>>>>>>>>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
> >>>>>>>>>>>>>>>>> --- a/drivers/pci/pci.h
> >>>>>>>>>>>>>>>>> +++ b/drivers/pci/pci.h
> >>>>>>>>>>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>>>>>>>        void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>>>>>>>        void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> >>>>>>>>>>>>>>>>>        void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> >>>>>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>>>>>>>>        #else
> >>>>>>>>>>>>>>>>>        static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>>>>        static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>>>>        static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>>>>        static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> >>>>>>>>>>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>>>>>>>>        #endif
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>        #ifdef CONFIG_PCIE_ECRC
> >>>>>>>>>>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
> >>>>>>>>>>>>>>>>> --- a/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >>>>>>>>>>>>>>>>>                                       PCI_L1SS_CTL1_L1SS_MASK, val);
> >>>>>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>>>> +{
> >>>>>>>>>>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>>>>>>>>>> +       u32 *cap;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>>>> +       if (!save_state)
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> >>>>>>>>>>>>>>>>> +       pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> >>>>>>>>>>>>>>>>> +}
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>>>>>>>> +{
> >>>>>>>>>>>>>>>>> +       int aspm_l1ss;
> >>>>>>>>>>>>>>>>> +       struct pci_cap_saved_state *save_state;
> >>>>>>>>>>>>>>>>> +       u32 *cap;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       if (!pci_is_pcie(dev))
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>>>> +       if (!aspm_l1ss)
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>>>>>>>> +       if (!save_state)
> >>>>>>>>>>>>>>>>> +               return;
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>> +       cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> >>>>>>>>>>>>>>>>> +       pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >>>>>>>>>>>>>>>>> +}
> >>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>>        static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
> >>>>>>>>>>>>>>>>>        {
> >>>>>>>>>>>>>>>>>               pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
> >>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>> 2.17.1
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> With this patch (and also the one mentioned at
> >>>>>>>>>>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@canonical.com/)
> >>>>>>>>>>>>>> applied on 5.10 (chromeos-5.10) I am observing problems after
> >>>>>>>>>>>>>> suspend/resume with my WiFi card - it looks like all communication
> >>>>>>>>>>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
> >>>>>>>>>>>>>> and after): https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I played a little bit with this code and it looks like the
> >>>>>>>>>>>>>> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
> >>>>>>>>>>>>>> why, not a PCI expert).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks a lot for testing this!  I'm not quite sure what to make of the
> >>>>>>>>>>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
> >>>>>>>>>>>>> other changes are in chromeos-5.10.
> >>>>>>>>>>>
> >>>>>>>>>>> Lukasz: I assume you are running this on Atlas and are seeing this bug
> >>>>>>>>>>> when uprev'ving it to 5.10 kernel. Can you please try it on a newer
> >>>>>>>>>>> Intel platform that has the latest upstream kernel running already
> >>>>>>>>>>> and see if this can be reproduced there too?
> >>>>>>>>>>> Note that the wifi PCI device is different on newer Intel platforms,
> >>>>>>>>>>> but platform design is similar enough that I suspect we should see
> >>>>>>>>>>> similar bug on those too. The other option is to try the latest
> >>>>>>>>>>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
> >>>>>>>>>>> ignore bringing up the graphics stack and GUI), it may come up
> >>>>>>>>>>> sufficiently enough to try this patch?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>>
> >>>>>>>>>>> Rajat
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Random observations, no analysis below.  This from your dmesg
> >>>>>>>>>>>>> certainly looks like PCI reads failing and returning ~0:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>         Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> >>>>>>>>>>>>>         iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> >>>>>>>>>>>>>         iwlwifi 0000:01:00.0: Device gone - attempting removal
> >>>>>>>>>>>>>         Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
> >>>>>>>>>>>>> reset (BAR is 0):
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>         pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
> >>>>>>>>>>>>>         pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> lspci diffs from before/after suspend:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>          00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
> >>>>>>>>>>>>>            Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
> >>>>>>>>>>>>>         -               DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
> >>>>>>>>>>>>>         +               DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>>>>>>>         -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>>>>>         +               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>>>>>         -               LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>>>>>>>         +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>>>>>>>         -       Capabilities: [150 v0] Null
> >>>>>>>>>>>>>         -       Capabilities: [200 v1] L1 PM Substates
> >>>>>>>>>>>>>         -               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>         -                         PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> >>>>>>>>>>>>>         -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>>         -                          T_CommonMode=40us LTR1.2_Threshold=98304ns
> >>>>>>>>>>>>>         -               L1SubCtl2: T_PwrOn=60us
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The DevSta differences might be BIOS bugs, probably not relevant.
> >>>>>>>>>>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
> >>>>>>>>>>>>> re-enumerating 01:00.0?  Strange that the L1 PM Substates capability
> >>>>>>>>>>>>> disappeared.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>          01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> >>>>>>>>>>>>>                         LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>>>>>>>         -                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>>>>>         +                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>>>>>                 Capabilities: [154 v1] L1 PM Substates
> >>>>>>>>>>>>>                         L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>>>>                                   PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
> >>>>>>>>>>>>>         -               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>>>>         -                          T_CommonMode=0us LTR1.2_Threshold=98304ns
> >>>>>>>>>>>>>         +               L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>>>>>>>         +                          T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Dmesg claimed we reconfigured common clock config.  Maybe ASPM didn't
> >>>>>>>>>>>>> get reinitialized after re-enumeration?  Looks like we didn't restore
> >>>>>>>>>>>>> L1SubCtl1.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Bjorn
> >>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> Thank you all for the response and input! As Rajat mentioned I'm using
> >>>>>>>>>> chromebook - but not Atlas (Amberlake) - in this case it is Babymega
> >>>>>>>>>> (Apollolake)  - I will try to load most recent kernel and give it a
> >>>>>>>>>> try once again.
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Lukasz
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>>       I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
> >>>>>>>>> still getting same results:
> >>>>>>>>> https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
> >>>>>>>>>
> >>>>>>>>> Best regards,
> >>>>>>>>> Lukasz
> >>>>>>>>>
> >>>>>>> Hi Vidya,
> >>>>>>>
> >>>>>>> Sorry for the long delay, I have retested your patch on top of
> >>>>>>> linux-next/master (next-20220802) - the results for my device remain
> >>>>>>> the same.
> >>>>>>> Here are the logs (lspci -vvv before suspend, lspci -vvv after resume and dmesg)
> >>>>>>> https://gist.github.com/semihalf-majczak-lukasz/c7bfd811359f23278034056a8002b3ef
> >>>>>>> Let me know if you need any more logs and/or tests.
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Lukasz
> >>>>>>>
> >>>>> Hi Vidya,
> >>>>>
> >>>>> After your last email, I've re-tested my setup and the capability
> >>>>> register also disappears without your patch - so it looks like there
> >>>>> is, in fact, some problem in my setup, and your patch just exposes it
> >>>>> because after resume it tries to write to a register that is no longer
> >>>>> present. I'm very sorry for the confusion here; I did not notice
> >>>>> that at the very beginning.
> >>>>>
> >>>>> Best regards,
> >>>>> Lukasz
> >>>>>
> >
> > Hi Vidya,
> >
> > For me (on Apollolake devices) the results remain the same, but as
> > I've mentioned earlier - it looks very much related exactly to the
> > Apollolake and is not directly related to your patch (e.g. I'm losing
> > L1SS capabilities even without your patch).
> > As a counterexample, I don't observe any issues with your patch
> > (v3) on Amberlake devices - lspci -vvv before suspend and after resume
> > are exactly the same.
>
> Thanks for the update Lukasz.
> Anyway, I sent V3 for review. Could you please review it and also test
> it on your platform?
>
> Thanks,
> Vidya Sagar
>
> >
> > Best regards,
> > Lukasz
> >
Hi Vidya,

The results in my previous mail are for V3 of your patch:
Amberlake - works fine
Apollolake - still the same issue, but here it is not related to your
changes (we are still working on this).

Best regards,
Lukasz
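
Editorial sketch (not part of the original message): the thread describes restoring the L1SS control registers with L1.2 kept disabled until the rest of the configuration is back in place. The user-space C model below illustrates one possible write ordering under that description. It is not the code from the V3 patch. The register offsets and the L1.2 mask follow the kernel's pci_regs.h; `struct fake_dev`, `fake_read()`, `fake_write()`, `save_l1ss()` and `restore_l1ss()` are hypothetical names invented for this illustration, and the config space is just an array that logs writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets within the L1 PM Substates capability, as in the kernel's pci_regs.h. */
#define PCI_L1SS_CTL1            0x08
#define PCI_L1SS_CTL1_L1_2_MASK  0x00000005u /* PCI-PM L1.2 and ASPM L1.2 enable bits */
#define PCI_L1SS_CTL2            0x0c

#define FAKE_CAP_DWORDS 4 /* hypothetical: dwords modeled for the capability */
#define FAKE_LOG_LEN    8 /* hypothetical: maximum number of logged writes */

/* Hypothetical stand-in for a device: only the L1SS capability is modeled. */
struct fake_dev {
	uint32_t l1ss[FAKE_CAP_DWORDS];
	int write_off[FAKE_LOG_LEN];      /* offsets, in the order they were written */
	uint32_t write_val[FAKE_LOG_LEN]; /* values, in the order they were written */
	size_t nwrites;
};

static uint32_t fake_read(const struct fake_dev *d, int off)
{
	assert(off % 4 == 0 && off / 4 < FAKE_CAP_DWORDS);
	return d->l1ss[off / 4];
}

static void fake_write(struct fake_dev *d, int off, uint32_t val)
{
	assert(off % 4 == 0 && off / 4 < FAKE_CAP_DWORDS);
	assert(d->nwrites < FAKE_LOG_LEN);
	d->write_off[d->nwrites] = off;
	d->write_val[d->nwrites] = val;
	d->nwrites++;
	d->l1ss[off / 4] = val;
}

/* Save CTL2 first, then CTL1: the same order the quoted patch uses. */
static void save_l1ss(const struct fake_dev *d, uint32_t saved[2])
{
	saved[0] = fake_read(d, PCI_L1SS_CTL2);
	saved[1] = fake_read(d, PCI_L1SS_CTL1);
}

/* Restore with L1.2 held off until CTL2 and the other CTL1 fields are written. */
static void restore_l1ss(struct fake_dev *d, const uint32_t saved[2])
{
	uint32_t ctl2 = saved[0];
	uint32_t ctl1 = saved[1];

	/* Step 1: all CTL1 fields except the L1.2 enable bits. */
	fake_write(d, PCI_L1SS_CTL1, ctl1 & ~PCI_L1SS_CTL1_L1_2_MASK);

	/* Step 2: CTL2 while L1.2 is still disabled. */
	fake_write(d, PCI_L1SS_CTL2, ctl2);

	/* Step 3: the L1.2 enable bits last, only if they were set before suspend. */
	if (ctl1 & PCI_L1SS_CTL1_L1_2_MASK)
		fake_write(d, PCI_L1SS_CTL1, ctl1);
}
```

A real implementation would use pci_read_config_dword() and pci_write_config_dword() on a struct pci_dev and, as in the quoted patch, would first confirm with pci_find_ext_capability() that the capability is still present. That check matters on the Apollolake setup discussed above, where the capability disappears after resume.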
