lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 2 Nov 2021 08:24:08 +0200
From:   Sasha Neftin <sasha.neftin@...el.com>
To:     Kai-Heng Feng <kai.heng.feng@...onical.com>
CC:     Jesse Brandeburg <jesse.brandeburg@...el.com>,
        "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
        AceLan Kao <acelan.kao@...onical.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        "Dima Ruinskiy" <dima.ruinskiy@...el.com>,
        "moderated list:INTEL ETHERNET DRIVERS" 
        <intel-wired-lan@...ts.osuosl.org>,
        Linux Netdev List <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "Kraus, NechamaX" <nechamax.kraus@...ux.intel.com>,
        "Fuxbrumer, Devora" <devora.fuxbrumer@...el.com>,
        "Avivi, Amir" <amir.avivi@...el.com>,
        "Neftin, Sasha" <sasha.neftin@...el.com>
Subject: Re: [PATCH v2] e1000e: Add a delay to let ME unconfigure s0ix when
 DPG_EXIT_DONE is already flagged

On 11/2/2021 05:27, Kai-Heng Feng wrote:
> On Fri, Oct 29, 2021 at 5:14 PM Sasha Neftin <sasha.neftin@...el.com> wrote:
>>
>> On 10/27/2021 01:50, Kai-Heng Feng wrote:
>>> On Tue, Oct 26, 2021 at 4:48 PM Sasha Neftin <sasha.neftin@...el.com> wrote:
>>>>
>>>> On 10/26/2021 09:51, Kai-Heng Feng wrote:
>>>>> On some ADL platforms, DPG_EXIT_DONE is always flagged so e1000e resume
>>>>> polling logic doesn't wait until ME really unconfigures s0ix.
>>>>>
>>>>> So check DPG_EXIT_DONE before issuing EXIT_DPG, and if it's already
>>>>> flagged, wait for 1 second to let ME unconfigure s0ix.
>>>>>
>>>>> Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
>>>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
>>>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@...onical.com>
>>>>> ---
>>>>> v2:
>>>>>     Add missing "Fixes:" tag
>>>>>
>>>>>     drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
>>>>>     1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> index 44e2dc8328a22..cd81ba00a6bc9 100644
>>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> @@ -6493,14 +6493,21 @@ static void e1000e_s0ix_exit_flow(struct e1000_adapter *adapter)
>>>>>         u32 mac_data;
>>>>>         u16 phy_data;
>>>>>         u32 i = 0;
>>>>> +     bool dpg_exit_done;
>>>>>
>>>>>         if (er32(FWSM) & E1000_ICH_FWSM_FW_VALID) {
>>>>> +             dpg_exit_done = er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE;
>>>>>                 /* Request ME unconfigure the device from S0ix */
>>>>>                 mac_data = er32(H2ME);
>>>>>                 mac_data &= ~E1000_H2ME_START_DPG;
>>>>>                 mac_data |= E1000_H2ME_EXIT_DPG;
>>>>>                 ew32(H2ME, mac_data);
>>>>>
>>>>> +             if (dpg_exit_done) {
>>>>> +                     e_warn("DPG_EXIT_DONE is already flagged. This is a firmware bug\n");
>>>>> +                     msleep(1000);
>>>>> +             }
>>>> Thanks for working on the enablement.
>>>> The delay approach is fragile. We need to work with CSME folks to
>>>> understand why _DPG_EXIT_DONE indication is wrong on some ADL platforms.
>>>> Could you provide CSME/BIOS version? dmidecode -t 0 and cat
>>>> /sys/class/mei/mei0/fw_ver
>>>
>>> $ sudo dmidecode -t 0
>>> # dmidecode 3.2
>>> Getting SMBIOS data from sysfs.
>>> SMBIOS 3.4 present.
>>> # SMBIOS implementations newer than version 3.2.0 are not
>>> # fully supported by this version of dmidecode.
>>>
>>> Handle 0x0001, DMI type 0, 26 bytes
>>> BIOS Information
>>>           Vendor: Dell Inc.
>>>           Version: 0.12.68
>>>           Release Date: 10/01/2021
>>>           ROM Size: 48 MB
>>>           Characteristics:
>>>                   PCI is supported
>>>                   PNP is supported
>>>                   BIOS is upgradeable
>>>                   BIOS shadowing is allowed
>>>                   Boot from CD is supported
>>>                   Selectable boot is supported
>>>                   EDD is supported
>>>                   Print screen service is supported (int 5h)
>>>                   8042 keyboard services are supported (int 9h)
>>>                   Serial services are supported (int 14h)
>>>                   Printer services are supported (int 17h)
>>>                   ACPI is supported
>>>                   USB legacy is supported
>>>                   BIOS boot specification is supported
>>>                   Function key-initiated network boot is supported
>>>                   Targeted content distribution is supported
>>>                   UEFI is supported
>>>           BIOS Revision: 0.12
>>>
>>> $ cat /sys/class/mei/mei0/fw_ver
>>> 0:16.0.15.1518
>>> 0:16.0.15.1518
>>> 0:16.0.15.1518
>>>
>> Thank you Kai-Heng. The _DPG_EXIT_DONE bit indication comes from the
>> EXFWSM register controlled by the CSME. We have only read access.  I
>> realized that this indication was set to 1 even before our request to
>> unconfigure the s0ix settings from the CSME. I am wondering. Does after
>> a ~ 1s delay (or less, or more) _DPG_EXIT_DONE indication eventually
>> change by CSME to 0? (is it consistently)
> 
> Never. It's consistently being 1.
no. On my TGL platform is cleared by CSME:
[Sun Oct 31 08:54:40 2021] s0ix exit: EXFWSM register: 0x00000000
[Sun Oct 31 08:54:40 2021] s0ix exit (right after sent H2ME): EXFWSM 
register: 0x00000000
[Sun Oct 31 08:54:40 2021] s0ix exit(after polling): EXFWSM register: 
0x00000001
[Sun Oct 31 08:54:40 2021] e1000e 0000:00:1f.6 enp0s31f6: DPG_EXIT_DONE 
cleared after 130 msec
> 
> Right now we are seeing the same issue on TGL, so I wonder if it's
> better to just revert the CSME series?
no. We need to investigate it and understand what is CSME state we hit. 
Meanwhile few options:
1. use privilege flags to disable s0ix flow for problematic system 
(power will increased)
ethtool --set-priv-flags enp0s31f6 s0ix-enabled off
ethtool --show-priv-flags enp0s31f6
Private flags for enp0s31f6:
s0ix-enabled: off
2. delay as you suggested - less preferable I though
3. I would like to suggest (need to check it) in case the DPG_EXIT_DONE 
is 1 (and polling will be exit immediately) - let's perform enforce 
settings to the CSME, before write request to CSME unconfigure the 
device from s0ix :

if (er32(EXFWSM) & E1000_EXFWSM_DPG_EXIT_DONE)
	mac_data |= E1000_H2ME_ENFORCE_SETTINGS;

I will update Bugzilla: 
https://bugzilla.kernel.org/show_bug.cgi?id=214821 with this information.

I also will need some another information regards SMB state in this case.
> 
> Kai-Heng
> 
>>>>>                 /* Poll up to 2.5 seconds for ME to unconfigure DPG.
>>>>>                  * If this takes more than 1 second, show a warning indicating a
>>>>>                  * firmware bug
>>>>>
>>

Powered by blists - more mailing lists