linux-kernel - Re: [Intel-wired-lan] [PATCH v2] e1000e: free IRQ regardless of __E1000_DOWN

Open Source and information security mailing list archives

Message-ID: <a5e8ac37-9769-960b-8134-c90c5c061f12@codeaurora.org>
Date:   Fri, 2 Dec 2016 10:02:39 -0700
From:   "Baicar, Tyler" <tbaicar@...eaurora.org>
To:     "Neftin, Sasha" <sasha.neftin@...el.com>,
        jeffrey.t.kirsher@...el.com, intel-wired-lan@...ts.osuosl.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        okaya@...eaurora.org, timur@...eaurora.org
Subject: Re: [Intel-wired-lan] [PATCH v2] e1000e: free IRQ regardless of
 __E1000_DOWN

Hello Sasha,

Were you able to reproduce this issue?

Do you have a patch fixing the close function inconsistencies that you 
mentioned which I could try out?

Thanks,
Tyler

On 11/21/2016 1:40 PM, Baicar, Tyler wrote:
> On 11/17/2016 6:31 AM, Neftin, Sasha wrote:
>> On 11/13/2016 10:34 AM, Neftin, Sasha wrote:
>>> On 11/11/2016 12:35 AM, Baicar, Tyler wrote:
>>>> Hello Sasha,
>>>>
>>>> On 11/9/2016 11:19 PM, Neftin, Sasha wrote:
>>>>> On 11/9/2016 11:41 PM, Tyler Baicar wrote:
>>>>>> Move IRQ free code so that it will happen regardless of the
>>>>>> __E1000_DOWN bit. Currently the e1000e driver only releases its IRQ
>>>>>> if the __E1000_DOWN bit is cleared. This is not sufficient because
>>>>>> it is possible for __E1000_DOWN to be set without releasing the IRQ.
>>>>>> In such a situation, we will hit a kernel bug later in e1000_remove
>>>>>> because the IRQ still has action since it was never freed. A
>>>>>> secondary bus reset can cause this case to happen.
>>>>>>
>>>>>> Signed-off-by: Tyler Baicar <tbaicar@...eaurora.org>
>>>>>> ---
>>>>>>    drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>>>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>> index 7017281..36cfcb0 100644
>>>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>>>>>>          if (!test_bit(__E1000_DOWN, &adapter->state)) {
>>>>>>            e1000e_down(adapter, true);
>>>>>> -        e1000_free_irq(adapter);
>>>>>>              /* Link status message must follow this format */
>>>>>>            pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>>>>>>        }
>>>>>>    +    e1000_free_irq(adapter);
>>>>>> +
>>>>>>        napi_disable(&adapter->napi);
>>>>>>          e1000e_free_tx_resources(adapter->tx_ring);
>>>>>>
>>>>> I would like not recommend insert this change. This change related
>>>>> driver state machine, we afraid from lot of synchronization 
>>>>> problem and
>>>>> issues.
>>>>> We need keep e1000_free_irq in loop and check for 'test_bit' ready.
>>>> What do you mean here? There is no loop. If __E1000_DOWN is set 
>>>> then we
>>>> will never free the IRQ.
>>>>
>>>>> Another point, does before execute secondary bus reset your SW 
>>>>> back up
>>>>> pcie configuration space as properly?
>>>> After a secondary bus reset, the link needs to recover and go back 
>>>> to a
>>>> working state after 1 second.
>>>>
>>>>  From the callstack, the issue is happening while removing the 
>>>> endpoint
>>>> from the system, before applying the secondary bus reset.
>>>>
>>>> The order of events is
>>>> 1. remove the drivers
>>>> 2. cause a secondary bus reset
>>>> 3. wait 1 second
>>> Actually, this is too much, usually link up in less than 100ms.You can
>>> check Data Link Layer indication.
>>>> 4. recover the link
>>>>
>>>> callstack:
>>>> free_msi_irqs+0x6c/0x1a8
>>>> pci_disable_msi+0xb0/0x148
>>>> e1000e_reset_interrupt_capability+0x60/0x78
>>>> e1000_remove+0xc8/0x180
>>>> pci_device_remove+0x48/0x118
>>>> __device_release_driver+0x80/0x108
>>>> device_release_driver+0x2c/0x40
>>>> pci_stop_bus_device+0xa0/0xb0
>>>> pci_stop_bus_device+0x3c/0xb0
>>>> pci_stop_root_bus+0x54/0x80
>>>> acpi_pci_root_remove+0x28/0x64
>>>> acpi_bus_trim+0x6c/0xa4
>>>> acpi_device_hotplug+0x19c/0x3f4
>>>> acpi_hotplug_work_fn+0x28/0x3c
>>>> process_one_work+0x150/0x460
>>>> worker_thread+0x50/0x4b8
>>>> kthread+0xd4/0xe8
>>>> ret_from_fork+0x10/0x50
>>>>
>>>> Thanks,
>>>> Tyler
>>>>
>>> Hello Tyler,
>>> Okay, we need consult more about this suggestion.
>>> May I ask what is setup you run? Is there NIC or on board LAN? I would
>>> like try reproduce this issue in our lab's too.
>>> Also, is same issue observed with same scenario and others NIC's too?
>>> Sasha
>>> _______________________________________________
>>> Intel-wired-lan mailing list
>>> Intel-wired-lan@...ts.osuosl.org
>>> http://lists.osuosl.org/mailman/listinfo/intel-wired-lan
>>>
>> Hello Tyler,
>> I see some in consistent implementation of __*_close methods in our
>> drivers. Do you have any igb NIC to check if same problem persist there?
>> Thanks,
>> Sasha
> Hello Sasha,
>
> I couldn't find an igb NIC to test with, but I did find another e1000e 
> card that does not cause the same issue. That card is:
>
> 0004:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit 
> Network Connection
>         Subsystem: Intel Corporation Gigabit CT Desktop Adapter
>         Physical Slot: 5-1
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 299
>         Region 0: Memory at c01001c0000 (32-bit, non-prefetchable) 
> [size=128K]
>         Region 1: Memory at c0100100000 (32-bit, non-prefetchable) 
> [size=512K]
>         Region 2: I/O ports at 1000 [size=32]
>         Region 3: Memory at c01001e0000 (32-bit, non-prefetchable) 
> [size=16K]
>         Expansion ROM at c0100180000 [disabled] [size=256K]
>         Capabilities: [c8] Power Management version 2
>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA 
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>                 Address: 00000000397f0040  Data: 0000
>         Capabilities: [e0] Express (v1) Endpoint, MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s 
> <512ns, L1 <64us
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>                 DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ 
> Unsupported+
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 256 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- 
> AuxPwr+ TransPend-
>                 LnkCap: Port #8, Speed 2.5GT/s, Width x1, ASPM L0s L1, 
> Exit Latency L0s <128ns, L1 <64us
>                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
>                 LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- 
> SlotClk+ DLActive- BWMgmt- ABWMgmt-
>         Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
>                 Vector table: BAR=3 offset=00000000
>                 PBA: BAR=3 offset=00002000
>         Capabilities: [100 v1] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- 
> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- 
> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- 
> UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- 
> NonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- 
> NonFatalErr+
>                 AERCap: First Error Pointer: 00, GenCap- CGenEn- 
> ChkCap- ChkEn-
>         Capabilities: [140 v1] Device Serial Number 
> 68-05-ca-ff-ff-29-47-34
>         Kernel driver in use: e1000e
>
> Here are the kernel logs from first running the test on this card and 
> then running the test on the Intel 82571EB card that I originally 
> found the issue with (you can see the issue doesn't happen on this 
> card but does on the other):
>
> [   44.146749] ACPI: \_SB_.PCI0: Device has suffered a power fault
> [   44.155238] pcieport 0000:00:00.0: PCIe Bus Error: 
> severity=Uncorrected (Non-Fatal), type=Transaction Layer, 
> id=0000(Requester ID)
> [   44.166111] pcieport 0000:00:00.0:   device [17cb:0400] error 
> status/mask=00004000/00400000
> [   44.174420] pcieport 0000:00:00.0:    [14] Completion Timeout (First)
> [   44.401943] e1000e 0000:01:00.0 eth0: Timesync Tx Control register 
> not set as expected
> [   82.445586] pcieport 0002:00:00.0: PCIe Bus Error: 
> severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
> [   82.454851] pcieport 0002:00:00.0:   device [17cb:0400] error 
> status/mask=00000001/00006000
> [   82.463209] pcieport 0002:00:00.0:    [ 0] Receiver Error
> [   82.469355] pcieport 0002:00:00.0: PCIe Bus Error: 
> severity=Uncorrected (Non-Fatal), type=Transaction Layer, 
> id=0000(Requester ID)
> [   82.481026] pcieport 0002:00:00.0:   device [17cb:0400] error 
> status/mask=00004000/00400000
> [   82.489343] pcieport 0002:00:00.0:    [14] Completion Timeout (First)
> [   82.504573] ACPI: \_SB_.PCI2: Device has suffered a power fault
> [   84.528036] kernel BUG at drivers/pci/msi.c:369!
>
> I'm not sure why it reproduces on the 82571EB card and not the 82574L 
> card. The only obvious difference is there is no Reciever Error on the 
> 82574L card.
>
> If you have a patch fixing the inconsistencies you mentioned with the 
> __*_close methods I would certainly be willing to test it out!
>
> Thanks,
> Tyler
>

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives