Message-ID: <2e30ae181acadd45da8cb91619326f37@wizardsworks.org>
Date: Tue, 24 Jun 2025 16:10:39 -0700
From: Greg Chandler <chandleg@...ardsworks.org>
To: "Maciej W. Rozycki" <macro@...am.me.uk>
Cc: Florian Fainelli <f.fainelli@...il.com>, stable@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: Tulip 21142 panic on physical link disconnect
On 2025/06/19 17:57, Maciej W. Rozycki wrote:
> On Thu, 19 Jun 2025, Greg Chandler wrote:
>
>> > > I am still not sure why I could not see that warning on my Cobalt
>> > > Qube2 trying to reproduce Greg's original issue, that is with an IP
>> > > assigned on the interface yanking the cable did not trigger a timer
>> > > warning. It could be that machine is orders of magnitude slower and
>> > > has a different CONFIG_HZ value that just made it less likely to be
>> > > seen?
>> >
>> > Can it have a different PHY attached? There's this code:
>> >
>> >     if (tp->chip_id == PNIC2)
>> >         tp->link_change = pnic2_lnk_change;
>> >     else if (tp->flags & HAS_NWAY)
>> >         tp->link_change = t21142_lnk_change;
>> >     else if (tp->flags & HAS_PNICNWAY)
>> >         tp->link_change = pnic_lnk_change;
>>
>> I'm not sure which of us that was directed at, but for my onboard
>> tulips:
>
> It was for Florian, as obviously your system does trigger the issue.
>
>> I found a link to the datasheet (If needed), but have had mixed luck
>> with alldatasheets:
>> https://www.alldatasheet.com/datasheet-pdf/pdf/75840/MICRO-LINEAR/ML6698CH.html
>
> There's no need to chase hw documentation as the issue isn't directly
> related to it.
>
> As I noted in the earlier e-mail it seems a regression in the handling
> of `del_timer_sync', perhaps deliberate, introduced sometime between
> 5.18 and 6.4. I suggest that you try 5.18 (or 5.17 as it was
> 5.18.0-rc2 actually here that worked correctly) and see if it still
> triggers the problem and if it does not then bisect it (perhaps
> limiting the upper bound to 6.4 if it does trigger it for you, to save
> an iteration or a couple). Once you know the offender you'll likely
> know the solution. Or you can come back with results and ask for one
> if unsure.
>
> HTH,
>
> Maciej
I haven't had keyboard time in quite a few days, but I've been looking
over the code today.

I removed the HAS_ACPI flag from the 21142 setup, only to find later that
it is used in just one function, which deals with sleep mode.

While reading through the driver, I looked at which of the debugging
statements might produce useful output, and loaded the module with:
insmod ./tulip.ko tulip_debug=100
[16933.489376] tulip0: EEPROM default media type Autosense
[16933.489376] tulip0: Index #0 - Media 10baseT (#0) described by a
21142 Serial PHY (2) block
[16933.489376] tulip0: Index #1 - Media 10baseT-FDX (#4) described by a
21142 Serial PHY (2) block
[16933.489376] tulip0: Index #2 - Media 100baseTx (#3) described by a
21143 SYM PHY (4) block
[16933.489376] tulip0: Index #3 - Media 100baseTx-FDX (#5) described by
a 21143 SYM PHY (4) block
[16933.498165] net eth0: Digital DS21142/43 Tulip rev 65 at MMIO
0xa120000, 08:00:2b:86:ab:b1, IRQ 29
[16933.498165] tulip 0000:00:09.0 eth0: Restarting 21143
autonegotiation, csr14=0003ffff
[16933.498165] tulip 0000:00:09.0: vgaarb: pci_notify
[16933.498165] tulip 0000:00:0b.0: vgaarb: pci_notify
[16933.498165] tulip 0000:00:0b.0: assign IRQ: got 30
[16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized):
tulip_mwi_config()
[16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized):
MWI config cacheline=16, csr0=01a09000
[16933.498165] tulip 0000:00:0b.0: enabling bus mastering
[16933.505001] tulip1: EEPROM default media type Autosense
[16933.505001] tulip1: Index #0 - Media 10baseT (#0) described by a
21142 Serial PHY (2) block
[16933.505001] tulip1: Index #1 - Media 10baseT-FDX (#4) described by a
21142 Serial PHY (2) block
[16933.505001] tulip1: Index #2 - Media 100baseTx (#3) described by a
21143 SYM PHY (4) block
[16933.505001] tulip1: Index #3 - Media 100baseTx-FDX (#5) described by
a 21143 SYM PHY (4) block
[16933.513790] net eth1: Digital DS21142/43 Tulip rev 65 at MMIO
0xa121000, 08:00:2b:86:a8:5b, IRQ 30
[16933.513790] tulip 0000:00:0b.0 eth1: Restarting 21143
autonegotiation, csr14=0003ffff
[16933.513790] tulip 0000:00:0b.0: vgaarb: pci_notify
[16933.609494] tulip 0000:00:09.0 eth109: renamed from eth0
[16933.619259] tulip 0000:00:09.0 eth2: renamed from eth109
This popped up when I bound an IP address to the interface (but not
before):
[17042.757875] tulip 0000:00:0b.0 eth1: tulip_up(), irq==30
[17042.757875] tulip 0000:00:0b.0 eth1: Restarting 21143
autonegotiation, csr14=0003ffff
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new
csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt,
csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: Done tulip_up(), CSR0 f9a09000,
CSR5 f0760000 CSR6 b2422202
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new
csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt,
csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new
csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt,
csr5=0xf0660000
[17042.758852] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new
csr5=0xf0660000
[17042.758852] tulip 0000:00:0b.0 eth1: exiting interrupt,
csr5=0xf0660000
[17043.033266] tulip 0000:00:09.0 eth2: tulip_up(), irq==29
[17043.033266] tulip 0000:00:09.0 eth2: Restarting 21143
autonegotiation, csr14=0003ffff
[17043.033266] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new
csr5=0xf0660000
[17043.033266] tulip 0000:00:09.0 eth2: exiting interrupt,
csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: Done tulip_up(), CSR0 f9a09000,
CSR5 f0760000 CSR6 b2422202
[17043.034242] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new
csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt,
csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new
csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt,
csr5=0xf0660000
[17043.035219] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new
csr5=0xf0660000
[17043.035219] tulip 0000:00:09.0 eth2: exiting interrupt,
csr5=0xf0660000
[17043.330140] e1000: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow
Control: RX/TX
[17044.690491] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0268010 new
csr5=0xf0260000
[17044.690491] net eth2: 21143 link status interrupt cde1d2ce, CSR5
f0268010, fffbffff
[17044.690491] net eth2: Switching to 100baseTx-FDX based on link
negotiation 01e0 & cde1 = 01e0
[17044.690491] tulip 0000:00:09.0 eth2: 21143 non-MII 100baseTx-FDX
transceiver control 08af/00a0
[17044.690491] tulip 0000:00:09.0 eth2: Setting CSR15 to
08af0008/00a00008
[17044.690491] tulip 0000:00:09.0 eth2: Using media type 100baseTx-FDX,
CSR12 is ce
[17044.690491] tulip 0000:00:09.0 eth2: Setting CSR6 83860200/b3862202
CSR12 cde1d2ce
[17044.690491] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new
csr5=0xf0660000
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status
7fffbc85
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status
7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status
7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status
7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: exiting interrupt,
csr5=0xf0660000
[17044.691468] tulip 0000:00:09.0 eth2: interrupt csr5=0xf8668000 new
csr5=0xf8668000
[17044.691468] net eth2: 21143 link status interrupt cde1d2cc, CSR5
f8668000, fffbffff
[17044.691468] net eth2: 21143 100baseTx-FDX link beat good
[17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt,
csr5=0xf0660000
[17044.691468] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0668010 new
csr5=0xf0660000
[17044.691468] net eth2: 21143 link status interrupt 000002c8, CSR5
f0668010, fffbff7f
[17044.691468] net eth2: 21143 100baseTx-FDX link beat good
[17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt,
csr5=0xf0660000
[17045.493225] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new
csr5=0xf0660000
[17045.493225] tulip 0000:00:09.0 eth2: Transmit error, Tx status
7fffb000
[17045.493225] tulip 0000:00:09.0 eth2: exiting interrupt,
csr5=0xf0660000
[17045.803772] net eth1: 21143 negotiation status 000021c6, 10baseT
[17045.803772] net eth1: 21143 negotiation failed, status 000021c6
[17045.803772] net eth1: Testing new 21143 media 100baseTx
[17045.803772] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0208100 new
csr5=0xf0200000
[17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt,
csr5=0xf0260000
[17045.803772] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new
csr5=0xf0660000
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status
7fffbc85
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status
7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status
7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status
7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status
7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt,
csr5=0xf0660000
[17045.805725] tulip 0000:00:0b.0 eth1: tulip_stop_rxtx() failed (CSR5
0xf0660000 CSR6 0xb3862002)
[17046.053772] net eth2: 21143 negotiation status 000002c8,
100baseTx-FDX
[17046.053772] net eth2: Using NWay-set 100baseTx-FDX media, csr12
000002c8
I'm still working my way through the driver, but I figured I'd post the
additional debug info in case anyone wanted it.