[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2803ae6c-7ccb-e5c3-43e6-ff4accb9cf24@gmail.com>
Date: Wed, 9 Jan 2019 20:38:10 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: Chris Wilson <chris@...is-wilson.co.uk>, netdev@...r.kernel.org
Cc: Florian Fainelli <f.fainelli@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Andrew Lunn <andrew@...n.ch>, intel-gfx@...ts.freedesktop.org
Subject: Re: tg3 vs commit 2b3e88ea6528 ("net: phy: improve phy state
checking")
On 09.01.2019 14:11, Chris Wilson wrote:
> Hi, we've hit a snag with
>
> commit 2b3e88ea65287ba738a798622405b15344871085
> Author: Heiner Kallweit <hkallweit1@...il.com>
> Date: Sun Dec 16 18:30:14 2018 +0100
>
> net: phy: improve phy state checking
>
> as it is detecting that the call from tg3 during suspend is from the
> HALTED state.
>
> <4> [74.170194] ------------[ cut here ]------------
> <4> [74.170220] called from state HALTED
> <4> [74.170237] WARNING: CPU: 3 PID: 2473 at drivers/net/phy/phy.c:548 phy_start_aneg+0xf1/0x140
> <4> [74.170239] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic asix usbnet mii i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core broadcom bcm_phy_lib snd_pcm tg3 mei_me i2c_i801 mei prime_numbers lpc_ich
> <4> [74.170258] CPU: 3 PID: 2473 Comm: kworker/u16:22 Tainted: G U 5.0.0-rc1-CI-CI_DRM_5366+ #1
> <4> [74.170260] Hardware name: Dell Inc. XPS 8300 /0Y2MRG, BIOS A06 10/17/2011
> <4> [74.170264] Workqueue: events_unbound async_run_entry_fn
> <4> [74.170269] RIP: 0010:phy_start_aneg+0xf1/0x140
> <4> [74.170271] Code: 10 89 93 d0 04 00 00 0f b6 40 04 89 83 d4 04 00 00 e9 74 ff ff ff 48 8b 34 c5 20 a7 ed 81 48 c7 c7 a6 7c 11 82 e8 9f 94 9d ff <0f> 0b 41 bc f0 ff ff ff eb 91 80 a3 c0 04 00 00 7f eb a3 48 b8 ff
> <4> [74.170273] RSP: 0018:ffffc9000044bc80 EFLAGS: 00010282
> <4> [74.170276] RAX: 0000000000000000 RBX: ffff888216e38958 RCX: 0000000000000000
> <4> [74.170278] RDX: 0000000000000000 RSI: ffffffff8213044a RDI: 00000000ffffffff
> <4> [74.170280] RBP: ffff888216e38f00 R08: 00000000c14614ba R09: 0000000000000000
> <4> [74.170282] R10: ffffc9000044bc80 R11: 0000000000000000 R12: ffff888216e38958
> <4> [74.170284] R13: 0000000000000000 R14: ffff888223655f28 R15: ffff88821af6da58
> <4> [74.170287] FS: 0000000000000000(0000) GS:ffff888227ac0000(0000) knlGS:0000000000000000
> <4> [74.170289] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [74.170291] CR2: 000055f1dae80d80 CR3: 0000000002214006 CR4: 00000000000606e0
> <4> [74.170293] Call Trace:
> <4> [74.170301] tg3_power_down_prepare+0x754/0xa30 [tg3]
> <4> [74.170308] tg3_suspend+0x1e5/0x350 [tg3]
> <4> [74.170316] pci_pm_suspend+0x6d/0x120
> <4> [74.170319] ? pci_pm_freeze+0xc0/0xc0
> <4> [74.170324] dpm_run_callback+0x64/0x280
> <4> [74.170330] __device_suspend+0x12a/0x5b0
> <4> [74.170335] ? dpm_watchdog_set+0x60/0x60
> <4> [74.170344] async_suspend+0x15/0x90
> <4> [74.170347] async_run_entry_fn+0x34/0x160
> <4> [74.170352] process_one_work+0x245/0x610
> <4> [74.170360] worker_thread+0x37/0x380
> <4> [74.170365] ? process_one_work+0x610/0x610
> <4> [74.170368] kthread+0x119/0x130
> <4> [74.170371] ? kthread_park+0x80/0x80
> <4> [74.170377] ret_from_fork+0x3a/0x50
> <4> [74.170388] irq event stamp: 648
> <4> [74.170392] hardirqs last enabled at (647): [<ffffffff81123be2>] vprintk_emit+0x302/0x320
> <4> [74.170396] hardirqs last disabled at (648): [<ffffffff810019b0>] trace_hardirqs_off_thunk+0x1a/0x1c
> <4> [74.170399] softirqs last enabled at (626): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
> <4> [74.170402] softirqs last disabled at (605): [<ffffffff81a00e1a>] do_softirq_own_stack+0x2a/0x40
> <4> [74.170405] WARNING: CPU: 3 PID: 2473 at drivers/net/phy/phy.c:548 phy_start_aneg+0xf1/0x140
> <4> [74.170407] ---[ end trace e382359f2ec4f600 ]---
>
> Previously phy_start_aneg() would handle the HALTED state but now it is
> a warning. To maintain coverage in our (intel-gfx) CI, we've locally
> reverted 2b3e88ea6528.
> -Chris
>
We added a few warnings to detect wrong usage of the phylib API.
This check seems to be a little bit too strict. I just submitted
a fix.
Still this helped to find an unneeded call to phy_start_aneg.
In tg3_phy_start() there's no need to call phy_start_aneg() after
phy_start(), this is done implicitly by phylib.
Heiner
Powered by blists - more mailing lists