netdev - Re: [PATCH] Use a spinlock to guard `fep->ptp_clk

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <18c0c238-a006-7e52-65c5-1bcec0ee31e5@prolan.hu>
Date:   Thu, 1 Sep 2022 09:51:13 +0200
From:   Csókás Bence <csokas.bence@...lan.hu>
To:     Francesco Dolcini <francesco.dolcini@...adex.com>,
        Andrew Lunn <andrew@...n.ch>
CC:     <netdev@...r.kernel.org>,
        Richard Cochran <richardcochran@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>, <kernel@...gutronix.de>,
        Marc Kleine-Budde <mkl@...gutronix.de>
Subject: Re: [PATCH] Use a spinlock to guard `fep->ptp_clk_on`


On 2022. 08. 31. 18:24, Andrew Lunn wrote:
 >>> Please keep to reverse christmas tree
 >>
 >> checkpatch didn't tell me that was a requirement... Easy to fix though.
 >
 > checkpatch does not have the smarts to detect this. And it is a netdev
 > only thing.

I thought checkpatch also has the per-subsystem rules, too.

 > There is also a different between not being able to sleep, and not
 > being able to process an interrupt for some other hardware. You got a
 > report about taking a mutex in atomic context. That just means you
 > cannot sleep, probably because a spinlock is held. That is very
 > different to not being able to handle interrupts. You only need
 > spin_lock_irqsave() if the interrupt handler also needs the same spin
 > lock to protect it from a thread using the spin lock.

Alright, I will switch to plain `spin_lock()` then.

On 2022. 08. 31. 19:12, Francesco Dolcini wrote:
> On Wed, Aug 31, 2022 at 02:56:31PM +0200, Csókás Bence wrote:
>> Mutexes cannot be taken in a non-preemptible context,
>> causing a panic in `fec_ptp_save_state()`. Replacing
>> `ptp_clk_mutex` by `ptp_clk_lock` fixes this.
> 
> I would probably add the kernel BUG trace to the commit message.
> 
>> Fixes: 91c0d987a9788dcc5fe26baafd73bf9242b68900
>> Fixes: 6a4d7234ae9a3bb31181f348ade9bbdb55aeb5c5
>> Fixes: f79959220fa5fbda939592bf91c7a9ea90419040
> 
> Just
> 
> Fixes: f79959220fa5 ("fec: Restart PPS after link state change >
>>
>> Reported-by: Marc Kleine-Budde <mkl@...gutronix.de>
> 
> Is this https://lore.kernel.org/all/20220827160922.642zlcd5foopozru@pengutronix.de/ the report?

Yes, precisely.

> 
> I just stumbled on the same issue on latest torvalds 6.0-rc3.
> 
> [   22.718465] =============================
> [   22.725431] [ BUG: Invalid wait context ]
> [   22.732439] 6.0.0-rc3-00007-gdcf8e5633e2e #1 Tainted: G        W
> [   22.742278] -----------------------------
> [   22.749217] kworker/3:1/45 is trying to lock:
> [   22.757157] c211b71c (&fep->ptp_clk_mutex){+.+.}-{3:3}, at: fec_ptp_gettime+0x30/0xcc
> [   22.770814] other info that might help us debug this:
> [   22.778718] context-{4:4}
> [   22.784065] 4 locks held by kworker/3:1/45:
> [   22.790949]  #0: c20072a8 ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x1e4/0x730
> [   22.806494]  #1: e6a15f18 ((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}, at: process_one_work+0x1e4/0x730
> [   22.822744]  #2: c287a478 (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x34/0x228
> [   22.835884]  #3: c211b2a4 (&dev->tx_global_lock){+...}-{2:2}, at: netif_tx_lock+0x10/0x1c

Thank you, this lock was the source of the problem!

> 
> Francesco
> 

Bence