Message-ID: <Y9zPPON16NEbzw86@gmail.com>
Date: Fri, 3 Feb 2023 09:09:16 +0000
From: Martin Habets <habetsm.xilinx@...il.com>
To: Íñigo Huguet <ihuguet@...hat.com>
Cc: netdev@...r.kernel.org, richardcochran@...il.com,
yangbo.lu@....com, mlichvar@...hat.com,
gerhard@...leder-embedded.com, ecree.xilinx@...il.com,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, alex.maftei@....com
Subject: Re: PTP vclock: BUG: scheduling while atomic
On Thu, Feb 02, 2023 at 05:02:07PM +0100, Íñigo Huguet wrote:
> Hello,
>
> Our QA team was testing PTP vclocks and found this error with the sfc NIC/driver:
> BUG: scheduling while atomic: ptp5/25223/0x00000002
>
> The reason seems to be that vclocks disable interrupts with `spin_lock_irqsave` in
> `ptp_vclock_gettime` and then read the timecounter, which in turn ends up calling
> the driver's `gettime64` callback.
>
> Vclock framework was added in commit 5d43f951b1ac ("ptp: add ptp virtual clock driver
> framework").
Looking at that commit, we'll face the same spinlock issue in
ptp_vclock_adjfine and ptp_vclock_adjtime.
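For reference, the locking in question looks roughly like this (paraphrased and
simplified from ptp_vclock.c as of that commit, not verbatim):

static u64 ptp_vclock_read(const struct cyclecounter *cc)
{
	struct ptp_vclock *vclock = cc_to_vclock(cc);
	struct ptp_clock *ptp = vclock->pclock;
	struct timespec64 ts = {};

	/* Calls back into the physical clock driver, e.g. efx_phc_gettime() */
	ptp->info->gettime64(ptp->info, &ts);

	return timespec64_to_ns(&ts);
}

static int ptp_vclock_gettime(struct ptp_clock_info *ptp,
			      struct timespec64 *ts)
{
	struct ptp_vclock *vclock = info_to_vclock(ptp);
	unsigned long flags;
	u64 ns;

	/* Interrupts are off from here on ... */
	spin_lock_irqsave(&vclock->lock, flags);
	/* ... but timecounter_read() -> ptp_vclock_read() -> gettime64(),
	 * which for sfc sleeps waiting for the MC response.
	 */
	ns = timecounter_read(&vclock->tc);
	spin_unlock_irqrestore(&vclock->lock, flags);
	*ts = ns_to_timespec64(ns);

	return 0;
}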
> At first glance, it seems that the vclock framework is reusing the drivers' already
> existing ptp clock callbacks, but it imposes a new limitation that didn't exist before:
> now they can't sleep (due to the spin_lock_irqsave). The sfc driver may sleep waiting
> for the firmware response.
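To illustrate why that bites: on this kind of NIC a PHC read is a firmware
round-trip, so a gettime64 implementation typically looks something like the
below (hypothetical driver code, not the actual sfc implementation;
example_nic, ptp_info, fw_lock and example_fw_read_clock are made-up names):

/* Hypothetical gettime64 of a NIC whose clock sits behind firmware.
 * Both the mutex and the wait for the firmware response may sleep,
 * which is not allowed under spin_lock_irqsave().
 */
static int example_phc_gettime(struct ptp_clock_info *ptp,
			       struct timespec64 *ts)
{
	struct example_nic *nic = container_of(ptp, struct example_nic,
					       ptp_info);
	int rc;

	mutex_lock(&nic->fw_lock);		/* may sleep */
	rc = example_fw_read_clock(nic, ts);	/* waits for the fw response */
	mutex_unlock(&nic->fw_lock);

	return rc;
}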
>
> The sfc driver can be fixed to avoid this issue, but I wonder whether something in the
> vclock framework itself might be incorrect. I don't have enough knowledge about how
> clock synchronization should work in this regard, so I leave it to your consideration.
If the timer hardware is local to the CPU core a spinlock could work.
But if it is global across CPUs, or, as in our case, remote behind a PCI bus,
using a spinlock is too much of a restriction.
I also wonder why the spinlock was used, and whether that limitation can be
reduced.
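One (untested) idea would be to use a mutex instead, assuming all callers of
the vclock ops run in process context (the aux kworker and the chardev ioctls
appear to), with vclock->lock changed from a spinlock_t to a struct mutex.
Roughly:

static int ptp_vclock_gettime(struct ptp_clock_info *ptp,
			      struct timespec64 *ts)
{
	struct ptp_vclock *vclock = info_to_vclock(ptp);
	u64 ns;

	mutex_lock(&vclock->lock);	/* was spin_lock_irqsave() */
	ns = timecounter_read(&vclock->tc);
	mutex_unlock(&vclock->lock);
	*ts = ns_to_timespec64(ns);

	return 0;
}

That would let the underlying driver callback sleep.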
Martin
> These are the logs with stack traces:
> BUG: scheduling while atomic: ptp5/25223/0x00000002
> [...skip...]
> Call Trace:
> dump_stack_lvl+0x34/0x48
> __schedule_bug.cold+0x47/0x53
> __schedule+0x40e/0x580
> schedule+0x43/0xa0
> schedule_timeout+0x88/0x160
> ? __bpf_trace_tick_stop+0x10/0x10
> _efx_mcdi_rpc_finish+0x2a9/0x480 [sfc]
> ? efx_mcdi_send_request+0x1d5/0x260 [sfc]
> ? dequeue_task_stop+0x70/0x70
> _efx_mcdi_rpc.constprop.0+0xcd/0x3d0 [sfc]
> ? update_load_avg+0x7e/0x730
> _efx_mcdi_rpc_evb_retry+0x5d/0x1d0 [sfc]
> efx_mcdi_rpc+0x10/0x20 [sfc]
> efx_phc_gettime+0x5f/0xc0 [sfc]
> ptp_vclock_read+0xa3/0xc0
> timecounter_read+0x11/0x60
> ptp_vclock_refresh+0x31/0x60
> ? ptp_clock_release+0x50/0x50
> ptp_aux_kworker+0x19/0x40
> kthread_worker_fn+0xa9/0x250
> ? kthread_should_park+0x30/0x30
> kthread+0x146/0x170
> ? set_kthread_struct+0x50/0x50
> ret_from_fork+0x1f/0x30
> BUG: scheduling while atomic: ptp5/25223/0x00000000
> [...skip...]
> Call Trace:
> dump_stack_lvl+0x34/0x48
> __schedule_bug.cold+0x47/0x53
> __schedule+0x40e/0x580
> ? ptp_clock_release+0x50/0x50
> schedule+0x43/0xa0
> kthread_worker_fn+0x128/0x250
> ? kthread_should_park+0x30/0x30
> kthread+0x146/0x170
> ? set_kthread_struct+0x50/0x50
> ret_from_fork+0x1f/0x30