lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 2 Feb 2023 12:52:34 -0800
From:   Jacob Keller <jacob.e.keller@...el.com>
To:     Miroslav Lichvar <mlichvar@...hat.com>,
        Íñigo Huguet <ihuguet@...hat.com>
CC:     <netdev@...r.kernel.org>, <richardcochran@...il.com>,
        <yangbo.lu@....com>, <gerhard@...leder-embedded.com>,
        <habetsm.xilinx@...il.com>, <ecree.xilinx@...il.com>,
        <davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
        <pabeni@...hat.com>, <alex.maftei@....com>
Subject: Re: PTP vclock: BUG: scheduling while atomic



On 2/2/2023 8:33 AM, Miroslav Lichvar wrote:
> On Thu, Feb 02, 2023 at 05:02:07PM +0100, Íñigo Huguet wrote:
>> Our QA team was testing PTP vclocks, and they've found this error with sfc NIC/driver:
>>   BUG: scheduling while atomic: ptp5/25223/0x00000002
>>
>> The reason seems to be that vclocks disable interrupts with `spin_lock_irqsave` in
>> `ptp_vclock_gettime`, and then read the timecounter, which in turns ends calling to
>> the driver's `gettime64` callback.
> 
> The same issue was observed with the ice driver:
> https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20221107/030633.html
> 
> I tried to fix it generally in the vclock support, but was not
> successful. There was a hint it would be fixed in the driver. I'm not
> sure what is the best approach here.
> 

This slipped through the cracks. The root cause (for ice) is that the
.gettime callback might sleep while waiting for the HW semaphore registers.

We had a change that fixed this (though we had done it for other
reasons) by simply not blocking gettime access with the semaphore, but
Richard didn't like this approach and NAK'd the patch on netdev:

https://lore.kernel.org/intel-wired-lan/877d0yt0ns.fsf@intel.com/

Alternatives are challenging here as the semaphore is used across
multiple PFs which makes using a spinlock difficult, as the PFs don't
share any references.

We could switch the part that does usleep to udelay instead, so we
wouldn't cause this scheduling bug... not sure if that has any other
side effects.

I'm not sure if there's another way to drop the semaphore and assuage
concerns over correctness. We need to read the time registers and its
possible another thread (or in principle another PF) is modifying the
time. We only expose a single PTP clock device, but all PFs can access
the time for the purpose of caching value used for extending 40bit
timestamps.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ