lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Wed, 9 Feb 2022 08:54:09 +0100 From: Oliver Hartkopp <socketcan@...tkopp.net> To: Marc Kleine-Budde <mkl@...gutronix.de> Cc: "Ziyang Xuan (William)" <william.xuanziyang@...wei.com>, davem@...emloft.net, kuba@...nel.org, linux-can@...r.kernel.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org Subject: Re: [PATCH net] can: isotp: isotp_rcv_cf(): fix so->rx race problem Hi Marc, On 07.02.22 09:11, Marc Kleine-Budde wrote: > On 28.01.2022 15:48:05, Oliver Hartkopp wrote: >> Hello Marc, hello William, >> >> On 28.01.22 09:46, Marc Kleine-Budde wrote: >>> On 28.01.2022 09:32:40, Oliver Hartkopp wrote: >>>> >>>> >>>> On 28.01.22 09:07, Marc Kleine-Budde wrote: >>>>> On 28.01.2022 08:56:19, Oliver Hartkopp wrote: >>>>>> I've seen the frame processing sometimes freezes for one second when >>>>>> stressing the isotp_rcv() from multiple sources. This finally freezes >>>>>> the entire softirq which is either not good and not needed as we only >>>>>> need to fix this race for stress tests - and not for real world usage >>>>>> that does not create this case. >>>>> >>>>> Hmmm, this doesn't sound good. Can you test with LOCKDEP enabled? >> >> >>>> # >>>> # Lock Debugging (spinlocks, mutexes, etc...) >>>> # >>>> CONFIG_LOCK_DEBUGGING_SUPPORT=y >>>> # CONFIG_PROVE_LOCKING is not set >>> CONFIG_PROVE_LOCKING=y >> >> Now enabled even more locking (seen relevant kernel config at the end). >> >> It turns out that there is no visible difference when using spin_lock() or >> spin_trylock(). >> >> I only got some of these kernel log entries >> >> Jan 28 11:13:14 silver kernel: [ 2396.323211] perf: interrupt took too long >> (2549 > 2500), lowering kernel.perf_event_max_sample_rate to 78250 >> Jan 28 11:25:49 silver kernel: [ 3151.172773] perf: interrupt took too long >> (3188 > 3186), lowering kernel.perf_event_max_sample_rate to 62500 >> Jan 28 11:45:24 silver kernel: [ 4325.583328] perf: interrupt took too long >> (4009 > 3985), lowering kernel.perf_event_max_sample_rate to 49750 >> Jan 28 12:15:46 silver kernel: [ 6148.238246] perf: interrupt took too long >> (5021 > 5011), lowering kernel.perf_event_max_sample_rate to 39750 >> Jan 28 13:01:45 silver kernel: [ 8907.303715] perf: interrupt took too long >> (6285 > 6276), lowering kernel.perf_event_max_sample_rate to 31750 >> >> But I get these sporadically anyway. No other LOCKDEP splat. >> >> At least the issue reported by William should be fixed now - but I'm still >> unclear whether spin_lock() or spin_trylock() is the best approach here in >> the NET_RX softirq?!? > > With the !spin_trylock() -> return you are saying if something > concurrent happens, drop it. This doesn't sound correct. Yes, I had the same feeling and did some extensive load tests using both variants. It turned out the standard spin_lock() works excellent to fix the issue. Thanks for taking it for upstream here: https://lore.kernel.org/linux-can/20220209074818.3ylfz4zmuhit7orc@pengutronix.de/T/#t Best regards, Oliver
Powered by blists - more mailing lists