Message-ID: <f8e7f845-253b-47b7-9e09-97a580ce0e5c@hartkopp.net>
Date: Mon, 10 Mar 2025 10:45:51 +0100
From: Oliver Hartkopp <socketcan@...tkopp.net>
To: Vincent Mailhol <vincent.mailhol@...il.com>
Cc: mkl@...gutronix.de,
syzbot <syzbot+78ce4489b812515d5e4d@...kaller.appspotmail.com>,
linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com,
linux-can@...r.kernel.org
Subject: Re: [syzbot] [can?] KCSAN: data-race in can_send / can_send (5)
On 10.03.25 10:29, Vincent Mailhol wrote:
> On Mon. 10 Mar 2025 at 03:59, Oliver Hartkopp <socketcan@...tkopp.net> wrote:
>>> value changed: 0x0000000000002b9d -> 0x0000000000002b9e
>>>
>>
>> Increased by '1' ...
>>
>> I assume this problem is caused by increasing the per-netdevice statistic in
>>
>> https://elixir.bootlin.com/linux/v6.13.6/source/net/can/af_can.c#L289
>>
>> pkg_stats->tx_frames++;
>> pkg_stats->tx_frames_delta++;
>>
>> We update the statistics for the device and in this specific case the
>> hrtimer fired on two CPUs resulting in a can_send() to the same netdevice.
>>
>> Do you agree with this quick analysis?
>
> Ack. Same conclusion here.
>
>> Isn't there some lock-less per-cpu safe statistic handling within netdev
>> we might pick for our use-case?
>
> I see two solutions. Either we use lock_sock(skb->sk) and
> release_sock(skb->sk) or we can change the types of
> can_pkg_stats->tx_frames and can_pkg_stats->tx_frames_delta from long
> to atomic_long_t.
>
> The atomic_long_t is the closest to a lock-less solution. But my
> preference goes to lock_sock(), which looks more natural in this
> context. And lock_sock() is just a spinlock which under the hood is
> also an atomic, so no big penalty either.

When we get skbs from the netdevice (and not from user space), we do not
have a valid sk value: skb->sk is NULL.

See:
https://elixir.bootlin.com/linux/v6.13.6/source/net/can/raw.c#L203

Those skbs can also be forwarded by can-gw using can_send().

Therefore there is no lock_sock() without a valid sk ;-)

If 'atomic_long_t' also fixes this simple statistics handling, we
should use that.
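
To illustrate the atomic counter idea, here is a minimal userspace sketch.
It is not the kernel patch: it uses C11 <stdatomic.h> as a stand-in for the
kernel's atomic_long_t, and the names pkg_stats_sketch, count_tx_frame(),
sender() and run_concurrent_senders() are hypothetical. In the kernel the
increment would presumably become something like
atomic_long_inc(&pkg_stats->tx_frames). Two threads play the role of the
two CPUs that call can_send() for the same netdevice.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the tx counters in can_pkg_stats. */
struct pkg_stats_sketch {
    atomic_long tx_frames;
    atomic_long tx_frames_delta;
};

#define SENDERS 2
#define FRAMES_PER_SENDER 100000L

/* Models the statistics update in can_send(): one atomic add per counter. */
static void count_tx_frame(struct pkg_stats_sketch *stats)
{
    atomic_fetch_add_explicit(&stats->tx_frames, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&stats->tx_frames_delta, 1, memory_order_relaxed);
}

/* Thread body: plays the role of one CPU sending frames. */
static void *sender(void *arg)
{
    struct pkg_stats_sketch *stats = arg;

    for (long i = 0; i < FRAMES_PER_SENDER; i++)
        count_tx_frame(stats);
    return NULL;
}

/* Runs SENDERS concurrent senders on one stats block, returns tx_frames. */
static long run_concurrent_senders(void)
{
    struct pkg_stats_sketch stats;
    pthread_t threads[SENDERS];
    int i, rc;

    atomic_init(&stats.tx_frames, 0);
    atomic_init(&stats.tx_frames_delta, 0);

    for (i = 0; i < SENDERS; i++) {
        rc = pthread_create(&threads[i], NULL, sender, &stats);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < SENDERS; i++)
        pthread_join(threads[i], NULL);

    return atomic_load(&stats.tx_frames);
}
```

With plain 'long' counters and '++' the two senders can lose updates, which
is the kind of race KCSAN flagged. With the atomic add every increment is
counted, and no sk is needed, so this also covers skbs coming from the
netdevice or from can-gw. Relaxed ordering is assumed to be enough here
because the counters are pure statistics.
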
Best regards,
Oliver