[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240905131831.LI9rTYTd@linutronix.de>
Date: Thu, 5 Sep 2024 15:18:31 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
netdev@...r.kernel.org, eric.dumazet@...il.com,
syzbot <syzkaller@...glegroups.com>
Subject: Re: [PATCH net] net: hsr: remove seqnr_lock
On 2024-09-05 14:26:30 [+0200], Eric Dumazet wrote:
> On Thu, Sep 5, 2024 at 2:17 PM Sebastian Andrzej Siewior
> <bigeasy@...utronix.de> wrote:
> >
> > On 2024-09-04 13:37:25 [+0000], Eric Dumazet wrote:
> > > syzbot found a new splat [1].
> > >
> > > Instead of adding yet another spin_lock_bh(&hsr->seqnr_lock) /
> > > spin_unlock_bh(&hsr->seqnr_lock) pair, remove seqnr_lock
> > > and use atomic_t for hsr->sequence_nr and hsr->sup_sequence_nr.
> > >
> > > This also avoid a race in hsr_fill_info().
> >
> > You obtain to sequence nr without locking so two CPUs could submit skbs
> > at the same time. Wouldn't this allow the race I described in commit
> > 06afd2c31d338 ("hsr: Synchronize sending frames to have always incremented outgoing seq nr.")
> >
> > to happen again? Then one skb would be dropped while sending because it
> > has lower sequence nr but in fact it was not yet sent.
> >
>
> A network protocol unable to cope with reorders can not live really.
>
> If this is an issue, this should be fixed at the receiving side.
The standard/ network protocol just says there has to be seq nr to avoid
processing a packet multiple times. The Linux implementation increments
the counter and assumes everything lower than the current counter has
already been seen. Therefore the sequence lock is held during the entire
sending process.
This is not a protocol but implementation issue ;)
I am aware of a FPGA implementation of HSR which tracks the last 20
sequence numbers instead. This would help because it would allow
reorders to happen.
Looking at it, the code chain never held the lock while I was playing
with it and I did not see this. So this might be just a consequence of
using gro here. I don't remember disabling it so it must have been of by
default or syzbot found a way to enable it (or has better hardware).
Would it make sense to disable this for HSR interfaces?
Sebastian
Powered by blists - more mailing lists