[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e942c1bbe48b451b92620396746986ab@XCH-RTP-016.cisco.com>
Date: Wed, 23 May 2018 11:08:33 +0000
From: "Jon Rosen (jrosen)" <jrosen@...co.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
CC: "David S. Miller" <davem@...emloft.net>,
Willem de Bruijn <willemb@...gle.com>,
Eric Dumazet <edumazet@...gle.com>,
Kees Cook <keescook@...omium.org>,
David Windsor <dwindsor@...il.com>,
"Rosen, Rami" <rami.rosen@...el.com>,
"Reshetova, Elena" <elena.reshetova@...el.com>,
"Mike Maloney" <maloney@...gle.com>,
Benjamin Poirier <bpoirier@...e.com>,
"Thomas Gleixner" <tglx@...utronix.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2] packet: track ring entry use using a shadow ring to
prevent RX ring overrun
> >>> I think the bigger issues as you've pointed out are the cost of
> >>> the additional spin lock and should the additional state be
> >>> stored in-band (fewer cache lines) or out-of band (less risk of
> >>> breaking due to unpredictable application behavior).
> >>
> >> We don't need the spinlock if clearing the shadow byte after
> >> setting the status to user.
> >>
> >> Worst case, user will set it back to kernel while the shadow
> >> byte is not cleared yet and the next producer will drop a packet.
> >> But next producers will make progress, so there is no deadlock
> >> or corruption.
> >
> > I thought so too for a while but after spending more time than I
> > care to admit I relized the following sequence was occuring:
> >
> > Core A Core B
> > ------ ------
> > - Enter spin_lock
> > - Get tp_status of head (X)
> > tp_status == 0
> > - Check inuse
> > inuse == 0
> > - Allocate entry X
> > advance head (X+1)
> > set inuse=1
> > - Exit spin_lock
> >
> > <very long delay>
> >
> > <allocate N-1 entries
> > where N = size of ring>
> >
> > - Enter spin_lock
> > - get tp_status of head (X+N)
> > tp_status == 0 (but slot
> > in use for X on core A)
> >
> > - write tp_status of <--- trouble!
> > X = TP_STATUS_USER <--- trouble!
> > - write inuse=0 <--- trouble!
> >
> > - Check inuse
> > inuse == 0
> > - Allocate entry X+N
> > advance head (X+N+1)
> > set inuse=1
> > - Exit spin_lock
> >
> >
> > At this point Core A just passed slot X to userspace with a
> > packet and Core B has just been assigned slot X+N (same slot as
> > X) for it's new packet. Both cores A and B end up filling in that
> > slot. Tracking ths donw was one of the reasons it took me a
> > while to produce these updated diffs.
>
> Is this not just an ordering issue? Since inuse is set after tp_status,
> it has to be tested first (and barriers are needed to avoid reordering).
I changed the code as you suggest to do the inuse check first and
removed the extra added spin_lock/unlock and it seems to be working.
I was able to run through the night without an issue (normally I would
hit the ring corruption in 1 to 2 hours).
Thanks for pointing that out, I should have caught that myself. Next
I'll look at your suggestion for where to put the shadow ring.
Powered by blists - more mailing lists