Message-ID: <CAF=yD-KHFwhVt4xzED7FPLd_xej9nLmoZOgZG0PmsVgbrnN0Ng@mail.gmail.com>
Date: Tue, 22 May 2018 11:41:41 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: "Jon Rosen (jrosen)" <jrosen@...co.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Willem de Bruijn <willemb@...gle.com>,
Eric Dumazet <edumazet@...gle.com>,
Kees Cook <keescook@...omium.org>,
David Windsor <dwindsor@...il.com>,
"Rosen, Rami" <rami.rosen@...el.com>,
"Reshetova, Elena" <elena.reshetova@...el.com>,
Mike Maloney <maloney@...gle.com>,
Benjamin Poirier <bpoirier@...e.com>,
Thomas Gleixner <tglx@...utronix.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] packet: track ring entry use using a shadow ring to prevent RX ring overrun

>>> I think the bigger issues as you've pointed out are the cost of
>>> the additional spin lock and should the additional state be
>>> stored in-band (fewer cache lines) or out-of band (less risk of
>>> breaking due to unpredictable application behavior).
>>
>> We don't need the spinlock if clearing the shadow byte after
>> setting the status to user.
>>
>> Worst case, user will set it back to kernel while the shadow
>> byte is not cleared yet and the next producer will drop a packet.
>> But next producers will make progress, so there is no deadlock
>> or corruption.
>
> I thought so too for a while, but after spending more time than I
> care to admit I realized the following sequence was occurring:
>
>   Core A                       Core B
>   ------                       ------
>   - Enter spin_lock
>   - Get tp_status of head (X)
>       tp_status == 0
>   - Check inuse
>       inuse == 0
>   - Allocate entry X
>       advance head (X+1)
>       set inuse=1
>   - Exit spin_lock
>
>     <very long delay>
>
>                                <allocate N-1 entries
>                                 where N = size of ring>
>
>                                - Enter spin_lock
>                                - get tp_status of head (X+N)
>                                    tp_status == 0 (but slot
>                                    in use for X on core A)
>
>   - write tp_status of         <--- trouble!
>     X = TP_STATUS_USER         <--- trouble!
>   - write inuse=0              <--- trouble!
>
>                                - Check inuse
>                                    inuse == 0
>                                - Allocate entry X+N
>                                    advance head (X+N+1)
>                                    set inuse=1
>                                - Exit spin_lock
>
>
> At this point Core A just passed slot X to userspace with a
> packet and Core B has just been assigned slot X+N (same slot as
> X) for its new packet. Both cores A and B end up filling in that
> slot. Tracking this down was one of the reasons it took me a
> while to produce these updated diffs.
Is this not just an ordering issue? Since inuse is set after tp_status,
it has to be tested first (and barriers are needed to avoid reordering).
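
To make the ordering point concrete, here is a rough user-space sketch
using C11 atomics. It is not code from the patch: struct slot,
slot_try_claim() and slot_publish() are hypothetical names, and the two
status constants are defined locally to mirror the values in
linux/if_packet.h. Kernel code would presumably use smp_load_acquire()
and smp_store_release() instead of <stdatomic.h>. The claim path is
assumed to run with the ring spinlock held, as in the sequence above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Local stand-ins mirroring the values in linux/if_packet.h. */
#define TP_STATUS_KERNEL 0u
#define TP_STATUS_USER   1u

/* Hypothetical model of one ring slot plus its shadow byte. */
struct slot {
	_Atomic unsigned int tp_status;	/* shared with user space */
	_Atomic unsigned char inuse;	/* kernel-private shadow state */
};

/*
 * Producer claim path. Assumed to be called with the ring spinlock
 * held, so concurrent claims are serialized.
 */
static bool slot_try_claim(struct slot *s)
{
	/*
	 * Test inuse first. The acquire load pairs with the release
	 * store in slot_publish(): if we see inuse == 0 written there,
	 * the tp_status load below also sees TP_STATUS_USER.
	 */
	if (atomic_load_explicit(&s->inuse, memory_order_acquire))
		return false;	/* another producer is still filling it */

	if (atomic_load_explicit(&s->tp_status, memory_order_relaxed) !=
	    TP_STATUS_KERNEL)
		return false;	/* slot is owned by user space */

	atomic_store_explicit(&s->inuse, 1, memory_order_relaxed);
	return true;
}

/* Called without the lock, after the packet has been copied in. */
static void slot_publish(struct slot *s)
{
	/* Hand the slot to user space first ... */
	atomic_store_explicit(&s->tp_status, TP_STATUS_USER,
			      memory_order_relaxed);
	/* ... then clear the shadow byte; release orders the two stores. */
	atomic_store_explicit(&s->inuse, 0, memory_order_release);
}
```

With this order, a producer that finds inuse == 0 after the wrap either
reads the slot before it was ever claimed or reads it after
TP_STATUS_USER is visible, so it cannot see the stale tp_status == 0
from the trace above.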