lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0Ufv5Te668Y_tszQfuH0g_Zsn=oErQ8gAfX6FwHRUm+H3A@mail.gmail.com>
Date:   Fri, 24 Mar 2023 08:45:23 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
        pabeni@...hat.com, willemb@...gle.com
Subject: Re: [PATCH net-next 1/3] net: provide macros for commonly copied
 lockless queue stop/wake code

On Thu, Mar 23, 2023 at 8:09 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Thu, 23 Mar 2023 09:05:35 -0700 Alexander H Duyck wrote:
> > > +#define netif_tx_queue_try_stop(txq, get_desc, start_thrs)         \
> > > +   ({                                                              \
> > > +           int _res;                                               \
> > > +                                                                   \
> > > +           netif_tx_stop_queue(txq);                               \
> > > +                                                                   \
> > > +           smp_mb();                                               \
> > > +                                                                   \
> > > +           /* We need to check again in a case another             \
> > > +            * CPU has just made room available.                    \
> > > +            */                                                     \
> > > +           if (likely(get_desc < start_thrs)) {                    \
> > > +                   _res = 0;                                       \
> > > +           } else {                                                \
> > > +                   netif_tx_start_queue(txq);                      \
> > > +                   _res = -1;                                      \
> > > +           }                                                       \
> > > +           _res;                                                   \
> > > +   })                                                              \
> > > +
> >
> > The issue I see here is that with this being a macro it abstracts away
> > the relationship between get_desc and the memory barrier. At a minimum
> > I think we should be treating get_desc as a function instead of just
> > passing it and its arguments as a single value. Maybe something more
> > like how read_poll_timeout passes the "op" and then uses it as a
> > function with args passed seperately. What we want to avoid is passing
> > a precomuted value to this function as get_desc.
>
> The kdocs hopefully have enough warnings. The issue I see with
> read_poll_timeout() is that I always have to have the definition
> open side by side to match up the arguments. I wish there was
> a way the test that something is not an lval, but I couldn't
> find it :(
>
> Let's see if anyone gets this wrong, you can tell me "I told you so"?

The setup for it makes me really uncomfortable. Passing a function w/
arguments as an argument itself usually implies that it is called
first, not during.

> > In addition I think I would prefer to see _res initialized to the
> > likely value so that we can drop this to one case instead of having to
> > have two. Same thing for the macros below.
>
> Alright.
>
> > > +/**
> > > + * netif_tx_queue_maybe_stop() - locklessly stop a Tx queue, if needed
> > > + * @txq:   struct netdev_queue to stop/start
> > > + * @get_desc:      get current number of free descriptors (see requirements below!)
> > > + * @stop_thrs:     minimal number of available descriptors for queue to be left
> > > + *         enabled
> > > + * @start_thrs:    minimal number of descriptors to re-enable the queue, can be
> > > + *         equal to @stop_thrs or higher to avoid frequent waking
> > > + *
> > > + * All arguments may be evaluated multiple times, beware of side effects.
> > > + * @get_desc must be a formula or a function call, it must always
> > > + * return up-to-date information when evaluated!
> > > + * Expected to be used from ndo_start_xmit, see the comment on top of the file.
> > > + *
> > > + * Returns:
> > > + *  0 if the queue was stopped
> > > + *  1 if the queue was left enabled
> > > + * -1 if the queue was re-enabled (raced with waking)
> > > + */
> >
> > We may want to change the values here. The most likely case is "left
> > enabled" with that being the case we probably want to make that the 0
> > case. I would then probably make 1 the re-enabled case and -1 the
> > stopped case.
>
> I chose the return values this way because the important information is
> whether the queue was in fact stopped (in case the macro is used at the
> start of .xmit as a safety check). If stopped is zero caller can check
> !ret vs !!ret.
>
> Seems pretty normal for the kernel function called stop() to return 0
> if it did stop.

Except this isn't "stop", this is "maybe stop". Maybe we should just
do away with the stop/wake messaging and go with something such as a
RTS/CTS type setup. Basically this function is acting as a RTS to
verify that we have room on the ring to place the frame. If we don't
we are stopped. The "wake" function is on what is essentially the
receiving end on the other side of the hardware after it has DMAed the
frames and is providing the CTS signal back.

> > With that the decision tree becomes more straightforward as we would do
> > something like:
> >       if (result) {
> >               if (result < 0)
> >                       Increment stopped stat
> >                       return
> >               else
> >                       Increment restarted stat
> >       }
>
> Do you see a driver where it matters? ixgbe and co. use
> netif_tx_queue_try_stop() and again they just test stopped vs not stopped.

The thing is in order to make this work for the ixgbe patch you didn't
use the maybe_stop instead you went with the try_stop. If you replaced
the ixgbe_maybe_stop_tx with your maybe stop would have to do
something such as the code above to make it work. That is what I am
getting at. From what I can tell the only real difference between
ixgbe_maybe_stop_tx and your maybe_stop is that you avoided having to
move the restart_queue stat increment out.

The general thought is I would prefer to keep it so that 0 is the
default most likely case in both where the queue is enabled and is
still enabled. By moving the "take action" items into the 1/-1 values
then it becomes much easier to sort them out with 1 being a stat
increment and -1 being an indication to stop transmitting and prep for
watchdog hang if we don't clear this in the next watchdog period.

Also in general it makes it easier to understand if these all work
with the same logic.

> > In addition for readability we may want consider adding an enum simliar
> > to the netdev_tx enum as then we have the return types locked and
> > usable should we want to specifically pick out one.
>
> Hm, I thought people generally dislike the netdev_tx enum.
> Maybe it's just me.

The thought I had with the enum is to more easily connect the outcomes
with the sources. It would also help to prevent any confusion on what
is what. Having the two stop/wake functions return different values is
a potential source for errors since 0/1 means different things in the
different functions. Basically since we have 3 possible outcomes using
the enum would make it very clear what the mapping is between the two.

> > > +#define __netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, down_cond) \
> > > +   ({                                                              \
> > > +           int _res;                                               \
> > > +                                                                   \
> > > +           if (likely(get_desc < start_thrs))                      \
> > > +                   _res = -1;                                      \
> > > +           else                                                    \
> > > +                   _res = __netif_tx_queue_try_wake(txq, get_desc, \
> > > +                                                    start_thrs,    \
> > > +                                                    down_cond);    \
> > > +           _res;                                                   \
> > > +   })
> > > +
> >
> > The likely here is probably not correct. In most cases the queue will
> > likely have enough descriptors to enable waking since Tx cleanup can
> > usually run pretty fast compared to the transmit path itself since it
> > can run without needing to take locks.
>
> Good catch, must be a copy paste from the other direction without
> inverting the likely.

Actually the original code had it there in ixgbe although the check
also included total_packets in it which implies that maybe the
assumption was that Tx cleanup wasn't occurring as often as Rx? Either
that or it was a carry-over and had a check for
__netif_subqueue_stopped included in there previously.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ