[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADxym3YH_76+5g29QF4Xp4gXJz5bwdQXD_gXv3esAVTgNGkXyg@mail.gmail.com>
Date: Tue, 10 May 2022 10:55:59 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: Julian Anastasov <ja@....bg>
Cc: Simon Horman <horms@...ge.net.au>,
Pablo Neira Ayuso <pablo@...filter.org>,
netdev <netdev@...r.kernel.org>, lvs-devel@...r.kernel.org,
linux-kernel <linux-kernel@...r.kernel.org>,
netfilter-devel@...r.kernel.org,
Menglong Dong <imagedong@...cent.com>
Subject: Re: [PATCH net-next] net: ipvs: random start for RR scheduler
On Tue, May 10, 2022 at 2:17 AM Julian Anastasov <ja@....bg> wrote:
>
>
> Hello,
>
> On Mon, 9 May 2022, menglong8.dong@...il.com wrote:
>
> > From: Menglong Dong <imagedong@...cent.com>
> >
> > For now, the start of the RR scheduler is in the order of dest
> > service added, it will result in imbalance if the load balance
>
> ...order of added destinations,...
>
> > is done in client side and long connect is used.
>
> ..."long connections are used". Is this a case where
> small number of connections are used? And the two connections
> relatively overload the real servers?
>
> > For example, we have client1, client2, ..., client5 and real service
> > service1, service2, service3. All clients have the same ipvs config,
> > and each of them will create 2 long TCP connect to the virtual
> > service. Therefore, all the clients will connect to service1 and
> > service2, leaving service3 free.
>
> You mean, there are many IPVS directors with same
> config and each director gets 2 connections? Third connection
> will get real server #3, right ? Also, are the clients local
> to the director?
>
> > Fix this by randomize the start of dest service to RR scheduler when
>
> ..."randomizing the starting destination when"
>
Nice description :/
> > IP_VS_SVC_F_SCHED_RR_RANDOM is set.
> >
> > Signed-off-by: Menglong Dong <imagedong@...cent.com>
> > ---
> > include/uapi/linux/ip_vs.h | 2 ++
> > net/netfilter/ipvs/ip_vs_rr.c | 25 ++++++++++++++++++++++++-
> > 2 files changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h
> > index 4102ddcb4e14..7f74bafd3211 100644
> > --- a/include/uapi/linux/ip_vs.h
> > +++ b/include/uapi/linux/ip_vs.h
> > @@ -28,6 +28,8 @@
> > #define IP_VS_SVC_F_SCHED_SH_FALLBACK IP_VS_SVC_F_SCHED1 /* SH fallback */
> > #define IP_VS_SVC_F_SCHED_SH_PORT IP_VS_SVC_F_SCHED2 /* SH use port */
> >
> > +#define IP_VS_SVC_F_SCHED_RR_RANDOM IP_VS_SVC_F_SCHED1 /* random start */
> > +
> > /*
> > * Destination Server Flags
> > */
> > diff --git a/net/netfilter/ipvs/ip_vs_rr.c b/net/netfilter/ipvs/ip_vs_rr.c
> > index 38495c6f6c7c..e309d97bdd08 100644
> > --- a/net/netfilter/ipvs/ip_vs_rr.c
> > +++ b/net/netfilter/ipvs/ip_vs_rr.c
> > @@ -22,13 +22,36 @@
> >
> > #include <net/ip_vs.h>
> >
> > +static void ip_vs_rr_random_start(struct ip_vs_service *svc)
> > +{
> > + struct list_head *cur;
> > + u32 start;
> > +
> > + if (!(svc->flags | IP_VS_SVC_F_SCHED_RR_RANDOM) ||
>
> | -> &
>
> > + svc->num_dests <= 1)
> > + return;
> > +
> > + spin_lock_bh(&svc->sched_lock);
> > + start = get_random_u32() % svc->num_dests;
>
> May be prandom is more appropriate for non-crypto purposes.
> Also, not sure if it is a good idea to limit the number of steps,
> eg. to 128...
>
> start = prandom_u32_max(min(svc->num_dests, 128U));
>
Yeah, prandom_u32_max is a good choice, I'll use it instead.
> or just use
>
> start = prandom_u32_max(svc->num_dests);
>
> Also, this line can be before the spin_lock_bh.
>
> > + cur = &svc->destinations;
>
> cur = svc->sched_data;
>
> ... and to start from current svc->sched_data because
> we are called for every added dest. Better to jump 0..127 steps
> ahead, to avoid delay with long lists?
>
I'm a little afraid that the 'steps' may make the starting dest not
absolutely random, in terms of probability. For example, we have
256 services, and will the services in the middle have more chances
to be chosen as the start? It's just a feeling, I'm not good at
Probability :/
The delay that ip_vs_wrr_gcd_weight() caused is much more than
this case, so maybe the delay here can be ignored?
Thanks!
Menglong Dong
Powered by blists - more mailing lists