Message-ID: <cb8eaad0-83c5-a150-d830-e078682ba18b@ssi.bg>
Date: Mon, 9 May 2022 21:17:25 +0300 (EEST)
From: Julian Anastasov <ja@....bg>
To: menglong8.dong@...il.com
cc: Simon Horman <horms@...ge.net.au>, pablo@...filter.org,
netdev@...r.kernel.org, lvs-devel@...r.kernel.org,
linux-kernel <linux-kernel@...r.kernel.org>,
netfilter-devel@...r.kernel.org,
Menglong Dong <imagedong@...cent.com>
Subject: Re: [PATCH net-next] net: ipvs: random start for RR scheduler
Hello,
On Mon, 9 May 2022, menglong8.dong@...il.com wrote:
> From: Menglong Dong <imagedong@...cent.com>
>
> For now, the start of the RR scheduler is in the order of dest
> service added, it will result in imbalance if the load balance
...order of added destinations,...
> is done in client side and long connect is used.
	..."long connections are used". Is this a case where a
small number of connections is used, and those few connections
disproportionately load some of the real servers?
> For example, we have client1, client2, ..., client5 and real service
> service1, service2, service3. All clients have the same ipvs config,
> and each of them will create 2 long TCP connect to the virtual
> service. Therefore, all the clients will connect to service1 and
> service2, leaving service3 free.
	You mean, there are many IPVS directors with the same
config and each director gets 2 connections? A third connection
would get real server #3, right? Also, are the clients local
to the director?
> Fix this by randomize the start of dest service to RR scheduler when
..."randomizing the starting destination when"
> IP_VS_SVC_F_SCHED_RR_RANDOM is set.
>
> Signed-off-by: Menglong Dong <imagedong@...cent.com>
> ---
> include/uapi/linux/ip_vs.h | 2 ++
> net/netfilter/ipvs/ip_vs_rr.c | 25 ++++++++++++++++++++++++-
> 2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h
> index 4102ddcb4e14..7f74bafd3211 100644
> --- a/include/uapi/linux/ip_vs.h
> +++ b/include/uapi/linux/ip_vs.h
> @@ -28,6 +28,8 @@
> #define IP_VS_SVC_F_SCHED_SH_FALLBACK IP_VS_SVC_F_SCHED1 /* SH fallback */
> #define IP_VS_SVC_F_SCHED_SH_PORT IP_VS_SVC_F_SCHED2 /* SH use port */
>
> +#define IP_VS_SVC_F_SCHED_RR_RANDOM IP_VS_SVC_F_SCHED1 /* random start */
> +
> /*
> * Destination Server Flags
> */
> diff --git a/net/netfilter/ipvs/ip_vs_rr.c b/net/netfilter/ipvs/ip_vs_rr.c
> index 38495c6f6c7c..e309d97bdd08 100644
> --- a/net/netfilter/ipvs/ip_vs_rr.c
> +++ b/net/netfilter/ipvs/ip_vs_rr.c
> @@ -22,13 +22,36 @@
>
> #include <net/ip_vs.h>
>
> +static void ip_vs_rr_random_start(struct ip_vs_service *svc)
> +{
> + struct list_head *cur;
> + u32 start;
> +
> + if (!(svc->flags | IP_VS_SVC_F_SCHED_RR_RANDOM) ||
	The '|' should be '&' here, to actually test the flag.
> + svc->num_dests <= 1)
> + return;
> +
> + spin_lock_bh(&svc->sched_lock);
> + start = get_random_u32() % svc->num_dests;
	Maybe prandom is more appropriate for non-crypto purposes.
Also, not sure if it is a good idea to limit the number of steps,
e.g. to 128:

	start = prandom_u32_max(min(svc->num_dests, 128U));

or just use:

	start = prandom_u32_max(svc->num_dests);

	Also, this line can be moved before the spin_lock_bh.
> + cur = &svc->destinations;
cur = svc->sched_data;
	... and start from the current svc->sched_data, because
we are called for every added dest. Better to jump 0..127 steps
ahead, to avoid delay with long lists?
> + while (start--)
> + cur = cur->next;
> + svc->sched_data = cur;
> + spin_unlock_bh(&svc->sched_lock);
> +}
>
> static int ip_vs_rr_init_svc(struct ip_vs_service *svc)
> {
> svc->sched_data = &svc->destinations;
> + ip_vs_rr_random_start(svc);
> return 0;
> }
>
> +static int ip_vs_rr_add_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest)
> +{
> + ip_vs_rr_random_start(svc);
> + return 0;
> +}
>
> static int ip_vs_rr_del_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest)
> {
> @@ -104,7 +127,7 @@ static struct ip_vs_scheduler ip_vs_rr_scheduler = {
> .module = THIS_MODULE,
> .n_list = LIST_HEAD_INIT(ip_vs_rr_scheduler.n_list),
> .init_service = ip_vs_rr_init_svc,
> - .add_dest = NULL,
> + .add_dest = ip_vs_rr_add_dest,
> .del_dest = ip_vs_rr_del_dest,
> .schedule = ip_vs_rr_schedule,
> };
> --
> 2.36.0
Regards
--
Julian Anastasov <ja@....bg>