lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+4wbef3k6at_Kf+8MBmU4HhE9nxMRvROR_OxsZptffjA@mail.gmail.com>
Date: Mon, 23 Sep 2024 11:29:54 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: yushengjin <yushengjin@...ontech.com>
Cc: pablo@...filter.org, kadlec@...filter.org, roopa@...dia.com, 
	razor@...ckwall.org, davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, 
	netfilter-devel@...r.kernel.org, coreteam@...filter.org, 
	bridge@...ts.linux.dev, netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net/bridge: Optimizing read-write locks in ebtables.c

On Mon, Sep 23, 2024 at 11:16 AM yushengjin <yushengjin@...ontech.com> wrote:
>
> When conducting WRK testing, the CPU usage rate of the testing machine was
> 100%. forwarding through a bridge, if the network load is too high, it may
> cause abnormal load on the ebt_do_table of the kernel ebtable module, leading
> to excessive soft interrupts and sometimes even directly causing CPU soft
> deadlocks.
>
> After analysis, it was found that the code of ebtables had not been optimized
> for a long time, and the read-write locks inside still existed. However, other
> arp/ip/ip6 tables had already been optimized a lot, and performance bottlenecks
> in read-write locks had been discovered a long time ago.
>
> Ref link: https://lore.kernel.org/lkml/20090428092411.5331c4a1@nehalam/
>
> So I referred to arp/ip/ip6 modification methods to optimize the read-write
> lock in ebtables.c.
>
> test method:
> 1) Test machine creates bridge :
> ``` bash
> brctl addbr br-a
> brctl addbr br-b
> brctl addif br-a enp1s0f0 enp1s0f1
> brctl addif br-b enp130s0f0 enp130s0f1
> ifconfig br-a up
> ifconfig br-b up
> ```
> 2) Testing with another machine:
> ``` bash
> ulimit -n 2048
> ./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://4.4.4.2:80/4k.html &
> ./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://5.5.5.2:80/4k.html &
> ```
>
> Signed-off-by: yushengjin <yushengjin@...ontech.com>
> ---
>  include/linux/netfilter_bridge/ebtables.h |  47 +++++++-
>  net/bridge/netfilter/ebtables.c           | 132 ++++++++++++++++------
>  2 files changed, 145 insertions(+), 34 deletions(-)
>
> diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
> index fd533552a062..dd52dea20fb8 100644
> --- a/include/linux/netfilter_bridge/ebtables.h
> +++ b/include/linux/netfilter_bridge/ebtables.h
> @@ -93,7 +93,6 @@ struct ebt_table {
>         char name[EBT_TABLE_MAXNAMELEN];
>         struct ebt_replace_kernel *table;
>         unsigned int valid_hooks;
> -       rwlock_t lock;
>         /* the data used by the kernel */
>         struct ebt_table_info *private;
>         struct nf_hook_ops *ops;
> @@ -124,4 +123,50 @@ static inline bool ebt_invalid_target(int target)
>
>  int ebt_register_template(const struct ebt_table *t, int(*table_init)(struct net *net));
>  void ebt_unregister_template(const struct ebt_table *t);
> +
> +/**
> + * ebt_recseq - recursive seqcount for netfilter use
> + *
> + * Packet processing changes the seqcount only if no recursion happened
> + * get_counters() can use read_seqcount_begin()/read_seqcount_retry(),
> + * because we use the normal seqcount convention :
> + * Low order bit set to 1 if a writer is active.
> + */
> +DECLARE_PER_CPU(seqcount_t, ebt_recseq);
> +
> +/**
> + * ebt_write_recseq_begin - start of a write section
> + *
> + * Begin packet processing : all readers must wait the end
> + * 1) Must be called with preemption disabled
> + * 2) softirqs must be disabled too (or we should use this_cpu_add())
> + * Returns :
> + *  1 if no recursion on this cpu
> + *  0 if recursion detected
> + */
> +static inline unsigned int ebt_write_recseq_begin(void)
> +{
> +       unsigned int addend;
> +
> +       addend = (__this_cpu_read(ebt_recseq.sequence) + 1) & 1;
> +
> +       __this_cpu_add(ebt_recseq.sequence, addend);
> +       smp_mb();
> +
> +       return addend;
> +}
> +
> +/**
> + * ebt_write_recseq_end - end of a write section
> + * @addend: return value from previous ebt_write_recseq_begin()
> + *
> + * End packet processing : all readers can proceed
> + * 1) Must be called with preemption disabled
> + * 2) softirqs must be disabled too (or we should use this_cpu_add())
> + */
> +static inline void ebt_write_recseq_end(unsigned int addend)
> +{
> +       smp_wmb();
> +       __this_cpu_add(ebt_recseq.sequence, addend);
> +}

Why not reusing xt_recseq, xt_write_recseq_begin(), xt_write_recseq_end(),
instead of copy/pasting them ?

This was added in

commit 7f5c6d4f665bb57a19a34ce1fb16cc708c04f219    netfilter: get rid
of atomic ops in fast path

If this is an include mess, just move them in a separate include file.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ