Message-Id: <20181009161336.1abf11d48a9adf40486c30eb@linux-foundation.org>
Date:   Tue, 9 Oct 2018 16:13:36 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Jack Wang <xjtuwjp@...il.com>
Cc:     gregkh@...uxfoundation.org, florian-ewald.mueller@...fitbricks.com,
        linux-kernel@...r.kernel.org,
        Jack Wang <jinpu.wang@...fitbricks.com>
Subject: Re: [PATCH] lib: memcmp optimization

On Tue,  9 Oct 2018 16:28:11 +0200 Jack Wang <xjtuwjp@...il.com> wrote:

> From: Florian-Ewald Mueller <florian-ewald.mueller@...fitbricks.com>
> 
> During testing, I configured 128 md/raid1 arrays and, while they
> were under heavy IO, started a check on each of them
> (echo check > /sys/block/mdx/md/sync_action).
> 
> The CPU utilization went through the ceiling, and when looking for
> the cause with 'perf top' I discovered that ~50% of the time was
> being spent in memcmp() called from process_checks().
> 
> With this patch applied, that drops to 4-10%.

Which CPU architecture?  Most of the important architectures appear
to define __HAVE_ARCH_MEMCMP.
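
For context, a simplified sketch of the convention rather than any one
architecture's actual header: an arch that supplies its own optimized
memcmp() defines __HAVE_ARCH_MEMCMP in its asm/string.h, and
lib/string.c compiles the generic byte loop only when that macro is
absent.

/* arch/<arch>/include/asm/string.h -- architecture override (sketch) */
#define __HAVE_ARCH_MEMCMP
extern int memcmp(const void *cs, const void *ct, size_t count);

/* lib/string.c -- generic fallback, built only when no override exists */
#ifndef __HAVE_ARCH_MEMCMP
__visible int memcmp(const void *cs, const void *ct, size_t count)
{
	/* ... the byte-at-a-time loop quoted below ... */
}
EXPORT_SYMBOL(memcmp);
#endif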

> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -852,7 +852,7 @@ EXPORT_SYMBOL(memmove);
>   * @count: The size of the area.
>   */
>  #undef memcmp
> -__visible int memcmp(const void *cs, const void *ct, size_t count)
> +static inline int __memcmp(const void *cs, const void *ct, size_t count)

What the heck does __visible do?
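
For reference, a sketch of the definition from the kernel's compiler
headers of this era (its exact home has moved between releases):

#define __visible __attribute__((externally_visible))

It tells gcc that the symbol is referenced from outside the current
translation unit (e.g. from assembly), so whole-program/LTO passes
must not drop or localize it.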

>  {
>  	const unsigned char *su1, *su2;
>  	int res = 0;
> @@ -862,6 +862,20 @@ __visible int memcmp(const void *cs, const void *ct, size_t count)
>  			break;
>  	return res;
>  }
> +__visible int memcmp(const void *cs, const void *ct, size_t count)
> +{
> +	const uint64_t *l1p = cs;
> +	const uint64_t *l2p = ct;
> +
> +	while (count >= sizeof(*l1p)) {
> +		if (*l1p != *l2p)
> +			return __memcmp(l1p, l2p, sizeof(*l1p));
> +		count -= sizeof(*l1p);
> +		++l1p;
> +		++l2p;
> +	}
> +	return __memcmp(l1p, l2p, count);
> +}

This is going to do bad things if the incoming addresses aren't
suitably aligned.

Certainly, byte-at-a-time is a pretty lame implementation when the
addresses are suitably aligned.  A fallback to the lame version when
there is misalignment would be simple to do.  And presumably there would
be decent benefits to whoever is actually using this code.  But I'm
wondering who is actually using this code!
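
Something like the following would do it (a hypothetical sketch, not
the patch itself; memcmp_bytes() and memcmp_aligned() are made-up
names):

#include <linux/kernel.h>	/* IS_ALIGNED() */
#include <linux/types.h>	/* u64 */

/* The plain byte loop, equivalent to the current generic memcmp(). */
static int memcmp_bytes(const void *cs, const void *ct, size_t count)
{
	const unsigned char *su1 = cs, *su2 = ct;
	int res = 0;

	for (; count > 0; ++su1, ++su2, count--)
		if ((res = *su1 - *su2) != 0)
			break;
	return res;
}

int memcmp_aligned(const void *cs, const void *ct, size_t count)
{
	const u64 *l1p = cs;
	const u64 *l2p = ct;

	/*
	 * Unaligned u64 loads can fault (or trap and be emulated,
	 * slowly) on architectures without hardware unaligned access,
	 * so take the word-at-a-time path only when both pointers are
	 * suitably aligned.
	 */
	if (!IS_ALIGNED((unsigned long)cs | (unsigned long)ct, sizeof(u64)))
		return memcmp_bytes(cs, ct, count);

	while (count >= sizeof(*l1p)) {
		if (*l1p != *l2p)
			return memcmp_bytes(l1p, l2p, sizeof(*l1p));
		count -= sizeof(*l1p);
		++l1p;
		++l2p;
	}
	return memcmp_bytes(l1p, l2p, count);
}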
