linux-kernel - Re: [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yyss3SWM0nTVnjT7@alley>
Date:   Wed, 21 Sep 2022 17:25:17 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Zhen Lei <thunder.leizhen@...wei.com>
Cc:     Josh Poimboeuf <jpoimboe@...nel.org>,
        Jiri Kosina <jikos@...nel.org>,
        Miroslav Benes <mbenes@...e.cz>,
        Joe Lawrence <joe.lawrence@...hat.com>,
        live-patching@...r.kernel.org, linux-kernel@...r.kernel.org,
        Masahiro Yamada <masahiroy@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Jiri Olsa <jolsa@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Luis Chamberlain <mcgrof@...nel.org>,
        linux-modules@...r.kernel.org
Subject: Re: [PATCH v4 4/8] kallsyms: Improve the performance of
 kallsyms_lookup_name()

On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
> Currently, to search for a symbol, we need to expand the symbols in
> 'kallsyms_names' one by one, and then use the expanded string for
> comparison. This process can be optimized.
> 
> And now scripts/kallsyms no longer compresses the symbol types, each
> symbol type always occupies one byte. So we can first compress the
> searched symbol and then make a quick comparison based on the compressed
> length and content. In this way, for entries with mismatched lengths,
> there is no need to expand and compare strings. And for those matching
> lengths, there's no need to expand the symbol. This saves a lot of time.
> According to my test results, the average performance of
> kallsyms_lookup_name() can be improved by 20 to 30 times.
> 
> The pseudo code of the test case is as follows:
> static int stat_find_name(...)
> {
> 	start = sched_clock();
> 	(void)kallsyms_lookup_name(name);
> 	end = sched_clock();
> 	//Update min, max, cnt, sum
> }
> 
> /*
>  * Traverse all symbols in sequence and collect statistics on the time
>  * taken by kallsyms_lookup_name() to lookup each symbol.
>  */
> kallsyms_on_each_symbol(stat_find_name, NULL);
> 
> The test results are as follows (twice):
> After : min=5250, max=  726560, avg= 302132
> After : min=5320, max=  726850, avg= 301978
> Before: min=170,  max=15949190, avg=7553906
> Before: min=160,  max=15877280, avg=7517784
> 
> The average time consumed is only 4.01% and the maximum time consumed is
> only 4.57% of the time consumed before optimization.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@...wei.com>
> ---
>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 76 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>  	return off;
>  }
>  
> +static int kallsyms_name_to_tokens(const char *name, char *buf)

This is not safe API. It is always needed to pass the size of the
buffer.

Also it should be called "compress". "token" is just an implementation
detail.

I would do:

static int kallsyms_compress_symbol_name(const char *name,
					 char *buf, size_t size)


> +{
> +	int i, j, k, n;
> +	int len, token_len;
> +	const char *token;
> +	unsigned char token_idx[KSYM_NAME_LEN];
> +	unsigned char token_bak[KSYM_NAME_LEN];

Why do we need two buffers? It should be possible to compress the name
in the same buffer as it is done in compress_symbols() in scripts/callsyms.c.

> +
> +	/*
> +	 * n, number of tokens in the string name.
> +	 * token_idx[i], the start index of the ith token.
> +	 * token_idx[n] is used to calculate the length of the last token.
> +	 */
> +	n = strlen(name);
> +	if (n >= KSYM_NAME_LEN) {
> +		buf[0] = 0;
> +		return 0;
> +	}
> +	for (i = 0; i <= n; i++)
> +		token_idx[i] = (unsigned char)i;
> +
> +	/*
> +	 * For tokens whose token_len >= 2, a larger index value indicates
> +	 * a higher occurrence frequency. See scripts/kallsyms.c
> +	 */
> +	for (i = 255; i >= 0; i--) {
> +		token = &kallsyms_token_table[kallsyms_token_index[i]];
> +		token_len = strlen(token);
> +		if (token_len <= 1)
> +			continue;
> +
> +		/*
> +		 * Find and merge two tokens into one.
> +		 *
> +		 *                |<-- new_token -->|
> +		 *                | token1 | token2 |
> +		 * token_idx[]:   j       j+1      j+2
> +		 */
> +		for (j = 0; j < n - 1; j++) {
> +			len = token_idx[j + 2] - token_idx[j];
> +			if (len == token_len &&
> +			    !strncmp(name + token_idx[j], token, len)) {
> +				token_bak[token_idx[j]] = (unsigned char)i;
> +				for (k = j + 1; k < n; k++)
> +					token_idx[k] = token_idx[k + 1];
> +				n--;
> +			}
> +		}
> +	}
> +
> +	for (j = 0; j < n; j++) {
> +		len = token_idx[j + 1] - token_idx[j];
> +		if (len <= 1) {
> +			buf[j] = name[token_idx[j]];
> +			continue;
> +		}
> +
> +		buf[j] = token_bak[token_idx[j]];

Maybe, I do not understand the compression format correctly but
this code looks too complicated. Honestly, I even did not try to
understand it.

My understanding is the we just need to find all tokens and
replace them with index.

It should be even easier than compress_symbols() in scripts/callsyms.c.
The token_table already exists and we do not need to handle the token_profit...

The following looks more strigtforward (not even compile tested):

	ssize_t len, size;

	len = strscpy(buf, symname, size);
	if (WARN_ON_ONCE(len < 0))
		return -EINVAL;

	/* the tokens with higher index are used first */
	for (idx = 255; idx >= 0; idx--) {
		token =	&kallsyms_token_table[kallsyms_token_index[i]];
		token_len = strlen(token);

		p1 = buf;
		/* length of the remaining symname including the trailing '\0' */
		remaining = len + 1;

		/* find the token in the symbol name */
		p2 = strstr(token, p1);

		while (p2) {
			/* replace token with index */
			*p2 = idx;
			remaining -= ((p2 - p1) + token_len);
			memmove(p2 + 1, p2 + token_len, remaining);
			len -= (token_len - 1);
			p1 = p2;

			/* find the token in the rest of the symbol name */
			p2 = strstr(token, p1);
		}
	}

	return len;

> +	}
> +	buf[n] = 0;
> +
> +	return n;
> +}
> +
>  /*
>   * Get symbol type information. This is encoded as a single char at the
>   * beginning of the symbol name.
> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>  	char namebuf[KSYM_NAME_LEN];
>  	unsigned long i;
>  	unsigned int off;
> +	int len;
>  
>  	/* Skip the search for empty string. */
>  	if (!*name)
>  		return 0;
>  
> +	len = kallsyms_name_to_tokens(name, namebuf);
> +	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
> +		if (kallsyms_names[off] == len + 1 &&
> +		    !memcmp(&kallsyms_names[off + 2], namebuf, len))
> +			return kallsyms_sym_address(i);
> +
> +		off += kallsyms_names[off] + 1;

These complicated checks are hard to review. The following looks much
more readable to me:

	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
		/* the stored symbol name is prefixed by symbol type */
		name_len = kallsyms_names[off] - 1;
		name = &kallsyms_names[off + 2];
		off += name_len + 2;

		if (name_len != len)
			continue;

		if (!memcmp(name, namebuf, len))
			return kallsyms_sym_address(i);
	}


> +	}
> +
>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>  
> -		if (strcmp(namebuf, name) == 0)
> -			return kallsyms_sym_address(i);
> -
>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>  			return kallsyms_sym_address(i);

Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
It might actually slow down the search because we would need to use
both fast and slow search?

Best Regards,
Petr