linux-kernel - Re: [PATCH v3 1/2] kunit: add a KUnit test for SLUB debugging functionality

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YGWPdFywfNUl4d3S@elver.google.com>
Date:   Thu, 1 Apr 2021 11:16:36 +0200
From:   Marco Elver <elver@...gle.com>
To:     glittao@...il.com
Cc:     cl@...ux.com, penberg@...nel.org, rientjes@...gle.com,
        iamjoonsoo.kim@....com, akpm@...ux-foundation.org, vbabka@...e.cz,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        kunit-dev@...glegroups.com
Subject: Re: [PATCH v3 1/2] kunit: add a KUnit test for SLUB debugging
 functionality

[Note, if you'd like me to see future versions, please Cc me, otherwise
it's unlikely I see it in time. Also add kunit-dev@...glegroups.com if
perhaps a KUnit dev should have another look, too.]

On Wed, Mar 31, 2021 at 10:51AM +0200, glittao@...il.com wrote:
> From: Oliver Glitta <glittao@...il.com>
> 
> SLUB has resiliency_test() function which is hidden behind #ifdef
> SLUB_RESILIENCY_TEST that is not part of Kconfig, so nobody
> runs it. KUnit should be a proper replacement for it.
> 
> Try changing byte in redzone after allocation and changing
> pointer to next free node, first byte, 50th byte and redzone
> byte. Check if validation finds errors.
> 
> There are several differences from the original resiliency test:
> Tests create own caches with known state instead of corrupting
> shared kmalloc caches.
> 
> The corruption of freepointer uses correct offset, the original
> resiliency test got broken with freepointer changes.
> 
> Scratch changing random byte test, because it does not have
> meaning in this form where we need deterministic results.
> 
> Add new option CONFIG_SLUB_KUNIT_TEST in Kconfig.
> 
> Add a counter field "errors" to struct kmem_cache to count number
> of errors detected in cache.
> 
> Silence bug report in SLUB test. Add SLAB_SILENT_ERRORS debug flag.
> Add SLAB_SILENT_ERRORS flag to SLAB_NEVER_MERGE, SLAB_DEBUG_FLAGS,
> SLAB_FLAGS_PERMITTED macros.
> 
> Signed-off-by: Oliver Glitta <glittao@...il.com>
> ---
> Changes since v2
> 
> Use bit operation & instead of logical && as reported by kernel test 
> robot and Dan Carpenter
> 
> Changes since v1
> 
> Conversion from kselftest to KUnit test suggested by Marco Elver.
> Error silencing.
> Error counting improvements. 
> 
>  include/linux/slab.h     |   2 +
>  include/linux/slub_def.h |   2 +
>  lib/Kconfig.debug        |   5 ++
>  lib/Makefile             |   1 +
>  lib/test_slub.c          | 124 +++++++++++++++++++++++++++++++++++++++
>  mm/slab.h                |   7 ++-
>  mm/slab_common.c         |   2 +-
>  mm/slub.c                |  64 +++++++++++++-------
>  8 files changed, 184 insertions(+), 23 deletions(-)
>  create mode 100644 lib/test_slub.c
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 7ae604076767..ed1a5a64d028 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -25,6 +25,8 @@
>   */
>  /* DEBUG: Perform (expensive) checks on alloc/free */
>  #define SLAB_CONSISTENCY_CHECKS	((slab_flags_t __force)0x00000100U)
> +/* DEBUG: Silent bug reports */
> +#define SLAB_SILENT_ERRORS	((slab_flags_t __force)0x00000200U)

This flag wouldn't be necessary if you do the design using
kunit_resource (see below).

(But perhaps I missed a conversation that said that this flag is
generally useful, but if so, it should probably be in a separate patch
justifying why it is required beyond the test.)

>  /* DEBUG: Red zone objs in a cache */
>  #define SLAB_RED_ZONE		((slab_flags_t __force)0x00000400U)
>  /* DEBUG: Poison objects */
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index dcde82a4434c..e4b51bb5bb83 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -133,6 +133,8 @@ struct kmem_cache {
>  	unsigned int usersize;		/* Usercopy region size */
>  
>  	struct kmem_cache_node *node[MAX_NUMNODES];
> +
> +	int errors;			/* Number of errors in cache */

So, I think it's bad design to add a new field 'errors', just for the
test. This will increase kmem_cache size for all builds, which is
unnecessary.

Is there use to retrieve 'errors' elsewhere?

While you could guard this with #ifdef CONFIG_SLUB_DEBUG or so, there's
a better design option if this is just for the KUnit test's benefit: use
kunit_resource.

The way it'd work is that for each test (you can add a common init
function) you add a named resource, in this case just an 'int' I guess,
that slab would be able to retrieve if this test is being run.

In the test somewhere, you could add something like this:


	static struct kunit_resource resource;
	static int slab_errors;

	..........

	static int test_init(struct kunit *test)
	{
		slab_errors = 0;
		kunit_add_named_resource(test, NULL, NULL, &resource,
					 "slab_errors", &slab_errors);
		return 0;
	}

	...... tests now check slab_errors .....

and then in slub.c you'd have:

	#if IS_ENABLED(CONFIG_KUNIT)
	static bool slab_add_kunit_errors(void)
	{
		struct kunit_resource *resource;

		if (likely(!current->kunit_test))
			return false;
		resource = kunit_find_named_resource(current->kunit_test, "slab_errors");
		if (!resource)
			return false;
		(*(int *)resource->data)++;
		kunit_put_resource(resource);
	}
	#else
	static inline bool slab_add_kunit_errors(void) { return false; }
	#endif

And anywhere you want to increase the error count, you'd call
slab_add_kunit_errors().

Another benefit of this approach is that if KUnit is disabled, there is
zero overhead and no additional code generated (vs. the current
approach).

>  };
>  
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 2779c29d9981..e0dec830c269 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2371,6 +2371,11 @@ config BITS_TEST
>  
>  	  If unsure, say N.
>  
> +config SLUB_KUNIT_TEST
> +	tristate "KUnit Test for SLUB cache error detection" if !KUNIT_ALL_TESTS
> +	depends on SLUB_DEBUG && KUNIT
> +	default KUNIT_ALL_TESTS

Help text please.

> +
>  config TEST_UDELAY
>  	tristate "udelay test driver"
>  	help
> diff --git a/lib/Makefile b/lib/Makefile
> index b5307d3eec1a..e1eb986c0e87 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -352,5 +352,6 @@ obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o
>  obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o
>  obj-$(CONFIG_BITS_TEST) += test_bits.o
>  obj-$(CONFIG_CMDLINE_KUNIT_TEST) += cmdline_kunit.o
> +obj-$(CONFIG_SLUB_KUNIT_TEST) += test_slub.o

Any reason to not name this slub_kunit.c? This is a new test, and it
could follow the naming convention of the other KUnit tests.

Some of it is also documented:
https://www.kernel.org/doc/html/latest/dev-tools/kunit/style.html

>  obj-$(CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED) += devmem_is_allowed.o
> diff --git a/lib/test_slub.c b/lib/test_slub.c
> new file mode 100644
> index 000000000000..4f8ea3c7d867
> --- /dev/null
> +++ b/lib/test_slub.c
> @@ -0,0 +1,124 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <kunit/test.h>
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include "../mm/slab.h"
> +
> +
> +static void test_clobber_zone(struct kunit *test)
> +{
> +	struct kmem_cache *s = kmem_cache_create("TestSlub_RZ_alloc", 64, 0,
> +				SLAB_RED_ZONE | SLAB_SILENT_ERRORS, NULL);
> +	u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
> +
> +	p[64] = 0x12;
> +
> +	validate_slab_cache(s);
> +	KUNIT_EXPECT_EQ(test, 1, s->errors);
> +
> +	kmem_cache_free(s, p);
> +	kmem_cache_destroy(s);
> +}
> +
> +static void test_next_pointer(struct kunit *test)
> +{
> +	struct kmem_cache *s = kmem_cache_create("TestSlub_next_ptr_free", 64, 0,
> +				SLAB_POISON | SLAB_SILENT_ERRORS, NULL);
> +	u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
> +	unsigned long tmp;
> +	unsigned long *ptr_addr;
> +
> +	kmem_cache_free(s, p);
> +
> +	ptr_addr = (unsigned long *)(p + s->offset);
> +	tmp = *ptr_addr;
> +	p[s->offset] = 0x12;
> +
> +	/*
> +	 * Expecting two errors.
> +	 * One for the corrupted freechain and the other one for the wrong
> +	 * count of objects in use.
> +	 */
> +	validate_slab_cache(s);
> +	KUNIT_EXPECT_EQ(test, 2, s->errors);
> +
> +	/*
> +	 * Try to repair corrupted freepointer.
> +	 * Still expecting one error for the wrong count of objects in use.
> +	 */
> +	*ptr_addr = tmp;
> +
> +	validate_slab_cache(s);
> +	KUNIT_EXPECT_EQ(test, 1, s->errors);
> +
> +	/*
> +	 * Previous validation repaired the count of objects in use.
> +	 * Now expecting no error.
> +	 */
> +	validate_slab_cache(s);
> +	KUNIT_EXPECT_EQ(test, 0, s->errors);
> +
> +	kmem_cache_destroy(s);
> +}
> +
> +static void test_first_word(struct kunit *test)
> +{
> +	struct kmem_cache *s = kmem_cache_create("TestSlub_1th_word_free", 64, 0,
> +				SLAB_POISON | SLAB_SILENT_ERRORS, NULL);
> +	u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
> +
> +	kmem_cache_free(s, p);
> +	*p = 0x78;
> +
> +	validate_slab_cache(s);
> +	KUNIT_EXPECT_EQ(test, 1, s->errors);
> +
> +	kmem_cache_destroy(s);
> +}
> +
> +static void test_clobber_50th_byte(struct kunit *test)
> +{
> +	struct kmem_cache *s = kmem_cache_create("TestSlub_50th_word_free", 64, 0,
> +				SLAB_POISON | SLAB_SILENT_ERRORS, NULL);
> +	u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
> +
> +	kmem_cache_free(s, p);
> +	p[50] = 0x9a;
> +
> +	validate_slab_cache(s);
> +	KUNIT_EXPECT_EQ(test, 1, s->errors);
> +	kmem_cache_destroy(s);
> +}
> +
> +static void test_clobber_redzone_free(struct kunit *test)
> +{
> +	struct kmem_cache *s = kmem_cache_create("TestSlub_RZ_free", 64, 0,
> +				SLAB_RED_ZONE | SLAB_SILENT_ERRORS, NULL);
> +	u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
> +
> +	kmem_cache_free(s, p);
> +	p[64] = 0xab;
> +
> +	validate_slab_cache(s);
> +	KUNIT_EXPECT_EQ(test, 1, s->errors);
> +	kmem_cache_destroy(s);
> +}
> +
> +static struct kunit_case test_cases[] = {
> +	KUNIT_CASE(test_clobber_zone),
> +	KUNIT_CASE(test_next_pointer),
> +	KUNIT_CASE(test_first_word),
> +	KUNIT_CASE(test_clobber_50th_byte),
> +	KUNIT_CASE(test_clobber_redzone_free),
> +	{}
> +};
> +
> +static struct kunit_suite test_suite = {
> +	.name = "slub_test",
> +	.test_cases = test_cases,

Here you could add a 

	.init = test_init,

or so, if you go with the kunit_resource design.

> +};
> +kunit_test_suite(test_suite);
> +
> +MODULE_LICENSE("GPL");
> diff --git a/mm/slab.h b/mm/slab.h
> index 076582f58f68..382507b6cab9 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -134,7 +134,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
>  #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
>  #elif defined(CONFIG_SLUB_DEBUG)
>  #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> -			  SLAB_TRACE | SLAB_CONSISTENCY_CHECKS)
> +			  SLAB_TRACE | SLAB_CONSISTENCY_CHECKS | \
> +			  SLAB_SILENT_ERRORS)
>  #else
>  #define SLAB_DEBUG_FLAGS (0)
>  #endif
> @@ -164,7 +165,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
>  			      SLAB_NOLEAKTRACE | \
>  			      SLAB_RECLAIM_ACCOUNT | \
>  			      SLAB_TEMPORARY | \
> -			      SLAB_ACCOUNT)
> +			      SLAB_ACCOUNT | \
> +			      SLAB_SILENT_ERRORS)
>  
>  bool __kmem_cache_empty(struct kmem_cache *);
>  int __kmem_cache_shutdown(struct kmem_cache *);
> @@ -215,6 +217,7 @@ DECLARE_STATIC_KEY_TRUE(slub_debug_enabled);
>  DECLARE_STATIC_KEY_FALSE(slub_debug_enabled);
>  #endif
>  extern void print_tracking(struct kmem_cache *s, void *object);
> +long validate_slab_cache(struct kmem_cache *s);
>  #else
>  static inline void print_tracking(struct kmem_cache *s, void *object)
>  {
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 88e833986332..239c9095e7ea 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -55,7 +55,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
>   */
>  #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
>  		SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
> -		SLAB_FAILSLAB | kasan_never_merge())
> +		SLAB_FAILSLAB | SLAB_SILENT_ERRORS | kasan_never_merge())
>  
>  #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
>  			 SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
> diff --git a/mm/slub.c b/mm/slub.c
> index 3021ce9bf1b3..7083e89432d0 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -673,14 +673,16 @@ static void slab_bug(struct kmem_cache *s, char *fmt, ...)
>  
>  static void slab_fix(struct kmem_cache *s, char *fmt, ...)
>  {
> -	struct va_format vaf;
> -	va_list args;
> -
> -	va_start(args, fmt);
> -	vaf.fmt = fmt;
> -	vaf.va = &args;
> -	pr_err("FIX %s: %pV\n", s->name, &vaf);
> -	va_end(args);
> +	if (!(s->flags & SLAB_SILENT_ERRORS)) {

This style of guarding causes lots of line diffs.

Instead, if the whole function is to be skipped, please prefer:

	if (skip_things_if_true)
		return;

instead of

	if (!skip_things_if_true) {
		...
	}

here and everywhere else.

Specifically, for the kunit_resource design, this would simply become:

	if (slab_add_kunit_errors())
		return;

Meaning that whenever a KUnit test has a resource "slab_errors", no
messages are printed -- otherwise it'll always print a message.

> +		struct va_format vaf;
> +		va_list args;
> +
> +		va_start(args, fmt);
> +		vaf.fmt = fmt;
> +		vaf.va = &args;
> +		pr_err("FIX %s: %pV\n", s->name, &vaf);
> +		va_end(args);
> +	}
>  }
>  
>  static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
> @@ -739,8 +741,10 @@ static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p)
>  void object_err(struct kmem_cache *s, struct page *page,
>  			u8 *object, char *reason)
>  {
> -	slab_bug(s, "%s", reason);
> -	print_trailer(s, page, object);
> +	if (!(s->flags & SLAB_SILENT_ERRORS)) {
> +		slab_bug(s, "%s", reason);
> +		print_trailer(s, page, object);
> +	}
>  }
>  
>  static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page,
> @@ -752,9 +756,11 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page,
>  	va_start(args, fmt);
>  	vsnprintf(buf, sizeof(buf), fmt, args);
>  	va_end(args);
> -	slab_bug(s, "%s", buf);
> -	print_page_info(page);
> -	dump_stack();
> +	if (!(s->flags & SLAB_SILENT_ERRORS)) {
> +		slab_bug(s, "%s", buf);
> +		print_page_info(page);
> +		dump_stack();
> +	}
>  }
>  
>  static void init_object(struct kmem_cache *s, void *object, u8 val)
> @@ -798,11 +804,13 @@ static int check_bytes_and_report(struct kmem_cache *s, struct page *page,
>  	while (end > fault && end[-1] == value)
>  		end--;
>  
> -	slab_bug(s, "%s overwritten", what);
> -	pr_err("INFO: 0x%p-0x%p @offset=%tu. First byte 0x%x instead of 0x%x\n",
> +	if (!(s->flags & SLAB_SILENT_ERRORS)) {
> +		slab_bug(s, "%s overwritten", what);
> +		pr_err("INFO: 0x%p-0x%p @offset=%tu. First byte 0x%x instead of 0x%x\n",
>  					fault, end - 1, fault - addr,
>  					fault[0], value);
> -	print_trailer(s, page, object);
> +		print_trailer(s, page, object);
> +	}
>  
>  	restore_bytes(s, what, value, fault, end);
>  	return 0;
> @@ -964,6 +972,7 @@ static int check_slab(struct kmem_cache *s, struct page *page)
>  
>  	if (!PageSlab(page)) {
>  		slab_err(s, page, "Not a valid slab page");
> +		s->errors += 1;

So to me it looks like errors is set whenever there's a slab_err() or
slab_fix() call. So why not move setting that an error occurred into
those functions?

>  		return 0;
>  	}
>  
> @@ -971,11 +980,13 @@ static int check_slab(struct kmem_cache *s, struct page *page)
>  	if (page->objects > maxobj) {
>  		slab_err(s, page, "objects %u > max %u",
>  			page->objects, maxobj);
> +		s->errors += 1;
>  		return 0;
>  	}
>  	if (page->inuse > page->objects) {
>  		slab_err(s, page, "inuse %u > max %u",
>  			page->inuse, page->objects);
> +		s->errors += 1;
>  		return 0;
>  	}
>  	/* Slab_pad_check fixes things up after itself */
> @@ -1008,8 +1019,10 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search)
>  				page->freelist = NULL;
>  				page->inuse = page->objects;
>  				slab_fix(s, "Freelist cleared");
> +				s->errors += 1;
>  				return 0;
>  			}
> +			s->errors += 1;
>  			break;
>  		}
>  		object = fp;
> @@ -1026,12 +1039,14 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search)
>  			 page->objects, max_objects);
>  		page->objects = max_objects;
>  		slab_fix(s, "Number of objects adjusted.");
> +		s->errors += 1;
>  	}
>  	if (page->inuse != page->objects - nr) {
>  		slab_err(s, page, "Wrong object count. Counter is %d but counted were %d",
>  			 page->inuse, page->objects - nr);
>  		page->inuse = page->objects - nr;
>  		slab_fix(s, "Object count adjusted.");
> +		s->errors += 1;
>  	}
>  	return search == NULL;
>  }
> @@ -4629,8 +4644,10 @@ static void validate_slab(struct kmem_cache *s, struct page *page)
>  		u8 val = test_bit(__obj_to_index(s, addr, p), map) ?
>  			 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
>  
> -		if (!check_object(s, page, p, val))
> +		if (!check_object(s, page, p, val)) {
> +			s->errors += 1;
>  			break;
> +		}
>  	}
>  	put_map(map);
>  unlock:
> @@ -4650,9 +4667,11 @@ static int validate_slab_node(struct kmem_cache *s,
>  		validate_slab(s, page);
>  		count++;
>  	}
> -	if (count != n->nr_partial)
> +	if (count != n->nr_partial) {
>  		pr_err("SLUB %s: %ld partial slabs counted but counter=%ld\n",
>  		       s->name, count, n->nr_partial);
> +		s->errors += 1;
> +	}
>  
>  	if (!(s->flags & SLAB_STORE_USER))
>  		goto out;
> @@ -4661,20 +4680,23 @@ static int validate_slab_node(struct kmem_cache *s,
>  		validate_slab(s, page);
>  		count++;
>  	}
> -	if (count != atomic_long_read(&n->nr_slabs))
> +	if (count != atomic_long_read(&n->nr_slabs)) {
>  		pr_err("SLUB: %s %ld slabs counted but counter=%ld\n",
>  		       s->name, count, atomic_long_read(&n->nr_slabs));
> +		s->errors += 1;

I think these here are a few cases where there's no slab_err() or
slab_fix() call, and open coding setting errors is still required...

> +	}
>  
>  out:
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  	return count;
>  }
>  
> -static long validate_slab_cache(struct kmem_cache *s)
> +long validate_slab_cache(struct kmem_cache *s)
>  {
>  	int node;
>  	unsigned long count = 0;
>  	struct kmem_cache_node *n;
> +	s->errors = 0;

This would also go away if you use the kunit_resource design. I also
think it's better if the test fully controls 'slab_errors', and it isn't
reset by this function.

>  	flush_all(s);
>  	for_each_kmem_cache_node(s, node, n)
> @@ -4682,6 +4704,8 @@ static long validate_slab_cache(struct kmem_cache *s)
>  
>  	return count;
>  }
> +EXPORT_SYMBOL(validate_slab_cache);
> +
>  /*
>   * Generate lists of code addresses where slabcache objects are allocated
>   * and freed.
> -- 
> 2.17.1