linux-kernel - Re: [PATCH 2/2 v4] lib: debugobjects: handle objects free in a batch outside the loop

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.21.1801311644120.9911@nanos.tec.linutronix.de>
Date:   Wed, 31 Jan 2018 17:01:56 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Yang Shi <yang.shi@...ux.alibaba.com>
cc:     longman@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2 v4] lib: debugobjects: handle objects free in a batch
 outside the loop

On Thu, 25 Jan 2018, Yang Shi wrote:

> There are nested loops on debug objects free path, sometimes it may take
> over hundred thousands of loops, then cause soft lockup with
> !CONFIG_PREEMPT occasionally, like below:
> 
> NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [stress-ng-getde:110342]
> 
>  CPU: 15 PID: 110342 Comm: stress-ng-getde Tainted: G
> E   4.9.44-003.ali3000.alios7.x86_64.debug #1
>  Hardware name: Dell Inc. PowerEdge R720xd/0X6FFV, BIOS
> 1.6.0 03/07/2013
>  task: ffff884cbb0d0000 task.stack: ffff884cabc70000
>  RIP: 0010:[<ffffffff817d647b>]  [<ffffffff817d647b>]
> _raw_spin_unlock_irqrestore+0x3b/0x60
>  RSP: 0018:ffff884cabc77b78  EFLAGS: 00000292
>  RAX: ffff884cbb0d0000 RBX: 0000000000000292 RCX: 0000000000000000
>  RDX: ffff884cbb0d0000 RSI: 0000000000000001 RDI: 0000000000000292
>  RBP: ffff884cabc77b88 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8357a0d8
>  R13: ffff884cabc77bc8 R14: ffffffff8357a0d0 R15: 00000000000000fc
>  FS:  00002aee845fd2c0(0000) GS:ffff8852bd400000(0000)
> knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000002991808 CR3: 0000005123abf000 CR4: 00000000000406e0
>  Stack:
>   ffff884ff4fe0000 ffff884ff4fd8000 ffff884cabc77c00 ffffffff8141177e
>   0000000000000202 ffff884cbb0d0000 ffff884cabc77bc8 0000000000000006
>   ffff884ff4fda000 ffffffff8357a0d8 0000000000000000 91f5d976f6020b6c

Please trim that. The register content is not really interesting

> @@ -186,11 +199,25 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
>  static void free_obj_work(struct work_struct *work)
>  {
>  	struct debug_obj *objs[ODEBUG_FREE_BATCH];
> +	struct hlist_node *tmp;
> +	struct debug_obj *obj;
>  	unsigned long flags;
>  	int i;
>  
>  	if (!raw_spin_trylock_irqsave(&pool_lock, flags))
>  		return;
> +
> +	/* Move free obj to pool list from global free list */

If the pool is full, why would you shuffle those objects to the pool list
first just to free them 10 lines further down?

> +	if (obj_free > 0) {

  obj_nr_tofree might be a more descriptive name for this variable

> +		hlist_for_each_entry_safe(obj, tmp, &obj_to_free, node) {
> +			hlist_del(&obj->node);
> +			hlist_add_head(&obj->node, &obj_pool);
> +			obj_pool_free++;
> +			obj_pool_used--;
> +			obj_free--;
> +		}
> +	}

The other thing here is that this whole list walk operation happens with
the pool lock held and interrupts disabled. That's suboptimal at best.

So the right thing to do here is:

	HLIST_HEAD(tofree);
		
  	if (!raw_spin_trylock_irqsave(&pool_lock, flags))
  		return;

	while (obj_pool_free < debug_objects_pool_size) {
		if (!obj_nr_tofree)
			break:
		hlist_del(...)
		hlist_add(...)
	}

	if (obj_nr_tofree) {
		hlist_move_list(&obj_to_free, &freelist);
		....
	}

	while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
	      ....
	}

	raw_spin_unlock_irqrestore(&pool_lock, flags);

	hlist_for_each_entry_safe(obj, tmp, &tofree, node) {
		hlist_del();
		kmem_cache_free(....);
	}

That way you minimize the lock held times and spare pointless list walks
and shuffling.

Thanks,

	tglx