Date:	Thu, 5 Apr 2012 08:37:46 +0000
From:	"Chen, Dennis (SRDC SW)" <Dennis1.Chen@....com>
To:	Ingo Molnar <mingo@...nel.org>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"mingo@...hat.com" <mingo@...hat.com>
Subject: RE: semaphore and mutex in current Linux kernel (3.2.2)

On Tue, Apr 3, 2012 at 3:52 PM, Ingo Molnar <mingo@...nel.org> wrote:
> I'm not sure what the point of comparative measurements with
> semaphores would be: for example we don't have per architecture
> optimized semaphores anymore, we switched the legacy semaphores
> to a generic version and are phasing them out.
>

About the point: quite simply, I am very curious about the mutex performance 
optimization (actually I am curious about almost everything in the kernel :)
I know the rationale of the mutex optimization is that if the lock owner is 
running, it is likely to release the lock soon, so making the waiter spin for a 
short time waiting for the lock to be released is reasonable compared with the 
cost of a process switch.

But what if the running lock owner does not release the lock soon? Will that
degrade mutex performance compared with a semaphore?
Look at the code below from the mutex slow path:

int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
{
	if (!sched_feat(OWNER_SPIN))
		return 0;

	rcu_read_lock();
	while (owner_running(lock, owner)) {
		if (need_resched())
			break;

		arch_mutex_cpu_relax();
	}
	rcu_read_unlock();

	/*
	 * We break out the loop above on need_resched() and when the
	 * owner changed, which is a sign for heavy contention. Return
	 * success only when lock->owner is NULL.
	 */
	return lock->owner == NULL;
}
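
For context, here is my own heavily simplified paraphrase of how the slow path 
around this helper is structured (this is NOT the actual __mutex_lock_common() 
source; helpers like try_to_take_lock() are placeholder names I made up to keep 
the sketch short):

void mutex_lock_slowpath_sketch(struct mutex *lock)
{
	/*
	 * Step 1: optimistic spinning, only while the owner is still
	 * running on a CPU and we are not asked to reschedule.
	 */
	for (;;) {
		struct task_struct *owner = lock->owner;

		if (owner && !mutex_spin_on_owner(lock, owner))
			break;			/* stop spinning */

		if (try_to_take_lock(lock))	/* placeholder for the cmpxchg on lock->count */
			return;			/* lock taken without sleeping */

		arch_mutex_cpu_relax();
	}

	/*
	 * Step 2: fall back to the wait list. For mutex_lock() the task
	 * sleeps in TASK_UNINTERRUPTIBLE here until the owner unlocks,
	 * i.e. it would show up as 'D' in ps.
	 */
	add_self_to_wait_list(lock);		/* placeholder */
	while (!try_to_take_lock(lock))
		schedule();
	remove_self_from_wait_list(lock);	/* placeholder */
}

That is how I read the overall shape of the code; please correct me if the 
paraphrase itself is already wrong.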

According to this code, I would guess the waiter busy-waits in the while loop. 
So I made an experiment:
1. Write a simple character device kernel module. Add a busy wait of 2 minutes 
between mutex_lock()/mutex_unlock() in its read function, like this:

static DEFINE_MUTEX(c_mutex);

static ssize_t xxx_read(struct file *filp, char __user *buf,
                        size_t count, loff_t *ppos)
{
    /* busy wait for 2 minutes while holding the mutex */
    unsigned long j = jiffies + 120 * HZ;

    mutex_lock(&c_mutex);
    while (time_before(jiffies, j))
        cpu_relax();
    mutex_unlock(&c_mutex);

    return 0; /* nothing is copied to user space in this test */
}

2. Write an application that opens and reads this device. I start the app on 2 
different CPUs at almost the same time:
# taskset 0x00000001 ./main
# taskset 0x00000002 ./main
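
For reference, the app just opens the device and issues a single blocking read; 
a minimal sketch of it is below (the node name /dev/xxx is only a placeholder 
for whatever the module actually registers):

/* main.c -- open the test char device and read once */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[16];
	int fd = open("/dev/xxx", O_RDONLY);	/* placeholder device node */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * This read blocks for ~2 minutes inside xxx_read() while the
	 * module busy-waits with c_mutex held.
	 */
	if (read(fd, buf, sizeof(buf)) < 0)
		perror("read");

	close(fd);
	return 0;
}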

The app on CPU0 gets the mutex lock and runs for about 2 minutes before releasing it. 
According to mutex_spin_on_owner(), I would expect the app on CPU1 to spin in the while 
loop, but the ps command output is:

root     30197  0.0  0.0   4024   324 pts/2    R+   11:30   0:00 ./main
root     30198  0.0  0.0   4024   324 pts/0    D+   11:30   0:00 ./main

'D+' means the app on CPU1 is sleeping in an UNINTERRUPTIBLE state. This is very interesting;
how does this happen? I checked my kernel config: CONFIG_MUTEX_SPIN_ON_OWNER is 
set, and '/sys/kernel/debug# cat sched_features' outputs:
... ICK LB_BIAS OWNER_SPIN NONTASK_POWER TTWU_QUEUE NO_FORCE_SD_OVERLAP

I know I must have some misunderstanding of the code, but I don't know where it is...

> Mutexes have various advantages (such as lockdep coverage and in
> general tighter semantics that makes their usage more robust)

Yes, I agree.

> and we aren't going back to semaphores.
>
> What would make a ton of sense would be to create a 'perf bench'
> module that would use the kernel's mutex code and would measure
> it in user-space. 'perf bench mem' already does a simplified
>
> So if you'd be interested in writing that brand new benchmarking
> feature and need help then let the perf people know.

Thanks Ingo for the info; writing such a benchmarking feature for kernel features 
like this is already on my TODO list...


