linux-kernel - Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.58.0711161830570.28002@gandalf.stny.rr.com>
Date:	Fri, 16 Nov 2007 18:37:05 -0500 (EST)
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Remy Bohmer <linux@...mer.net>
cc:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	RT <linux-rt-users@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals




On Sat, 17 Nov 2007, Remy Bohmer wrote:

> Hello Steven,
>
> Thanks for your reply
>
> > The above sounds more like you need a completion.
> Funny, I first started with using completion structures, but that did
> not work either. I get similar OOPses on all these kind of locking
> mechanisms, as long as I use the _interruptible() type. I tried every
> work-around I can think of, but none worked :-((
> Even if I block on an ordinary rt-mutex in the same routine, wait a
> _interruptible() type, I get the same problem.

The taker of a mutex must also be the one that releases it.  I don't see
how you could use a mutex for this. It really requires some kind of
completion, or a compat_semaphore.

>
> > What's used to wake up the caller of down_interruptible?
> A call to up() is used from inside an interrupt(thread) context, but
> this is not relevant for the problem, because only blocking on a
> semaphore with down_interruptible() and waking the thread by CTRL-C is
> enough to get this Oops.
>
> I saw that the code is trying to wake 'other waiters', while there is
> only 1 thread waiting on the semaphore at most. I feel that the root
> cause of this problem has to be searched there.
>
> I believe that executing any PI code on semaphores is a strange
> behavior anyway, because a semaphore is never 'owned' by a thread, and
> it is always another thread that wakes the thread that blocks on a
> semaphore, and because the waker is unknown, the PI code will always
> boost the prio of the wrong thread.

Exactly why it should be a completion or compat semaphores. The reason we
did PI on semaphores is only because they were used as mutexes before Ingo
pushed to actually get a mutex primative into the kernel. Since then,
we've been trying to remove all semaphores with either a mutex or
completion.

>
> Strange is also, that I get different behavior on ARM if I use
> sema_init(&sema, 1) versus sema_init(&sema,0). The latter seems to
> crash less, it will not crash until the first up(); while the first
> will crash even without any up().
>
> Attached I have put a sample driver I just hacked together a few
> minutes ago. It is NOT the driver that has generates the oops in the
> previous mail, but I have stripped a scull-driver down that much that
> it will be much easier to talk about, and to keep us focussed on the
> part of the code that is causing this.
> Besides: I tested this driver on X86 with 2.6.23.1-rt5 and I get the
> also OOPSes although slightly different than on ARM. See the attached
> dummy.txt file.
>
>
> Beware: The up(&sema) is NOT required to get this OOPS, I get it even
> without any up(&sema) !
>
> I hope you can look at the attached driver source and help me with this...
>

        down_interruptible(&dummy);
        printk("We will block now, and if you press CTRL-C from here, we get an OOPS.\n");
        down_interruptible(&dummy);


This double down is actually illegal with rt semaphores. Because we treat
semaphores as mutexes unless they are declared as compat_semaphores. In
which case we don't do PI.

Seems that you need to work out how to use a completion for your code. And
if that doesn't work, then use a compat_semaphore. But beware, that the
compat_semaphore can cause unbounded latencies. But then again, so can
completions.


-- Steve

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/