linux-kernel - Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

Message-ID: <3efb10970711161502m6216bf5rc19a34184b4f3a2b@mail.gmail.com>
Date:	Sat, 17 Nov 2007 00:02:39 +0100
From:	"Remy Bohmer" <linux@...mer.net>
To:	"Steven Rostedt" <rostedt@...dmis.org>
Cc:	"Ingo Molnar" <mingo@...e.hu>,
	"Thomas Gleixner" <tglx@...utronix.de>,
	RT <linux-rt-users@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

Hello Steven,

Thanks for your reply.

> The above sounds more like you need a completion.
Funny, I first started out using completion structures, but that did
not work either. I get similar Oopses with all of these kinds of
locking mechanisms, as long as I use the _interruptible() variant. I
tried every workaround I could think of, but none worked :-((
Even if I block on an ordinary rt-mutex in the same routine, with an
_interruptible() type of wait, I get the same problem.

> What's used to wake up the caller of down_interruptible?
A call to up() is made from inside an interrupt (thread) context, but
this is not relevant to the problem: merely blocking on a semaphore
with down_interruptible() and then interrupting the thread with CTRL-C
is enough to get this Oops.
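
For readers who do not want to unpack the attachment, the blocking
pattern can be approximated in user space. The sketch below is only an
analogue: it is not the attached driver and it does not touch the
kernel rt-mutex path. It assumes a POSIX sem_wait() standing in for
down_interruptible() and a timer-driven SIGALRM standing in for CTRL-C.
The names on_alarm, wait_interruptible and demo_interrupted_wait are
made up for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Empty handler: its only job is to interrupt the blocking sem_wait(). */
static void on_alarm(int signo)
{
    (void)signo;
}

/* Hypothetical helper that mirrors the kernel return convention:
 * 0 when the semaphore was taken, a negative errno otherwise. */
int wait_interruptible(sem_t *sem)
{
    if (sem_wait(sem) == 0)
        return 0;
    return -errno;
}

/* Block on an empty semaphore until a signal breaks the wait. */
int demo_interrupted_wait(void)
{
    struct sigaction sa, old_sa;
    struct itimerval timer, off;
    sem_t sem;
    int ret;

    if (sem_init(&sem, 0, 0) != 0)      /* count 0: the first wait blocks */
        return -errno;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* no SA_RESTART: the wait fails with EINTR */
    if (sigaction(SIGALRM, &sa, &old_sa) != 0) {
        ret = -errno;
        sem_destroy(&sem);
        return ret;
    }

    /* Fire SIGALRM every 50 ms; repeating avoids hanging if the first
     * signal arrives before sem_wait() has started blocking. */
    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 50000;
    timer.it_interval.tv_usec = 50000;
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        ret = -errno;
    else
        ret = wait_interruptible(&sem); /* nobody posts, so only a signal ends this */

    /* Disarm the timer and restore the previous handler. */
    memset(&off, 0, sizeof(off));
    setitimer(ITIMER_REAL, &off, NULL);
    sigaction(SIGALRM, &old_sa, NULL);
    sem_destroy(&sem);
    return ret;
}
```

In this analogue nothing ever posts the semaphore, so the wait can only
end because of the signal, which matches the situation described above
where no up() is involved at all.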

I saw that the code is trying to wake 'other waiters', while there is
at most one thread waiting on the semaphore. I suspect that the root
cause of this problem is to be found there.

I believe that executing any PI (priority inheritance) code on
semaphores is strange behavior anyway. A semaphore is never 'owned' by
a thread; it is always another thread that wakes the thread blocked on
the semaphore. Because that waker is unknown in advance, the PI code
will always boost the priority of the wrong thread.

It is also strange that I get different behavior on ARM depending on
whether I use sema_init(&sema, 1) or sema_init(&sema, 0). The latter
seems to crash less: it does not crash until the first up(), while the
former crashes even without any up().
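
To make the difference between the two initial counts concrete, here is
a small user-space sketch with POSIX semaphores. It is only an analogue
and assumes that the count passed to sem_init() plays the same role as
the one passed to sema_init(); first_down_result is a hypothetical
helper. It says nothing about why the kernel crashes, only what the
first "down" sees in each case.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper: create a semaphore with the given initial count
 * and try to take it once without blocking.
 * Returns 0 if it was taken, -EAGAIN if the caller would have blocked. */
int first_down_result(unsigned int initial_count)
{
    sem_t sem;
    int ret = 0;

    if (sem_init(&sem, 0, initial_count) != 0)
        return -errno;
    if (sem_trywait(&sem) != 0)
        ret = -errno;           /* count was zero: a real wait would block */
    sem_destroy(&sem);
    return ret;
}
```

With an initial count of 1 the first taker gets the semaphore
immediately; with 0 it has to wait for someone else to post it.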

I have attached a sample driver that I hacked together a few minutes
ago. It is NOT the driver that generated the Oops in the previous mail;
instead I stripped a scull driver down so far that it will be much
easier to talk about, and it keeps us focused on the part of the code
that is causing this.
Besides: I tested this driver on x86 with 2.6.23.1-rt5 and I also get
Oopses there, although slightly different from those on ARM. See the
attached dummy.txt file.

Note: the up(&sema) is NOT required to trigger this Oops; I get it
even without any up(&sema)!

I hope you can look at the attached driver source and help me with this...

Kind Regards,

Remy Bohmer

2007/11/16, Steven Rostedt <rostedt@...dmis.org>:
>
>
> On Fri, 16 Nov 2007, Remy Bohmer wrote:
>
> > Hello All,
> >
> > I have a problem with the RT-mutex code and signals. The problem is
> > very easily reproducible, but I have not found the root cause yet.
> > I hope someone can help me on this one.
> >
> > This is what I am doing:
> > * I have a simple character driver with a read call (called spi_read()
> > in the logging below).
> > * The read call blocks on a semaphore until some condition in hardware
> > is reached (in the routine wait_for_io_level(), see logging below).
> > * I use a down_interruptible() call on a 'struct semaphore' type
> > semaphore, which eventually blocks on a mutex (the semaphore is, of
> > course, initialised with sema_init()).
>
> The above sounds more like you need a completion. What's used to wake up
> the caller of down_interruptible?
>
> Can you post some code so we can see what you are doing? That would make
> it much easier to analyze.
>
> -- Steve
>
>
> >
> > What happens is that when a user-space RT-thread is waiting on the
> > semaphore through the spi_read() call, and a signal arrives during the
> > wait at this thread (like e.g. CTRL-C), the kernel starts oopsing
> > until it is as good as brain-dead.
> >
> > If I do NOT send a POSIX signal, the code/mutex/semaphore works
> > properly for days.
> > So it seems to be related to waking up from a blocking situation
> > because of a pending POSIX signal.
> >
> > I also tried 'struct compat_semaphore' and mutexes; none of
> > them work.
> > In fact the real mutex type, declared with init_MUTEX(), has the same problem.
> >
> > Anyone an idea?
> > Below the kernel oops output. (I run on ARM, but I think that should
> > not matter for this problem)
> >
>

Download attachment "dummy.tgz" of type "application/x-gzip" (1355 bytes)

View attachment "dummy.txt" of type "text/plain" (30848 bytes)