linux-kernel - Re: BUG: sleeping function called from invalid context on 3.10.10-rt7

Message-ID: <5230D13C.301@tuebingen.mpg.de>
Date:	Wed, 11 Sep 2013 22:23:24 +0200
From:	Mario Kleiner <mario.kleiner@...bingen.mpg.de>
To:	Steven Rostedt <rostedt@...dmis.org>
CC:	Peter Hurley <peter@...leysoftware.com>,
	"Luis Claudio R. Goncalves" <lclaudio@...g.org>,
	linux-rt-users <linux-rt-users@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	Clark Williams <williams@...hat.com>,
	Dave Airlie <airlied@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Mario Kleiner <mario.kleiner@...bingen.mpg.de>
Subject: Re: BUG: sleeping function called from invalid context on 3.10.10-rt7



On 11.09.13 21:19, Steven Rostedt wrote:
> On Wed, 11 Sep 2013 21:07:10 +0200
> Mario Kleiner <mario.kleiner@...bingen.mpg.de> wrote:
>
>>
>>
>> On 11.09.13 20:35, Steven Rostedt wrote:
>>> On Wed, 11 Sep 2013 20:29:07 +0200
>>> Mario Kleiner <mario.kleiner@...bingen.mpg.de> wrote:
>>>
>>>> That said, maybe preempt_disable is no longer the optimal choice there
>>>> and there's some better way to achieve good protection against
>>>> interruptions of that bit of code? My knowledge here is a bit rusty, and
>>>> the intel kms drivers and rt stuff has changed quite a bit.
>>>
>>> If you set your code to a higher priority than other tasks (and
>>> interrupts) then it won't be preempted there. Unless of course it blocks
>>> on a lock, but even then, priority inheritance will take place and it
>>> still should be rather quick. (unless the holder of the lock is doing
>>> that strange polling).
>>>
>>> -- Steve
>>>
>>
>> Right, on a rt kernel. But that creates the problem of not very computer
>> savvy users (psychologists and biologists mostly) somehow having to
>> choose proper priorities for gpu interrupt threads and for the
>> x-server/wayland/..., and not much protection on a non-rt kernel?
>
> IIUC, the preempt_disable() is only for -rt, the non-rt case already
> disables preemption with the spin_locks called before it.
>

Oh, right! I should have thought of that. I'm quite sleepy, so my brain
is not working very well at the moment.

>>
>> preempt_disable() a few years ago looked like a good "plug and play"
>> default solution, because the ->get_crtc_scanoutpos() function was
>> supposed to have a very low and bounded execution time. At the time we
>> wrote the patches for intel/radeon/nouveau, that was the case. Typical
>> execution time (= preempt off time) was like 1-4 usecs, even on very low
>> end hardware.
>>
>> Seems that at least intel's kms driver does a lot of things now, which
>> can sleep and spin inside that section? I tried to follow the posted
>> stack trace, but got lost somewhere around the i915_read32 code and
>> power management stuff...
>
> Note, the sleeps only happen on -rt, and not in mainline.
>
> If one is going to use -rt for real-time work, it requires a bit more
> knowledge of the system. The problem with RT in general, is that it's
> hard, and anyone telling you they have a generic RT system that
> requires no computer savviness can also be selling you a bridge over
> the east river.
>
> -- Steve

;) - I know the problem. I spend a lot of time telling that to users of
my software, although they then generally want some sort of bridge
anyway. I maintain one of the most popular open-source toolkits for
neuroscience, and in my experience the field has the problem that a lot
of people need good real-time behaviour and a lot of flexibility in
their hardware and software setups, but very few have the necessary
technical background. Given the limited money they can spend, there is
also not much commercial interest, or probably viability, in providing
good technical consulting. The few proprietary hardware solutions I
know of are either unaffordable for the majority, or are bridges over
the east river, or quite often both. My main motivation for luring my
users to Linux, and for contributing a few small bits now and then, is
the hope that some problems can be solved better at the system level
than by piling software workarounds on top of hardware workarounds on
top of expensive equipment.

But back to the topic: I think a better argument for the
preempt_disable() there, instead of changing code execution priority,
is that I wouldn't know how to set a static priority properly either.
The timestamping code is also called from drm code (the drmWaitVblank
ioctl()), and that call does not come from the actual experiment
software, where I would at least roughly know what I'm doing and could
adjust priorities dynamically. It comes from the X-Server, or maybe in
the future Wayland, on behalf of the OpenGL client app. For the
timestamping to work properly, one would only need a raised priority
(higher than most interrupt kernel threads, except the one of the kms
driver) for those few lines of timestamping code. I don't think it
would be good to run xorg or wayland permanently at a higher priority
than most irq threads, given that the display server does not only
serve rt apps and is not designed as a realtime application. One only
wants a short protection from preemption during timestamping.
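
To make the "raised priority for only a few lines" idea concrete, here
is a rough user-space analogy in C. It is not the drm timestamping code
and not a proposal for the kernel side: the helper name
timestamp_with_boost() and the use of SCHED_FIFO with
clock_gettime(CLOCK_MONOTONIC) as the "timestamping" step are
assumptions made purely for illustration. The thread raises its own
priority, does the short time-critical step, and restores its previous
scheduling settings right away.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (illustration only): take one timestamp while the
 * calling thread runs at a temporarily raised SCHED_FIFO priority.
 *
 * Returns 0 on success or an errno-style code on failure. *boosted is
 * set to 1 if the priority boost was actually applied, 0 otherwise
 * (for example when the process lacks the privilege to use SCHED_FIFO).
 */
static int timestamp_with_boost(int boost_prio, struct timespec *ts,
                                int *boosted)
{
    pthread_t self = pthread_self();
    struct sched_param old_param, new_param;
    int old_policy;
    int rc;

    assert(ts != NULL && boosted != NULL);

    /* Remember the current policy and priority so they can be restored. */
    rc = pthread_getschedparam(self, &old_policy, &old_param);
    if (rc != 0)
        return rc;

    memset(&new_param, 0, sizeof(new_param));
    new_param.sched_priority = boost_prio;

    /* The boost may be refused (EPERM); carry on unboosted in that case. */
    *boosted = (pthread_setschedparam(self, SCHED_FIFO, &new_param) == 0);

    /* The short, time-critical section: just one clock read here. */
    rc = (clock_gettime(CLOCK_MONOTONIC, ts) == 0) ? 0 : errno;

    /* Drop back to the previous settings as soon as the section is done. */
    if (*boosted) {
        int restore_rc = pthread_setschedparam(self, old_policy, &old_param);
        if (rc == 0)
            rc = restore_rc;
    }
    return rc;
}
```

A caller would pass a priority in the valid SCHED_FIFO range (see
sched_get_priority_min() and sched_get_priority_max()) and check
*boosted to learn whether the boost took effect. Note that this only
changes scheduling priority; unlike preempt_disable(), it does not keep
a higher-priority thread from preempting the section.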

Sorry, I think I'm rambling here quite a bit. I didn't want to
sidetrack the thread, just give some explanation of why I think the
preempt_disable() is (/was?) justified.

-mario