Message-ID: <CAPM=9tzbAMYqAHKURRneuLqAoE2=YScq7NNFmTjC1F9pHeWE6Q@mail.gmail.com>
Date: Thu, 19 Apr 2012 17:56:55 +0100
From: Dave Airlie <airlied@...il.com>
To: Jesse Barnes <jbarnes@...tuousgeek.org>
Cc: Andy Whitcroft <apw@...onical.com>,
David Airlie <airlied@...ux.ie>,
dri-devel@...ts.freedesktop.org,
Bryce Harrington <bryce@...onical.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/1] [RFC] DRM locking issues during early open
On Thu, Apr 19, 2012 at 5:55 PM, Jesse Barnes <jbarnes@...tuousgeek.org> wrote:
> On Thu, 19 Apr 2012 17:52:39 +0100
> Dave Airlie <airlied@...il.com> wrote:
>
>> On Thu, Apr 19, 2012 at 5:47 PM, Dave Airlie <airlied@...il.com> wrote:
>> > On Thu, Apr 19, 2012 at 5:41 PM, Andy Whitcroft <apw@...onical.com> wrote:
>> >> On Thu, Apr 19, 2012 at 05:30:03PM +0100, Dave Airlie wrote:
>> >>> On Thu, Apr 19, 2012 at 5:22 PM, Andy Whitcroft <apw@...onical.com> wrote:
>> >>> > We have been carrying a (rather poor) patch for an issue we identified in
>> >>> > the DRM driver. This issue is triggered when a DRM device is initialising
>> >>> > and userspace attempts to open it, typically in response to the sysfs
>> >>> > device added event. Basically we allocate the minor numbers making
>> >>> > the device available, and then call the drm load callback. Until this
>> >>> > completes the device is really not ready and these early opens typically
>> >>> > lead to oopses.
>> >>> >
>> >>> > We have been using the following patch to avoid this by marking the minors
>> >>> > as in error until the load method has completed. This avoids the early
>> >>> > open by simply erroring out the opens with EAGAIN. Obviously we should
>> >>> > be delaying the open until the load method completes.
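
A minimal userspace model of the EAGAIN workaround described above may help. It assumes a hypothetical per-minor "load done" flag. The names (`model_minor`, `model_open`, and so on) are invented for illustration, are not the DRM API, and are not necessarily how the actual patch marks minors as in error. The model is single-threaded; real kernel code would also need locking or memory barriers around the flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not the real DRM structures. */
struct model_minor {
    void *dev;          /* non-NULL once the minor number is allocated */
    bool  load_done;    /* set only after the driver load callback returns */
};

/* Minor allocated: from here on userspace can see the device node. */
static void model_alloc_minor(struct model_minor *m, void *dev)
{
    m->dev = dev;
    m->load_done = false;   /* "marked in error" until load completes */
}

/* Driver load callback finished successfully. */
static void model_load_complete(struct model_minor *m)
{
    assert(m->dev != NULL); /* load only runs after the minor exists */
    m->load_done = true;
}

/* Early opens fail with -EAGAIN instead of touching a half-initialised device. */
static int model_open(const struct model_minor *m)
{
    if (m->dev == NULL)
        return -ENODEV;     /* no such minor yet */
    if (!m->load_done)
        return -EAGAIN;     /* minor exists but driver not ready: caller retries */
    return 0;
}
```

The drawback the thread points out is visible in the model: the caller is told to retry rather than simply waiting for load to finish.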
>> >>> >
>> >>> > I include the existing patch for completeness (it is not really ready for
>> >>> > merging) to illustrate the issue. I think it is logical that the wait
>> >>> > should simply be delayed until the load has completed. I am proposing
>> >>> > to include a wait queue associated with the idr cache for the drm minors
>> >>> > which we can use to allow open callers to wait_event_interruptible() on.
>> >>> > I'll be putting together a prototype shortly and will follow up with it.
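
The proposed wait can be sketched in userspace with a POSIX mutex and condition variable standing in for the kernel wait queue and `wait_event_interruptible()`. This is an assumption-laden model, not Andy's prototype. All `model_*` names are hypothetical. Unlike `wait_event_interruptible()`, `pthread_cond_wait()` cannot be interrupted by a signal here, so the sketch has no `-ERESTARTSYS`-style early return.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: the condition variable plays the proposed wait queue. */
struct model_minor {
    pthread_mutex_t lock;
    pthread_cond_t  loaded;
    bool            load_done;
    int             load_err;   /* 0 on success, negative errno if load failed */
};

static void model_minor_init(struct model_minor *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->loaded, NULL);
    m->load_done = false;
    m->load_err = 0;
}

/* Driver side: record the load result and wake every waiting opener. */
static void model_load_complete(struct model_minor *m, int err)
{
    pthread_mutex_lock(&m->lock);
    assert(!m->load_done);      /* load completes exactly once */
    m->load_done = true;
    m->load_err = err;
    pthread_cond_broadcast(&m->loaded);
    pthread_mutex_unlock(&m->lock);
}

/* Open side: sleep until load has finished, then report its outcome. */
static int model_open_wait(struct model_minor *m)
{
    int ret;

    pthread_mutex_lock(&m->lock);
    while (!m->load_done)       /* loop guards against spurious wakeups */
        pthread_cond_wait(&m->loaded, &m->lock);
    ret = m->load_err;
    pthread_mutex_unlock(&m->lock);
    return ret;
}

/* Helper so an opener can run on its own thread in a usage example. */
struct model_open_req {
    struct model_minor *m;
    int result;
};

static void *model_opener_thread(void *arg)
{
    struct model_open_req *req = arg;

    req->result = model_open_wait(req->m);
    return NULL;
}
```

Storing the load result lets a waiter that wakes after a failed load return an error instead of proceeding against a device that never finished initialising.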
>> >>> >
>> >>> > Thoughts?
>> >>>
>> >>> Couldn't we just delay registering things until the driver is ready to
>> >>> accept an open?
>> >>>
>> >>> Granted the midlayer of drm doesn't make that easy,
>> >>
>> >> It seems that we need the dri minor allocated before we hit the load
>> >> function as things are done right now.
>> >>
>> >>> thanks for sending this out, it keeps falling off my radar, I don't
>> >>> think I've ever seen this reported on RHEL/Fedora, which makes me
>> >>> wonder what we are doing that makes us lucky.
>> >>
>> >> We never hit it until we started doing things earlier and quicker. I first
>> >> found it in the prettification of boot so we were keen to get plymouth
>> >> running as soon as possible. That led to random panics and me finding
>> >> this bug. The window is tiny as far as I know and it tends to be specific
>> >> machines and specific package combinations which trigger it reliably.
>> >>
>> >> I suspect that a proper fix would allow delaying the registration as you
>> >> suggest but in the interim a wait would at least avoid the issues we are
>> >> seeing. I will see how awful it looks.
>> >
>> > Just to confirm, it's the drm_sysfs_device_add that causes the race we care about.
>> >
>> > It needs to happen after the driver is happy, since it calls
>> > device_register and that is what triggers the udev magic that loads
>> > userspace.
>> >
>> > If you have a userspace app banging on a static device node that might
>> > need another set of fun fixes.
>>
>> Okay the sysfs add and the idr_replace are the things we need to delay.
>
> Since you can still get at things with a static node, it seems like
> locking is the real issue here? Is there no mutex we can take across
> init to block any openers until we're done?

Well, the idr_replace should be the thing that matters: before it, openers
get -ENODEV; after it, their opens succeed.
We may need a lock around that once we fix the logic.
Dave.
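
The "publish only after load" ordering discussed above can be modelled in userspace. In this model, a minor number is reserved with no device pointer, which is roughly like reserving an idr id with a NULL entry. The device pointer is stored only after load succeeds, which is the idr_replace step. Until then, lookups return -ENODEV. This is a sketch under assumptions: the fixed-size table and all `model_*` names are hypothetical, not the DRM or idr API. The single mutex is one possible form of the lock mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MODEL_MAX_MINORS 64

/* Hypothetical minor table: a reserved slot holds NULL until published. */
static void *model_minors[MODEL_MAX_MINORS];
static unsigned char model_reserved[MODEL_MAX_MINORS];
static pthread_mutex_t model_minors_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reserve a minor number without making it openable. Returns id or -ENOSPC. */
static int model_minor_reserve(void)
{
    int id, ret = -ENOSPC;

    pthread_mutex_lock(&model_minors_lock);
    for (id = 0; id < MODEL_MAX_MINORS; id++) {
        if (!model_reserved[id]) {
            model_reserved[id] = 1;
            model_minors[id] = NULL;
            ret = id;
            break;
        }
    }
    pthread_mutex_unlock(&model_minors_lock);
    return ret;
}

/* Publish the device only after load succeeded; it becomes openable here.
 * In the driver, the sysfs add that fires udev would also move after this. */
static void model_minor_publish(int id, void *dev)
{
    assert(id >= 0 && id < MODEL_MAX_MINORS && model_reserved[id]);
    pthread_mutex_lock(&model_minors_lock);
    model_minors[id] = dev;
    pthread_mutex_unlock(&model_minors_lock);
}

/* Open path: look the minor up under the same lock used to publish it. */
static int model_open_lookup(int id, void **out)
{
    void *dev;

    if (id < 0 || id >= MODEL_MAX_MINORS)
        return -ENODEV;
    pthread_mutex_lock(&model_minors_lock);
    dev = model_minors[id];
    pthread_mutex_unlock(&model_minors_lock);
    if (dev == NULL)
        return -ENODEV;         /* reserved but not yet published: too early */
    *out = dev;
    return 0;
}
```

The lock only makes publish and lookup atomic with respect to each other. It does not by itself keep the device alive after the lookup returns. Real code would normally handle that with reference counting.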