linux-kernel - Re: Driver crash on kernel 2.6.32.7. Interaction between cx8800 (DVB-S) and USB HVR Hauppauge 950q

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-Id: <1266023937.3062.17.camel@palomino.walls.org>
Date:	Fri, 12 Feb 2010 20:18:57 -0500
From:	Andy Walls <awalls@...ix.net>
To:	Greg KH <gregkh@...e.de>
Cc:	Devin Heitmueller <dheitmueller@...nellabs.com>,
	Kay Sievers <kay.sievers@...y.org>,
	Richard Lemieux <rlemieu@...ptel.qc.ca>,
	linux-media@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Driver crash on kernel 2.6.32.7. Interaction between cx8800
 (DVB-S) and USB HVR Hauppauge 950q

On Thu, 2010-02-11 at 20:11 -0800, Greg KH wrote:
> On Wed, Feb 10, 2010 at 10:34:56AM -0500, Andy Walls wrote:
> > On Tue, 2010-02-09 at 22:12 -0500, Devin Heitmueller wrote:
> > > On Tue, Feb 9, 2010 at 10:05 PM, Richard Lemieux <rlemieu@...ptel.qc.ca> wrote:
> > > > Andy,
> > > >
> > > > This is a great answer!  Thanks very much.  When I get into this situation
> > > > again
> > > > I will know what to look for.
> > > >
> > > > A possible reason why I got into this problem in the first place is that I
> > > > tried
> > > > many combinations of parameters with mplayer and azap in order to learn how
> > > > to use the USB tuner in both the ATSC and the NTSC mode.  I will look back
> > > > in the terminal history to see if I can find anything.
> > > 
> > > I think the key to figuring out the bug at this point is you finding a
> > > sequence where you can reliably reproduce the oops.  If we have that,
> > > then I can start giving you some code to try which we can see if it
> > > addresses the problem.
> > > 
> > > For example, I would start by giving you a fix which results in us not
> > > calling the firmware release if the request_firmware() call failed,
> > > but it wouldn't be much help if you could not definitively tell me if
> > > the problem is fixed.
> > 
> > 
> > For the oops analysis here:
> > 
> > http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/15954
> > 
> > 
> > I will also note that the file scope "fw_lock" mutex is rather
> > inconsistently used in
> > linux/drivers/base/fw_class.c:firmware_loading_store().  (I guess for
> > not wanting to consume the timeout interval with sleeping?)
> > 
> > The mutex protects "case 1:", but all other cases appear to be only
> > protected by atomic status bit checks that can fall through to
> > fw_load_abort() which complete()'s the fw_priv->completion.
> > 
> > Also not that in the _request_firmware() this sequence is the only place
> > a once good "fw_priv->fw" pointer is set to NULL:
> > 
> >         mutex_lock(&fw_lock);
> >         if (!fw_priv->fw->size || test_bit(FW_STATUS_ABORT, &fw_priv->status)) {
> >                 retval = -ENOENT;
> >                 release_firmware(fw_priv->fw);
> >                 *firmware_p = NULL;
> >         }
> >         fw_priv->fw = NULL;     <--------------- The only place it is set to NULL
> >         mutex_unlock(&fw_lock);
> > 
> > 
> > So if the timeout timer fires at nearly the same time as udev coming in
> > and say "I'm done loading" without holding the mutex, one can run into
> > the Ooops.  Not only that, I think the above code can leak memory under
> > some circumstances when the "if" clause is not satisfied.
> > 
> > I think this really is a firmware_class.c issue.  I think the "just
> > right" firmware loading timeouts and the particular computer system
> > responsiveness, make this Ooops possible.  However, I'm amazed that a
> > single person has tripped it more than once.
> > 
> > Revising the locking in linux/drivers/base/firmware_class.c should fix
> > the problem.
> > 
> > I don't believe this comment in the code now:
> > 
> > /* fw_lock could be moved to 'struct firmware_priv' but since it is just
> >  * guarding for corner cases a global lock should be OK */
> > static DEFINE_MUTEX(fw_lock);
> > 
> > struct firmware_priv {
> >         char *fw_id;
> > 	...
> > 
> > And since "f_priv" is dynamically created and destroyed by
> > request_firmware() I see no harm in 
> > 
> > 1. moving the mutex into struct firmware_priv
> > 2. just always just grabbing an almost never contended mutex
> > 3. getting rid of the file scope fw_lock.
> > 
> > except grabbing a mutex() while the timeout timer is running during
> > loading, means one *could* sleep for a while consuming the timeout
> > interval.
> 
> That sounds reasonable to me, care to make up a patch for this?

Yes.  But it will take me a while.  I don't have a git tree, because I
don't have high bandwidth internet yet.  (The cable company's been
delayed in laying cable to my home due to repeated snowstorms.)

I just didn't want the problem to fall through the cracks.  I'll submit
something to bugzilla for now.  If a user complains of this rare Ooops
when loading firmware, the current workaround is to lengthen the timeout
via sysfs.

Regards,
Andy

> thanks,
> 
> greg k-h


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/