linux-kernel - Re: 2.6.21-rc suspend regression: sysfs deadlock

Message-ID: <Pine.LNX.4.64.0703061735350.5963@woody.linux-foundation.org>
Date:	Tue, 6 Mar 2007 17:56:57 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Hugh Dickins <hugh@...itas.com>
cc:	Oliver Neukum <oliver@...kum.name>,
	Maneesh Soni <maneesh@...ibm.com>,
	Greg Kroah-Hartman <gregkh@...e.de>,
	Adrian Bunk <bunk@...sta.de>, linux-kernel@...r.kernel.org
Subject: Re: 2.6.21-rc suspend regression: sysfs deadlock



On Tue, 6 Mar 2007, Hugh Dickins wrote:
> 
> This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
> Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
> in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
> while calling flush_write_buffer, and flushing that particular write
> buffer entails downing buffer->sem in orphan_all_buffers.

Gaah. What a crock.

I really don't see any alternative to just reverting the whole change. 
Hugh's patch is simple, but rather pointless.

The fact is, the whole change is *bogus*.

We don't "lock" datastructures. We *reference count* them!

This is so fundamental that it's even mentioned in the file 
Documentation/CodingStyle in "Chapter 11: Data structures".

The whole "orphaned" kind of locking is broken. It's stupid. The way we 
handle races between removal and use is that initial setup sets a reference 
count of 1, and then we use something really simple like:

	static inline struct sysfs_buffer *get_sysfs_buffer(struct inode *inode)
	{
		struct sysfs_buffer *buffer = inode->i_private;

		BUG_ON(!mutex_is_locked(&inode->i_mutex));
		if (buffer)
			atomic_inc(&buffer->count);
		return buffer;
	}

	static inline void put_sysfs_buffer(struct sysfs_buffer *buffer)
	{
		if (buffer && atomic_dec_and_test(&buffer->count))
			kfree(buffer);
	}

and then the rule is:

 - everybody uses "get_sysfs_buffer()" to follow the reference (and yes, 
   you obviously have to hold "inode->i_mutex" for this to be safe! I 
   added the BUG_ON() as an example)

 - everybody uses "put_sysfs_buffer()" to release it (and we simply don't 
   *care* whether somebody else released it too, since everybody has a 
   reference count)

 - removing the buffer is now just

	mutex_lock(&inode->i_mutex);
	buffer = inode->i_private;
	inode->i_private = NULL;
	mutex_unlock(&inode->i_mutex);

	put_sysfs_buffer(buffer);

 - everybody is happy!
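
The rules above can be tried outside the kernel. What follows is only a 
minimal userspace sketch, not the sysfs code: it assumes C11 atomics and a 
pthread mutex as stand-ins for atomic_t and inode->i_mutex, and every name 
in it (struct owner, buffer_get, buffer_put, owner_open, owner_remove, 
buffers_freed) is hypothetical and exists only for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the refcounted buffer. */
struct buffer {
	atomic_int count;	/* starts at 1: the owner's pointer holds a reference */
};

/* Hypothetical stand-in for the inode. */
struct owner {
	pthread_mutex_t lock;	/* plays the role of inode->i_mutex */
	struct buffer *private;	/* plays the role of inode->i_private */
};

/* Demo-only counter so that a free is observable from the outside. */
static atomic_int buffers_freed;

static struct buffer *buffer_alloc(void)
{
	struct buffer *b = malloc(sizeof(*b));

	if (b)
		atomic_init(&b->count, 1);
	return b;
}

/* Follow the pointer and take a reference. Caller must hold o->lock. */
static struct buffer *buffer_get(struct owner *o)
{
	struct buffer *b = o->private;

	if (b)
		atomic_fetch_add(&b->count, 1);
	return b;
}

/* Drop a reference; whoever drops the last one frees the buffer. */
static void buffer_put(struct buffer *b)
{
	/* atomic_fetch_sub returns the old value, so 1 means we hit zero. */
	if (b && atomic_fetch_sub(&b->count, 1) == 1) {
		free(b);
		atomic_fetch_add(&buffers_freed, 1);
	}
}

/* A user (think read()/write()) obtains its own reference under the lock. */
static struct buffer *owner_open(struct owner *o)
{
	struct buffer *b;

	pthread_mutex_lock(&o->lock);
	b = buffer_get(o);
	pthread_mutex_unlock(&o->lock);
	return b;
}

/* Removal: unhook the pointer under the lock, drop the reference outside it. */
static void owner_remove(struct owner *o)
{
	struct buffer *b;

	pthread_mutex_lock(&o->lock);
	b = o->private;
	o->private = NULL;
	pthread_mutex_unlock(&o->lock);

	buffer_put(b);
}
```

In this sketch the lock protects only the pointer, never the buffer 
contents. A user that called owner_open() before owner_remove() keeps a 
valid buffer until its own buffer_put(), and a user that arrives afterwards 
simply gets NULL, so nothing ever needs to be held across the actual I/O.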

Anyway, I'm unable to revert the broken commit, since there are now other 
changes that depend on it, but can somebody *please* do that? I'll apply 
Hugh's silly patch in the meantime, just to avoid the lockup.

			Linus