linux-kernel - Re: [BUG] fs/super: a possible sleep-in-atomic bug in put

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Sat, 7 Oct 2017 22:14:44 +0100
From:   Al Viro <viro@...IV.linux.org.uk>
To:     Vladimir Davydov <vdavydov.dev@...il.com>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Jia-Ju Bai <baijiaju1990@....com>, torbjorn.lindh@...ta.se,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [BUG] fs/super: a possible sleep-in-atomic bug in put_super

On Sat, Oct 07, 2017 at 06:06:51PM +0100, Al Viro wrote:
> On Sat, Oct 07, 2017 at 02:56:40PM +0300, Vladimir Davydov wrote:
> > Hello,
> > 
> > On Fri, Oct 06, 2017 at 11:06:04AM +0200, Michal Hocko wrote:
> > > On Fri 06-10-17 16:59:18, Jia-Ju Bai wrote:
> > > > According to fs/super.c, the kernel may sleep under a spinlock.
> > > > The function call path is:
> > > > put_super (acquire the spinlock)
> > > >   __put_super
> > > >     destroy_super
> > > >       list_lru_destroy
> > > >         list_lru_unregister
> > > >           mutex_lock --> may sleep
> > > >         memcg_get_cache_ids
> > > >           down_read --> may sleep
> > > > 
> > > > This bug is found by my static analysis tool and my code review.
> > 
> > This is false-positive: by the time we get to destroy_super(), the lru
> > lists have already been destroyed - see deactivate_locked_super() - so
> > list_lru_destroy() will retrun right away without attempting to take any
> > locks. That's why there's no lockdep warnings regarding this issue.
> > 
> > I think we can move list_lru_destroy() to destroy_super_work() to
> > suppress this warning. Not sure if it's really worth the trouble though.
> 
> It's a bit trickier than that (callers of destroy_super() prior to superblock
> getting reachable via shared data structures do not have that lru_list_destroy()
> a no-op, but they are not called under spinlocks).
> 
> Locking in mm/list_lru.c looks excessive, but then I'm not well familiar with
> that code.

It *is* excessive.

	1) coallocate struct list_lru and array of struct list_lru_node
hanging off it.  Turn all existing variables and struct members of that
type into pointers.  init would allocate and return a pointer, destroy
would free (and leave it for callers to clear their pointers, of course).

	2) store the value of memcg_nr_cache_ids as of the last creation
or resize in list_lru.  Pass that through to memcg_destroy_list_lru_node().
Result: no need for memcg_get_cache_ids() in list_lru_destroy().

	3) add static list_lru *currently_resized, protected by list_lru_mutex.
NULL when memcg_update_all_list_lrus() is not run, points to currently
resized list_lru when it is.

	4) have lru_list_destroy() check (under list_lru_mutex) whether it's
being asked to kill the currently resized one.  If it is, do
	victim->list.prev->next = victim->list.next;
	victim->list.next->prev = victim->list.prev;
	victim->list.prev = NULL;
and bugger off, otherwise act as now.  Turn the loop in
memcg_update_all_list_lrus() into
	mutex_lock(&list_lrus_mutex);
	lru = list_lrus.next;
	while (lru != &list_lrus) {
		currently_resized = list_entry(lru, struct list_lru, list);
		mutex_unlock(&list_lrus_mutex);
		ret = memcg_update_list_lru(lru, old_size, new_size);
		mutex_lock(&list_lrus_mutex);
		if (unlikely(!lru->prev)) {
			lru = lru->next;
			free currently_resized as list_lru_destroy() would have
			continue;
		}
		if (ret)
			goto fail;
		lru = lru->next;
	}
	currently_resized = NULL;
	mutex_unlock(&list_lrus_mutex);
	
	5) replace list_lrus_mutex with a spinlock.

At that point list_lru_destroy() becomes non-blocking.  I think it should work,
but that's from several hours of looking through that code, so I might be
missing something subtle here...