linux-kernel - Re: [BUG] fs/super: a possible sleep-in-atomic bug in put

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20171008005602.GT21978@ZenIV.linux.org.uk>
Date:   Sun, 8 Oct 2017 01:56:08 +0100
From:   Al Viro <viro@...IV.linux.org.uk>
To:     Vladimir Davydov <vdavydov.dev@...il.com>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Jia-Ju Bai <baijiaju1990@....com>, torbjorn.lindh@...ta.se,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [BUG] fs/super: a possible sleep-in-atomic bug in put_super

On Sat, Oct 07, 2017 at 10:14:44PM +0100, Al Viro wrote:

> 	1) coallocate struct list_lru and array of struct list_lru_node
> hanging off it.  Turn all existing variables and struct members of that
> type into pointers.  init would allocate and return a pointer, destroy
> would free (and leave it for callers to clear their pointers, of course).

Better yet, keep list_lru containing just the pointer to list_lru_node
array.  And put that array into the tail of struct list_lru_nodes.  That
way normal accesses are kept exactly as-is and we don't need to update
the users of that thing at all.

> 	4) have lru_list_destroy() check (under list_lru_mutex) whether it's
> being asked to kill the currently resized one.  If it is, do
> 	victim->list.prev->next = victim->list.next;
> 	victim->list.next->prev = victim->list.prev;
> 	victim->list.prev = NULL;

Doesn't work, unfortunately - it needs to stay on the list and be marked
in some other way.

> and bugger off, otherwise act as now.  Turn the loop in
> memcg_update_all_list_lrus() into
> 	mutex_lock(&list_lrus_mutex);
> 	lru = list_lrus.next;
> 	while (lru != &list_lrus) {
> 		currently_resized = list_entry(lru, struct list_lru, list);
> 		mutex_unlock(&list_lrus_mutex);
> 		ret = memcg_update_list_lru(lru, old_size, new_size);
> 		mutex_lock(&list_lrus_mutex);
> 		if (unlikely(!lru->prev)) {
> 			lru = lru->next;

... because this might very well be pointing to already freed object.

> 			free currently_resized as list_lru_destroy() would have
> 			continue;

What's more, we need to be careful about resize vs. drain.  Right now it's
on list_lrus_mutex, but if we drop that around actual resize of an individual
list_lru, we'll need something else.  Would there be any problem if we
took memcg_cache_ids_sem shared in memcg_offline_kmem()?

The first problem is not fatal - we can e.g. use the sign of the field used
to store the number of ->memcg_lrus elements (i.e. stashed value of
memcg_nr_cache_ids at allocation or last resize) to indicate that actual
freeing is left for resizer...