Message-ID: <20161216090157.GA13940@dhcp22.suse.cz>
Date:   Fri, 16 Dec 2016 10:01:57 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Vegard Nossum <vegard.nossum@...cle.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Rik van Riel <riel@...hat.com>,
        Matthew Wilcox <mawilcox@...rosoft.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Ingo Molnar <mingo@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting
 debugging

On Fri 16-12-16 09:22:02, Vegard Nossum wrote:
> Reference counting bugs are hard to debug by their nature since the actual
> manifestation of one can occur very far from where the error is introduced
> (e.g. a missing get() only manifests as a use-after-free when the reference
> count prematurely drops to 0, which could be arbitrarily long after where
> the get() should have happened if there are other users). I wrote this patch
> to try to track down a suspected 'mm_struct' reference counting bug.

I definitely agree that hunting these bugs is a royal PITA, no question
about that. I am just wondering whether this has been motivated by any
particular bug recently. I do not seem to remember any such issue for
quite some time.

> The basic idea is to keep track of all references, not just with a reference
> counter, but with an actual reference _list_. Whenever you get() or put() a
> reference, you also add or remove yourself, respectively, from the reference
> list. This really helps debugging because (for example) you always put a
> specific reference, meaning that if that reference was not yours to put, you
> will notice it immediately (rather than when the reference counter goes to 0
> and you still have an active reference).
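
To make the quoted idea concrete, here is a minimal user-space sketch of a counter paired with a reference list. It is only a model under stated assumptions: struct ref, struct obj, obj_get(), obj_put() and demo() are hypothetical names chosen for illustration, not the interface from the patch, and there is no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; these names are NOT the patch's API. */
struct ref {
	struct ref *next;	/* link in the object's list of live references */
	int active;		/* nonzero while this token holds a reference */
};

struct obj {
	int count;		/* the conventional reference counter */
	struct ref *refs;	/* every reference currently held */
};

/* Take a reference and record the caller's token on the list. */
static void obj_get(struct obj *o, struct ref *r)
{
	assert(!r->active);	/* a token may hold at most one reference */
	o->count++;
	r->active = 1;
	r->next = o->refs;
	o->refs = r;
}

/*
 * Drop one specific reference. Returns 0 on success, or -1 if the token
 * is not on the list, i.e. the reference was not the caller's to put.
 */
static int obj_put(struct obj *o, struct ref *r)
{
	struct ref **pp;

	for (pp = &o->refs; *pp; pp = &(*pp)->next) {
		if (*pp == r) {
			*pp = r->next;
			r->next = NULL;
			r->active = 0;
			o->count--;
			return 0;
		}
	}
	return -1;
}

/* Small self-check; returns 0 when every step behaves as described. */
static int demo(void)
{
	struct obj o = { 0, NULL };
	struct ref a = { NULL, 0 }, b = { NULL, 0 }, stray = { NULL, 0 };

	obj_get(&o, &a);
	obj_get(&o, &b);
	if (obj_put(&o, &stray) != -1)	/* bogus put is caught at once */
		return 1;
	if (o.count != 2)		/* and the counter is untouched */
		return 2;
	if (obj_put(&o, &a) != 0)
		return 3;
	if (obj_put(&o, &a) != -1)	/* a double put is caught too */
		return 4;
	if (obj_put(&o, &b) != 0)
		return 5;
	return (o.count == 0 && o.refs == NULL) ? 0 : 6;
}
```

With a bare counter, the stray put above would silently succeed and the error would only surface later as a premature free; with the list, it fails at the call site.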

But who is the owner of the reference? A function/task? It is not all
that uncommon to take an mm reference from one context and release it
from a different one. But I might be missing your point here.

> The main interface is in <linux/mm_ref_types.h> and <linux/mm_ref.h>, while
> the implementation lives in mm/mm_ref.c. Since 'struct mm_struct' has both
> ->mm_users and ->mm_count, we introduce helpers for both of them, but use
> the same data structure for each (struct mm_ref). The low-level rules (i.e.
> the ones we have to follow, but which nobody else should really have to
> care about since they use the higher-level interface) are:
> 
>  - after incrementing ->mm_count you also have to call get_mm_ref()
> 
>  - before decrementing ->mm_count you also have to call put_mm_ref()
> 
>  - after incrementing ->mm_users you also have to call get_mm_users_ref()
> 
>  - before decrementing ->mm_users you also have to call put_mm_users_ref()
> 
> The rules that most of the rest of the kernel will care about are:
> 
>  - functions that acquire and return an mm_struct should take a
>    'struct mm_ref *' which they can pass on to mmget()/mmgrab()/etc.
> 
>  - functions that release an mm_struct passed as a parameter should also
>    take a 'struct mm_ref *' which they can pass on to mmput()/mmdrop()/etc.
> 
>  - any function that temporarily acquires a mm_struct reference should
>    use MM_REF() to define an on-stack reference and pass it on to
>    mmget()/mmput()/mmgrab()/mmdrop()/etc.
> 
>  - any structure that holds an mm_struct pointer must also include a
>    'struct mm_ref' member; when the mm_struct pointer is modified you
>    would typically also call mmget()/mmgrab()/mmput()/mmdrop() and they
>    should be called with this mm_ref
> 
>  - you can convert (for example) an on-stack reference to an in-struct
>    reference using move_mm_ref(). This is semantically equivalent to
>    (atomically) taking the new reference and dropping the old one, but
>    doesn't actually need to modify the reference count

This all sounds way too intrusive to me, so I am not really sure this is
something we really want. A nice thing for debugging for sure, but I am
somewhat skeptical whether it is really worth it considering how few of
those ref. count bugs we've had.

[...]
-- 
Michal Hocko
SUSE Labs
