linux-kernel - Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

Message-ID: <20200109083641.GH4951@dhcp22.suse.cz>
Date:   Thu, 9 Jan 2020 09:36:41 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Wei Yang <richardw.yang@...ux.intel.com>
Cc:     hannes@...xchg.org, vdavydov.dev@...il.com,
        akpm@...ux-foundation.org, kirill.shutemov@...ux.intel.com,
        cgroups@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, yang.shi@...ux.alibaba.com
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Thu 09-01-20 11:18:21, Wei Yang wrote:
> On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote:
> >On Wed 08-01-20 08:35:43, Wei Yang wrote:
> >> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
> >> >On Tue 07-01-20 09:22:41, Wei Yang wrote:
> >> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
> >> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
> >> >> >> As in all the other places, grab the lock before manipulating the deferred list.
> >> >> >> The current implementation may face a race condition.
> >> >> >
> >> >> >Please always make sure to describe the effect of the change. Why does a
> >> >> >racy list_empty check matter?
> >> >> >
> >> >> 
> >> >> Hmm... accessing the list without the proper lock leads to many kinds of bad behavior.
> >> >
> >> >My point is that the changelog should describe that bad behavior.
> >> >
> >> >> For example, if we grab the lock after checking list_empty, the page may
> >> >> already have been removed from the list in split_huge_page_to_list, and
> >> >> then list_del_init would trigger a bug.
> >> >
> >> >And how does a list_empty check under the lock guarantee that the page is
> >> >on the deferred list?
> >> 
> >> Just one point of confusion: isn't this kind of description a basic concept
> >> of concurrent programming? How much detail do we need when describing the effect?
> >
> >When I write changelogs for patches like this, I usually describe what
> >the potential race is - e.g.
> >	CPU1			CPU2
> >	path1			path2
> >	  check			  lock
> >	  			    operation2
> >				  unlock
> >	    lock
> >	    # check might not hold anymore
> >	    operation1
> >	    unlock
> >
> >and what the effect of the race is - e.g. a crash, data corruption,
> >a pointless attempt at operation1 that fails with a user-visible effect,
> >etc.
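The check/lock/operation template can be made concrete with a small, deterministic userspace model. This is a sketch under stated assumptions, not kernel code: toy_lock() is a hypothetical spinlock built on C11 atomics standing in for spin_lock(), struct shared and its queued flag are hypothetical, and the other_cpu callback replays CPU2's path2 in the window between CPU1's check and its lock, so the interleaving from the diagram happens every time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy spinlock on C11 atomics, standing in for spin_lock(). */
typedef struct {
	atomic_bool held;
} toy_lock_t;

static void toy_lock(toy_lock_t *l)
{
	while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
		;	/* spin until the previous holder releases it */
}

static void toy_unlock(toy_lock_t *l)
{
	atomic_store_explicit(&l->held, false, memory_order_release);
}

/* Hypothetical shared state: "is the item queued?", guarded by lock. */
struct shared {
	toy_lock_t lock;
	bool queued;
	int removals;	/* number of times the item was removed */
};

static void shared_init(struct shared *s)
{
	atomic_init(&s->lock.held, false);
	s->queued = true;
	s->removals = 0;
}

/* Remove the item; callers must hold s->lock. */
static void remove_item(struct shared *s)
{
	s->queued = false;
	s->removals++;
}

/* CPU2, path2: lock, check, operation2, unlock. */
static void path2(struct shared *s)
{
	toy_lock(&s->lock);
	if (s->queued)
		remove_item(s);
	toy_unlock(&s->lock);
}

/* CPU1, path1 as in the diagram: check first, lock later. If other_cpu is
 * non-NULL it runs in the window between the check and the lock. */
static void path1_check_then_lock(struct shared *s,
				  void (*other_cpu)(struct shared *))
{
	if (!s->queued)			/* check, without the lock */
		return;
	if (other_cpu)
		other_cpu(s);
	toy_lock(&s->lock);
	remove_item(s);			/* the check might not hold anymore */
	toy_unlock(&s->lock);
}

/* CPU1, path1 reordered: the check and operation1 share one critical section. */
static void path1_lock_then_check(struct shared *s,
				  void (*other_cpu)(struct shared *))
{
	if (other_cpu)
		other_cpu(s);		/* CPU2 may still take the lock first */
	toy_lock(&s->lock);
	if (s->queued)
		remove_item(s);
	toy_unlock(&s->lock);
}
```

Replaying the schedule with path2 as other_cpu, the check-then-lock version removes the item twice (operation1 acts on state that path2 already changed), while the lock-then-check version re-evaluates the condition under the lock and removes it once.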
> 
> Hi Michal, here is my attempt at an example. I hope this one looks good to
> you.
> 
> 
>     For example, the potential race would be:
>     
>         CPU1                      CPU2
>         mem_cgroup_move_account   split_huge_page_to_list
>           !list_empty
>                                     lock
>                                     !list_empty
>                                     list_del
>                                     unlock
>           lock
>           # !list_empty might not hold anymore
>           list_del_init
>           unlock
>     
>     When this sequence happens, the list_del_init() in
>     mem_cgroup_move_account() would crash since the page has already been
>     removed by list_del() in split_huge_page_to_list().
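The crash in this sequence follows from what list_del() leaves behind. A minimal sketch, assuming simplified userspace copies of the kernel's list helpers (the poison values are illustrative, since the kernel's LIST_POISON1/2 also add POISON_POINTER_DELTA, and empty_after() is a hypothetical helper): list_del() poisons the entry's own pointers rather than resetting them, so a later list_del_init() on that entry would follow poisoned pointers.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

/* Illustrative poison values, not the kernel's exact ones. */
#define LIST_POISON1 ((struct list_head *)0x100)
#define LIST_POISON2 ((struct list_head *)0x122)

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* "Empty" means the entry points back at itself. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Make the neighbours skip over the entry. */
static void unlink_entry(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* list_del(): unlink, then poison the entry's own pointers. */
static void list_del(struct list_head *entry)
{
	unlink_entry(entry);
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/* list_del_init(): unlink, then reinitialise the entry to point at itself. */
static void list_del_init(struct list_head *entry)
{
	unlink_entry(entry);
	INIT_LIST_HEAD(entry);
}

/* Hypothetical helper: queue one entry, remove it with del, and report
 * what list_empty() says about the removed entry afterwards. */
static bool empty_after(void (*del)(struct list_head *))
{
	struct list_head queue, entry;

	INIT_LIST_HEAD(&queue);
	INIT_LIST_HEAD(&entry);
	list_add(&entry, &queue);
	assert(!list_empty(&entry));	/* queued before the delete */
	del(&entry);
	return list_empty(&entry);
}
```

In this model only list_del_init() leaves the entry in a state that list_empty() reports as empty; after list_del(), list_empty() returns false even though the entry is on no list, because its next pointer holds LIST_POISON1 rather than pointing back at itself.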

Yes, this looks much more informative. I would just add that this will
crash if CONFIG_DEBUG_LIST is enabled.
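To see where a CONFIG_DEBUG_LIST-style check fires, the changelog's schedule can be replayed deterministically in userspace. Assumptions, all loud: the list helpers are simplified copies of the kernel's, list_del_entry_valid() is a hypothetical check loosely modeled on what CONFIG_DEBUG_LIST verifies (poisoned pointers, neighbours that do not point back), it counts a report instead of crashing, and locking is omitted because the interleaving is fixed by hand. The name deferred is a hypothetical stand-in for the page's deferred-list entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace list_head; poison values are illustrative. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_POISON1 ((struct list_head *)0x100)
#define LIST_POISON2 ((struct list_head *)0x122)

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Hypothetical debug check: test for poison before dereferencing, then
 * check that both neighbours still point back at the entry. */
static bool list_del_entry_valid(const struct list_head *entry)
{
	if (entry->next == LIST_POISON1 || entry->prev == LIST_POISON2)
		return false;
	return entry->prev->next == entry && entry->next->prev == entry;
}

static int corruption_reports;	/* how often the check fired */

/* Unlink only if the entry is valid; otherwise record a report. */
static bool checked_unlink(struct list_head *entry)
{
	if (!list_del_entry_valid(entry)) {
		corruption_reports++;
		return false;
	}
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	return true;
}

static void list_del(struct list_head *entry)
{
	if (checked_unlink(entry)) {
		entry->next = LIST_POISON1;
		entry->prev = LIST_POISON2;
	}
}

static void list_del_init(struct list_head *entry)
{
	if (checked_unlink(entry))
		INIT_LIST_HEAD(entry);
}

/* Replay of the changelog's schedule, one step per line. */
static void replay_race(struct list_head *deferred)
{
	bool cpu1_saw_queued = !list_empty(deferred);	/* CPU1: check, no lock */

	if (!list_empty(deferred))			/* CPU2: lock, check */
		list_del(deferred);			/* CPU2: list_del, unlock */

	if (cpu1_saw_queued)				/* CPU1: lock */
		list_del_init(deferred);		/* stale check: entry is poisoned */
}
```

Starting from an entry queued on an otherwise empty list, CPU2's list_del() succeeds and poisons the entry, and CPU1's list_del_init() then trips the check exactly once; in this model that report marks the point where the kernel would detect the corruption.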

Thanks!
-- 
Michal Hocko
SUSE Labs