lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20180413065619.GD17484@dhcp22.suse.cz>
Date:   Fri, 13 Apr 2018 08:56:19 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Yang Shi <yang.shi@...ux.alibaba.com>
Cc:     Cyrill Gorcunov <gorcunov@...il.com>, adobriyan@...il.com,
        willy@...radead.org, mguzik@...hat.com, akpm@...ux-foundation.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [v3 PATCH] mm: introduce arg_lock to protect arg_start|end and
 env_start|end in mm_struct

On Thu 12-04-18 09:20:24, Yang Shi wrote:
> 
> 
> On 4/12/18 5:18 AM, Michal Hocko wrote:
> > On Tue 10-04-18 11:28:13, Yang Shi wrote:
> > > 
> > > On 4/10/18 9:21 AM, Yang Shi wrote:
> > > > 
> > > > On 4/10/18 5:28 AM, Cyrill Gorcunov wrote:
> > > > > On Tue, Apr 10, 2018 at 01:10:01PM +0200, Michal Hocko wrote:
> > > > > > > Because do_brk does vma manipulations, for this reason it's
> > > > > > > running under down_write_killable(&mm->mmap_sem). Or you
> > > > > > > mean something else?
> > > > > > Yes, all we need the new lock for is to get a consistent view on brk
> > > > > > values. I am simply asking whether there is something fundamentally
> > > > > > wrong by doing the update inside the new lock while keeping the
> > > > > > original
> > > > > > mmap_sem locking in the brk path. That would allow us to drop the
> > > > > > mmap_sem lock in the proc path when looking at brk values.
> > > > > Michal gimme some time. I guess  we might do so, but I need some
> > > > > spare time to take more precise look into the code, hopefully today
> > > > > evening. Also I've a suspicion that we've wracked check_data_rlimit
> > > > > with this new lock in prctl. Need to verify it again.
> > > > I see you guys points. We might be able to move the drop of mmap_sem
> > > > before setting mm->brk in sys_brk since mmap_sem should be used to
> > > > protect vma manipulation only, then protect the value modify with the
> > > > new arg_lock. Then we can eliminate mmap_sem stuff in prctl path, and it
> > > > also prevents from wrecking check_data_rlimit.
> > > > 
> > > > At the first glance, it looks feasible to me. Will look into deeper
> > > > later.
> > > A further look told me this might be *not* feasible.
> > > 
> > > It looks the new lock will not break check_data_rlimit since in my patch
> > > both start_brk and brk is protected by mmap_sem. The code flow might look
> > > like below:
> > > 
> > > CPU A                             CPU B
> > > --------                       --------
> > > prctl                               sys_brk
> > >                                        down_write
> > > check_data_rlimit           check_data_rlimit (need mm->start_brk)
> > >                                        set brk
> > > down_write                    up_write
> > > set start_brk
> > > set brk
> > > up_write
> > > 
> > > 
> > > If CPU A gets the mmap_sem first, it will set start_brk and brk, then CPU B
> > > will check with the new start_brk. And, prctl doesn't care if sys_brk is run
> > > before it since it gets the new start_brk and brk from parameter.
> > > 
> > > If we protect start_brk and brk with the new lock, sys_brk might get old
> > > start_brk, then sys_brk might break rlimit check silently, is that right?
> > > 
> > > So, it looks using new lock in prctl and keeping mmap_sem in brk path has
> > > race condition.
> > OK, I've admittedly didn't give it too much time to think about. Maybe
> > we do something clever to remove the race but can we start at least by
> > reducing the write lock to read on prctl side and use the dedicated
> > spinlock for updating values? That should close the above race AFAICS
> > and the read lock would be much more friendly to other VM operations.
> 
> Yes, is sounds feasible. We just need care about prctl is run before
> sys_brk. 

There will never be any before/after ordering here. It has never been.
We just need the two to be mutually exlusive. We do not really need that
for races with the page fault because the prctl doesn't modify the
layout AFAIU.

> So, you mean:
> 
> down_read
> spin_lock
> update all the values
> spin_unlock
> up_read

Yes.

-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ