lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191021234858.GA16251@castle>
Date:   Mon, 21 Oct 2019 23:49:04 +0000
From:   Roman Gushchin <guro@...com>
To:     Jan Kara <jack@...e.cz>
CC:     "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>,
        "tj@...nel.org" <tj@...nel.org>, Dennis Zhou <dennis@...nel.org>
Subject: Re: [PATCH v2] cgroup, blkcg: prevent dirty inodes to pin dying
 memory cgroups

On Wed, Oct 16, 2019 at 11:18:40AM +0200, Jan Kara wrote:
> On Tue 15-10-19 21:40:45, Roman Gushchin wrote:
> > On Tue, Oct 15, 2019 at 11:09:33AM +0200, Jan Kara wrote:
> > > On Thu 10-10-19 16:40:36, Roman Gushchin wrote:
> > > 
> > > > @@ -426,7 +431,7 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
> > > >  	if (!list_empty(&inode->i_io_list)) {
> > > >  		struct inode *pos;
> > > >  
> > > > -		inode_io_list_del_locked(inode, old_wb);
> > > > +		inode_io_list_del_locked(inode, old_wb, false);
> > > >  		inode->i_wb = new_wb;
> > > >  		list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
> > > >  			if (time_after_eq(inode->dirtied_when,
> > > 
> > > This bit looks wrong. Not the change you made as such but the fact that you
> > > can now move inode from b_attached list of old wb to the dirty list of new
> > > wb.
> > 
> > Hm, can you, please, elaborate a bit more why it's wrong?
> > The reference to the old_wb will be dropped by the switching code.
> 
> My point is that the code in full looks like:
> 
>         if (!list_empty(&inode->i_io_list)) {
>                 struct inode *pos;
> 
>                 inode_io_list_del_locked(inode, old_wb);
>                 inode->i_wb = new_wb;
>                 list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
>                         if (time_after_eq(inode->dirtied_when,
>                                           pos->dirtied_when))
>                                 break;
>                 inode_io_list_move_locked(inode, new_wb, pos->i_io_list.prev);
>         } else {
> 
> So inode is always moved from some io list in old_wb to b_dirty list of
> new_wb. This is fine when it could be only on b_dirty, b_io, b_more_io lists
> of old_wb. But once you add b_attached list to the game, it is not correct
> anymore. You should not add clean inode to b_dirty list of new_wb.

I see...

Hm, will checking of i_state for not containing I_DIRTY_ALL bits be enough here?
Alternatively, I can introduce a new bit which will explicitly point at the
inode being on the b_attached list, but I'd prefer not to do it.

> 
> > > > +
> > > > +	list_for_each_entry_safe(inode, tmp, &wb->b_attached, i_io_list) {
> > > > +		if (!spin_trylock(&inode->i_lock))
> > > > +			continue;
> > > > +		xa_lock_irq(&inode->i_mapping->i_pages);
> > > > +		if (!(inode->i_state &
> > > > +		      (I_FREEING | I_CLEAR | I_SYNC | I_DIRTY | I_WB_SWITCH))) {
> > > > +			WARN_ON_ONCE(inode->i_wb != wb);
> > > > +			inode->i_wb = NULL;
> > > > +			wb_put(wb);
> > > 
> > > Hum, currently the code assumes that once i_wb is set, it never becomes
> > > NULL again. In particular the inode e.g. in
> > > fs/fs-writeback.c:inode_congested() or generally unlocked_inode_to_wb_begin()
> > > users could get broken by this. The i_wb switching code is so complex
> > > exactly because of these interactions.
> > > 
> > > Maybe you thought through the interactions and things are actually fine but
> > > if nothing else you'd need a big fat comment here explaining why this is
> > > fine and update inode_congested() comments etc.
> > 
> > Yeah, I thought that once inode is clean and not switching it's safe to clear
> > the i_wb pointer, but seems that it's not completely true.
> >
> > One idea I have is to always release wbs using rcu delayed work, so that
> > it will be save to dereference i_wb pointer under rcu, if only it's not NULL
> > (the check has to be added). I'll try to implement this scheme, but if you
> > know in advance that it's not gonna work, please, let me know.
> 
> I think I'd just drop inode_to_wb_is_valid() because once i_wb can change
> to NULL, that function is just pointless in that single callsite. Also we
> have to count with the fact that unlocked_inode_to_wb_begin() can return
> NULL and gracefully do as much as possible in that case for all the
> callers. And I agree that those occurences in mm/page-writeback.c should be
> blocked by inode being clean and you holding all those locks so you can
> warn if that happens I guess.

Yeah, it sounds good to me. I actually have a patch, which I'll post after
some more extensive testing (want to make sure I do not hit these warns).

Thank you!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ