linux-kernel - Re: [PATCH v2] writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs

Message-ID: <20231013094817.bm62tq3cjjtgobto@quack3>
Date:   Fri, 13 Oct 2023 11:48:17 +0200
From:   Jan Kara <jack@...e.cz>
To:     Jingbo Xu <jefflexu@...ux.alibaba.com>
Cc:     tj@...nel.org, guro@...com, lizefan.x@...edance.com,
        hannes@...xchg.org, cgroups@...r.kernel.org, jack@...e.cz,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        viro@...iv.linux.org.uk, brauner@...nel.org, willy@...radead.org,
        joseph.qi@...ux.alibaba.com
Subject: Re: [PATCH v2] writeback, cgroup: switch inodes with dirty
 timestamps to release dying cgwbs

On Fri 13-10-23 13:52:08, Jingbo Xu wrote:
> The cgwb cleanup routine tries to release a dying cgwb by switching the
> attached inodes.  It fetches those inodes from the wb->b_attached list,
> overlooking the fact that inodes with only dirty timestamps reside on the
> wb->b_dirty_time list, which is the case when lazytime is enabled.  This
> leaves a large number of zombie memory cgroups when lazytime is enabled,
> as inodes with dirty timestamps cannot be switched to a live cgwb for a
> long time.
> 
> It is reasonable not to switch cgwb for inodes with dirty data, as
> otherwise it may break the bandwidth restrictions.  However, since the
> writeback of inode metadata is not accounted for, let's also switch
> inodes with dirty timestamps to avoid zombie memory and block cgroups
> when lazytime is enabled.
> 
> Fixs: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
  ^^^ Fixes

> Signed-off-by: Jingbo Xu <jefflexu@...ux.alibaba.com>

Otherwise looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@...e.cz>

								Honza

> ---
> v2: add comment explaining why switching for inodes with dirty
> timestamps is needed
> 
> v1: https://lore.kernel.org/all/20231011084228.77615-1-jefflexu@linux.alibaba.com/
> ---
>  fs/fs-writeback.c | 41 +++++++++++++++++++++++++++++------------
>  1 file changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index c1af01b2c42d..1767493dffda 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -613,6 +613,24 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
>  	kfree(isw);
>  }
>  
> +static bool isw_prepare_wbs_switch(struct inode_switch_wbs_context *isw,
> +				   struct list_head *list, int *nr)
> +{
> +	struct inode *inode;
> +
> +	list_for_each_entry(inode, list, i_io_list) {
> +		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
> +			continue;
> +
> +		isw->inodes[*nr] = inode;
> +		(*nr)++;
> +
> +		if (*nr >= WB_MAX_INODES_PER_ISW - 1)
> +			return true;
> +	}
> +	return false;
> +}
> +
>  /**
>   * cleanup_offline_cgwb - detach associated inodes
>   * @wb: target wb
> @@ -625,7 +643,6 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
>  {
>  	struct cgroup_subsys_state *memcg_css;
>  	struct inode_switch_wbs_context *isw;
> -	struct inode *inode;
>  	int nr;
>  	bool restart = false;
>  
> @@ -647,17 +664,17 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
>  
>  	nr = 0;
>  	spin_lock(&wb->list_lock);
> -	list_for_each_entry(inode, &wb->b_attached, i_io_list) {
> -		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
> -			continue;
> -
> -		isw->inodes[nr++] = inode;
> -
> -		if (nr >= WB_MAX_INODES_PER_ISW - 1) {
> -			restart = true;
> -			break;
> -		}
> -	}
> +	/*
> +	 * In addition to the inodes that have completed writeback, also switch
> +	 * cgwbs for those inodes only with dirty timestamps. Otherwise, those
> +	 * inodes won't be written back for a long time when lazytime is
> +	 * enabled, and thus pinning the dying cgwbs. It won't break the
> +	 * bandwidth restrictions, as writeback of inode metadata is not
> +	 * accounted for.
> +	 */
> +	restart = isw_prepare_wbs_switch(isw, &wb->b_attached, &nr);
> +	if (!restart)
> +		restart = isw_prepare_wbs_switch(isw, &wb->b_dirty_time, &nr);
>  	spin_unlock(&wb->list_lock);
>  
>  	/* no attached inodes? bail out */
> -- 
> 2.19.1.6.gb485710b
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
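
For readers following the control flow of the patch, here is a minimal
user-space sketch of the two-list scan it introduces. It is not kernel
code: toy_inode, toy_batch, toy_prepare_switch(), toy_cleanup() and
MAX_PER_BATCH are hypothetical stand-ins (for struct inode, the isw
context, isw_prepare_wbs_switch(), the scanning part of
cleanup_offline_cgwb() and WB_MAX_INODES_PER_ISW respectively), plain
arrays replace the i_io_list linkage, and the "switchable" flag merely
models inode_prepare_wbs_switch() succeeding. Locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for WB_MAX_INODES_PER_ISW (value chosen for the demo). */
#define MAX_PER_BATCH 4

struct toy_inode {
	int id;
	bool switchable;	/* models inode_prepare_wbs_switch() returning true */
};

struct toy_batch {
	const struct toy_inode *inodes[MAX_PER_BATCH];
};

/*
 * Append every switchable inode from list[] to the batch.
 * Returns true once the batch holds MAX_PER_BATCH - 1 entries, meaning
 * the caller should come back for another pass (the "restart" case).
 */
static bool toy_prepare_switch(struct toy_batch *batch,
			       const struct toy_inode *list, size_t len,
			       int *nr)
{
	for (size_t i = 0; i < len; i++) {
		if (!list[i].switchable)
			continue;

		/* One slot always stays spare, mirroring the "- 1" in the patch. */
		assert(*nr < MAX_PER_BATCH - 1);
		batch->inodes[(*nr)++] = &list[i];

		if (*nr >= MAX_PER_BATCH - 1)
			return true;
	}
	return false;
}

/*
 * Scan the "attached" list first, then the "dirty time" list, sharing one
 * counter so both lists fill the same batch. The second list is skipped
 * when the first one already filled the batch.
 */
static bool toy_cleanup(struct toy_batch *batch,
			const struct toy_inode *attached, size_t n_attached,
			const struct toy_inode *dirty_time, size_t n_dirty_time,
			int *nr)
{
	bool restart;

	*nr = 0;
	restart = toy_prepare_switch(batch, attached, n_attached, nr);
	if (!restart)
		restart = toy_prepare_switch(batch, dirty_time, n_dirty_time, nr);
	return restart;
}
```

Passing the counter by pointer is what lets the second call continue
filling the batch where the first one stopped, which is the reason the
helper in the patch takes "int *nr" rather than returning a count.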