Date:   Wed, 1 Dec 2021 09:19:17 +0200
From:   Amir Goldstein <amir73il@...il.com>
To:     Chengguang Xu <cgxu519@...ernel.net>
Cc:     Jan Kara <jack@...e.cz>, Miklos Szeredi <miklos@...redi.hu>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        overlayfs <linux-unionfs@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        ronyjin <ronyjin@...cent.com>,
        charliecgxu <charliecgxu@...cent.com>,
        Vivek Goyal <vgoyal@...hat.com>
Subject: Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation

On Wed, Dec 1, 2021 at 8:31 AM Chengguang Xu <cgxu519@...ernel.net> wrote:
>
>
>  ---- On Wednesday, 2021-12-01 10:37:15 Chengguang Xu <cgxu519@...ernel.net> wrote ----
>  >
>  >  ---- On Wednesday, 2021-12-01 03:04:59 Amir Goldstein <amir73il@...il.com> wrote ----
>  >  > >  > I was thinking about this a bit more and I don't think I buy this
>  >  > >  > explanation. What I rather think is happening is that real work for syncfs
>  >  > >  > (writeback_inodes_sb() and sync_inodes_sb() calls) gets offloaded to a flush
>  >  > >  > worker. E.g. writeback_inodes_sb() ends up calling
>  >  > >  > __writeback_inodes_sb_nr() which does:
>  >  > >  >
>  >  > >  > bdi_split_work_to_wbs()
>  >  > >  > wb_wait_for_completion()
>  >  > >  >
>  >  > >  > So you don't see the work done in the times accounted to your test
>  >  > >  > program. But in practice the flush worker is indeed burning 1.3s worth of
>  >  > >  > CPU to scan the 1 million inode list and do nothing.
>  >  > >  >
>  >  > >
>  >  > > That makes sense. However, in the real container use case the upper dir is always
>  >  > > empty, so I don't think there is a meaningful difference compared to accurately
>  >  > > marking the overlay inode dirty.
>  >  > >
>  >  >
>  >  > It's true that that is a very common case, but...
>  >  >
>  >  > > I'm not very familiar with use cases of overlayfs other than containers; should we
>  >  > > consider other use cases? Maybe we can also ignore the CPU burden, because those
>  >  > > use cases don't have dense deployment like containers do.
>  >  > >
>  >  >
>  >  > metacopy feature was developed for the use case of a container
>  >  > that chowns all the files in the lower image.
>  >  >
>  >  > In that case, which is now also quite common, all the overlay inodes are
>  >  > upper inodes.
>  >  >
>  >
>  > Regardless of metacopy or datacopy, the copy-up has already modified the overlay inode,
>  > so initially marking all overlay inodes which have an upper inode as dirty will not be a
>  > serious problem in this case either, right?
>  >
>  > I guess maybe you are more concerned about re-marking dirtiness in the above use case.
>  >
>  >
>  >
>  >  > What about only re-marking the overlay inode dirty if the upper inode is dirty or is
>  >  > writably mmapped?
>  >  > For the other cases, isn't it easy to know when the overlay inode becomes dirty?
>  >  > Didn't you already try this?
>  >  >
>  >
>  > Yes, I tried that approach in a previous version, but as Miklos pointed out in the
>  > feedback there are a few racy conditions.
>  >

Right..

>
> So the final solution to handle all the concerns looks like this: accurately mark the overlay
> inode dirty on modification, and re-mark dirty only for mmapped files in ->write_inode().
>
> Hi Miklos, Jan
>
> Do you agree with the new proposal above?
>

Maybe you can still pull off a simpler version by re-marking dirty only
if the upper inode is writably mmapped AND inode_is_open_for_write(upper)?
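
Roughly something like this in ->write_inode() (a completely untested
sketch; ovl_inode_upper(), mapping_writably_mapped(),
inode_is_open_for_write() and mark_inode_dirty() are the real helpers I
mean, the rest of the body is only an illustration, not the code from
your patch):

static int ovl_write_inode(struct inode *inode,
			   struct writeback_control *wbc)
{
	struct inode *upper = ovl_inode_upper(inode);

	if (!upper)
		return 0;

	/* ... sync the upper inode here as the patch already does ... */

	/*
	 * Re-mark the overlay inode dirty only while the upper inode can
	 * still be dirtied behind our back: writably mmapped AND still
	 * open for write, per the condition above.
	 */
	if (mapping_writably_mapped(upper->i_mapping) &&
	    inode_is_open_for_write(upper))
		mark_inode_dirty(inode);

	return 0;
}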

If I am not mistaken, if you always mark the overlay inode dirty on
ovl_flush() of an FMODE_WRITE file, there is nothing that can make the
upper inode dirty after the last close (if upper is not mmapped), so one
more inode sync should be enough. No?
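
I.e. something like this in ovl_flush() (again only an untested
illustration of the idea, not the actual implementation):

static int ovl_flush(struct file *file, fl_owner_t id)
{
	/*
	 * The file was writable, so the upper inode may have been
	 * dirtied; make sure the overlay inode gets one more pass of
	 * writeback.
	 */
	if (file->f_mode & FMODE_WRITE)
		mark_inode_dirty(file_inode(file));

	/* ... flush the real underlying file as ovl_flush() does today ... */
	return 0;
}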

Thanks,
Amir.
