Message-ID: <20210408194849.wmueo74qcxghhf2d@dlxu-fedora-R90QNFJV>
Date:   Thu, 8 Apr 2021 12:48:49 -0700
From:   Daniel Xu <dxu@...uu.xyz>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     bpf@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        kernel-team@...com, jolsa@...nel.org, hannes@...xchg.org,
        yhs@...com
Subject: Re: [RFC bpf-next 1/1] bpf: Introduce iter_pagecache

On Thu, Apr 08, 2021 at 07:14:01AM +0100, Matthew Wilcox wrote:
> On Wed, Apr 07, 2021 at 02:46:11PM -0700, Daniel Xu wrote:
> > +struct bpf_iter_seq_pagecache_info {
> > +	struct mnt_namespace *ns;
> > +	struct radix_tree_root superblocks;
> 
> Why are you adding a new radix tree?  Use an XArray instead.

Ah right, sorry. Will do.

> > +static struct page *goto_next_page(struct bpf_iter_seq_pagecache_info *info)
> > +{
> > +	struct page *page, *ret = NULL;
> > +	unsigned long idx;
> > +
> > +	rcu_read_lock();
> > +retry:
> > +	BUG_ON(!info->cur_inode);
> > +	ret = NULL;
> > +	xa_for_each_start(&info->cur_inode->i_data.i_pages, idx, page,
> > +			  info->cur_page_idx) {
> > +		if (!page_cache_get_speculative(page))
> > +			continue;
> 
> Why do you feel the need to poke around in i_pages directly?  Is there
> something wrong with find_get_entries()?

No reason other than I didn't know about the latter. Thanks for the
hint. find_get_entries() seems to return a pagevec of entries which
would complicate the iteration (a 4th layer of things to iterate over).

But I did find find_get_pages_range() which I think can be used to find
1 page at a time. I'll look into it further.

> > +static int __pagecache_seq_show(struct seq_file *seq, struct page *page,
> > +				bool in_stop)
> > +{
> > +	struct bpf_iter_meta meta;
> > +	struct bpf_iter__pagecache ctx;
> > +	struct bpf_prog *prog;
> > +
> > +	meta.seq = seq;
> > +	prog = bpf_iter_get_info(&meta, in_stop);
> > +	if (!prog)
> > +		return 0;
> > +
> > +	meta.seq = seq;
> > +	ctx.meta = &meta;
> > +	ctx.page = page;
> > +	return bpf_iter_run_prog(prog, &ctx);
> 
> I'm not really keen on the idea of random BPF programs being able to poke
> at pages in the page cache like this.  From your initial description,
> it sounded like all you needed was a list of which pages are present.

Could you elaborate on what "list of which pages are present" implies?
The overall goal with this patch is to detect duplicate content in the
page cache. So anything that helps achieve that goal I would (in theory)
be OK with.

My understanding is the user would need to hash the contents
of each page in the page cache. And BPF provides the flexibility such
that this work could be reused for currently unanticipated use cases.

Furthermore, bpf programs could already look at all the pages in the
page cache by hooking into tracepoint:filemap:mm_filemap_add_to_page_cache,
albeit at a much slower rate. I figure the downside of adding this
page cache iterator is we're explicitly condoning the behavior.

> > +	INIT_RADIX_TREE(&info->superblocks, GFP_KERNEL);
> > +
> > +	spin_lock(&info->ns->ns_lock);
> > +	list_for_each_entry(mnt, &info->ns->list, mnt_list) {
> > +		sb = mnt->mnt.mnt_sb;
> > +
> > +		/* The same mount may be mounted in multiple places */
> > +		if (radix_tree_lookup(&info->superblocks, (unsigned long)sb))
> > +			continue;
> > +
> > +		err = radix_tree_insert(&info->superblocks,
> > +				        (unsigned long)sb, (void *)1);
> > +		if (err)
> > +			goto out;
> > +	}
> > +
> > +	radix_tree_for_each_slot(slot, &info->superblocks, &iter, 0) {
> > +		sb = (struct super_block *)iter.index;
> > +		atomic_inc(&sb->s_active);
> > +	}
> 
> Uh.  What on earth made you think this was a good way to use the radix
> tree?  And, no, the XArray doesn't change that.

The idea behind the radix tree was to deduplicate the mounts by
superblock, because a single filesystem may be mounted in multiple
locations. I didn't find a set data structure I could reuse, so I
figured a radix tree / xarray would work too.

Happy to take any better ideas too.

> If you don't understand why this is so bad, call xa_dump() on it after
> constructing it.  I'll wait.

I did a dump and got the following results: http://ix.io/2VpY.

I received a hint that you may be referring to how the xarray/radix
tree would be as large as the largest pointer. To my uneducated eye it
doesn't look like that's the case in this dump. Could you please
clarify?

<...>

Thanks,
Daniel
