linux-kernel - Re: [PATCH v2] lib/scatterlist: add simple page iterator

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1360688840.5578.75.camel@localhost>
Date:	Tue, 12 Feb 2013 19:07:20 +0200
From:	Imre Deak <imre.deak@...el.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org,
	Maxim Levitsky <maximlevitsky@...il.com>,
	Tejun Heo <tj@...nel.org>,
	Daniel Vetter <daniel.vetter@...ll.ch>,
	linaro-mm-sig@...ts.linaro.org
Subject: Re: [PATCH v2] lib/scatterlist: add simple page iterator

On Mon, 2013-02-11 at 12:54 -0800, Andrew Morton wrote:
> On Mon, 11 Feb 2013 20:50:04 +0200
> Imre Deak <imre.deak@...el.com> wrote:
> 
> > Add an iterator to walk through a scatter list a page at a time starting
> > at a specific page offset. As opposed to the mapping iterator this is
> 
> What is "the mapping iterator"?

It's the one implemented by sg_miter_{start,stop} in scatterlist.c. It
also iterates through a scatterlist a page at a time, but it also kmaps
these pages. Since in our use case we don't need to map the pages we
needed a solution without this overhead.

> > meant to be small, performing well even in simple loops like collecting
> > all pages on the scatterlist into an array or setting up an iommu table
> > based on the pages' DMA address.
> 
> Where will this new macro be used?  What is driving this effort?

At the moment the only user of the macro would be the i915 driver, see
[1] for the patches that takes it into use. In the patchset the macro
was added as a DRM specific macro, but since it might be useful in the
future for other drivers too (anything using dma-buf) I'd like to add it
to a more generic place.

> > v2:
> > - In each iteration sg_pgoffset pointed incorrectly at the next page not
> >   the current one.
> > 
> > Signed-off-by: Imre Deak <imre.deak@...el.com>
> > ---
> >  include/linux/scatterlist.h |   50 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 50 insertions(+)
> > 
> > diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> > index 4bd6c06..72578b5 100644
> > --- a/include/linux/scatterlist.h
> > +++ b/include/linux/scatterlist.h
> > @@ -231,6 +231,56 @@ size_t sg_copy_to_buffer(struct scatterlist *sgl, unsigned int nents,
> >   */
> >  #define SG_MAX_SINGLE_ALLOC		(PAGE_SIZE / sizeof(struct scatterlist))
> >  
> > +struct sg_page_iter {
> > +	struct scatterlist *sg;
> > +	int sg_pgoffset;
> > +	struct page *page;
> > +};
> 
> Some documentation wouldn't hurt.   What it's used for, why it exists.

Ok, will add it.

> 
> > +static inline int
> > +sg_page_cnt(struct scatterlist *sg)
> 
> unneeded newline here.
> 
> A more typical name would be "sg_page_count".  Stripping words of their
> vowels makes the symbols harder to remember.

Ok, will fix this.

> > +{
> > +	BUG_ON(sg->offset || sg->length & ~PAGE_MASK);
> > +
> > +	return sg->length >> PAGE_SHIFT;
> > +}
> > +
> > +static inline struct page *
> > +sg_page_iter_get_page(struct sg_page_iter *iter)
> > +{
> > +	while (iter->sg && iter->sg_pgoffset >= sg_page_cnt(iter->sg)) {
> > +		iter->sg_pgoffset -= sg_page_cnt(iter->sg);
> > +		iter->sg = sg_next(iter->sg);
> > +	}
> > +
> > +	return iter->sg ? nth_page(sg_page(iter->sg), iter->sg_pgoffset) : NULL;
> > +}
> > +
> > +static inline void
> > +sg_page_iter_next(struct sg_page_iter *iter)
> > +{
> > +	iter->sg_pgoffset++;
> > +	iter->page = sg_page_iter_get_page(iter);
> > +}
> > +
> > +static inline void
> > +sg_page_iter_start(struct sg_page_iter *iter, struct scatterlist *sglist,
> > +		   unsigned long pgoffset)
> > +{
> > +	iter->sg = sglist;
> > +	iter->sg_pgoffset = pgoffset;
> > +	iter->page = sg_page_iter_get_page(iter);
> > +}
> 
> All the above are undocumented also.  I guess that's acceptable if they
> are only ever to be used by for_each_sg_page().  Although if that's the
> case then perhaps the identifiers should be a bit more obscure-looking.
> Usually we prefix them with "__" to say "this is in internal thing".

Yes, they are meant to be used only internally, so I'll add the __
prefix.

> > +/*
> > + * Simple sg page iterator, starting off at the given page offset. Each entry
> > + * on the sglist must start at offset 0 and can contain only full pages.
> > + * iter->page will point to the current page, iter->sg_pgoffset to the page
> > + * offset within the sg holding that page.
> > + */
> > +#define for_each_sg_page(sglist, iter, pgoffset)				\
> > +	for (sg_page_iter_start((iter), (sglist), (pgoffset));		\
> > +	     (iter)->page; sg_page_iter_next(iter))
> 
> Because all the helper functions are inlined, this will expand to a
> quite large amount of code.  And large code can be slow code due to
> I-cache eviction.
> 
> I don't know *how* big this thing will be because the patch didn't
> include a caller and I can't be bothered writing my own.  (And the lack
> of any caller means that the code will not be tested).
> 
> So, exactly how big is this thing, and how do we know it's better this
> way than if we were to uninline some/all of the helpers?

I admit I only hoped compiler optimization would keep the inlined parts
at a minimum, but now I actually checked (on Intel CPU). I applied the
patchset from [1] and uninlined sg_page_iter_start as it's not
significant for speed:

 size drivers/gpu/drm/i915/i915.ko
 514855	  15996	    272	 531123	  81ab3	drivers/gpu/drm/i915/i915.ko

Then uninlined all helpers:
 size drivers/gpu/drm/i915/i915.ko
 513447	  15996	    272	 529715	  81533	drivers/gpu/drm/i915/i915.ko

Since there are 8 invocations of the macro, the overhead for a single
invocation is about (531123 - 529715) / 8 = 191 bytes.

For speed, I benchmarked a simple loop which was basically:

page = vmalloc(sizeof(*page) * 1000, GFP_KERNEL);
for_each_sg_page(sglist, iter, 0)
	*page++ = iter.page;

where each entry on the sglist contained 16 consecutive pages. This
takes ~10% more time for the uninlined version to run. This is a rather
artificial test and I couldn't come up with something more real-life
using only the i915 driver's ioctl interface that would show a
significant change in speed.

So at least for now I'm ok with just uninlining all the helpers.

Thanks for the review,
Imre

[1]
http://lists.freedesktop.org/archives/intel-gfx/2013-February/024589.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/