linux-kernel - Re: [PATCH net-next v13 04/14] mm: page_frag: add '_va' suffix to page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ec27c801-db0a-4e57-b0bd-03a695e70e56@huawei.com>
Date: Wed, 21 Aug 2024 20:30:51 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>,
	<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Subbaraya Sundeep
	<sbhatta@...vell.com>, Chuck Lever <chuck.lever@...cle.com>, Sagi Grimberg
	<sagi@...mberg.me>, Jeroen de Borst <jeroendb@...gle.com>, Praveen
 Kaligineedi <pkaligineedi@...gle.com>, Shailend Chand <shailend@...gle.com>,
	Eric Dumazet <edumazet@...gle.com>, Tony Nguyen <anthony.l.nguyen@...el.com>,
	Przemek Kitszel <przemyslaw.kitszel@...el.com>, Sunil Goutham
	<sgoutham@...vell.com>, Geetha sowjanya <gakula@...vell.com>, hariprasad
	<hkelam@...vell.com>, Felix Fietkau <nbd@....name>, Sean Wang
	<sean.wang@...iatek.com>, Mark Lee <Mark-MC.Lee@...iatek.com>, Lorenzo
 Bianconi <lorenzo@...nel.org>, Matthias Brugger <matthias.bgg@...il.com>,
	AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, Keith
 Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>, Christoph Hellwig
	<hch@....de>, Chaitanya Kulkarni <kch@...dia.com>, "Michael S. Tsirkin"
	<mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
	Eugenio Pérez <eperezma@...hat.com>, Andrew Morton
	<akpm@...ux-foundation.org>, Alexei Starovoitov <ast@...nel.org>, Daniel
 Borkmann <daniel@...earbox.net>, Jesper Dangaard Brouer <hawk@...nel.org>,
	John Fastabend <john.fastabend@...il.com>, Andrii Nakryiko
	<andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, Eduard
 Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>, Yonghong Song
	<yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>, Stanislav Fomichev
	<sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
	David Howells <dhowells@...hat.com>, Marc Dionne <marc.dionne@...istor.com>,
	Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>, Olga
 Kornievskaia <kolga@...app.com>, Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey
	<tom@...pey.com>, Trond Myklebust <trondmy@...nel.org>, Anna Schumaker
	<anna@...nel.org>, Shuah Khan <shuah@...nel.org>,
	<intel-wired-lan@...ts.osuosl.org>, <linux-arm-kernel@...ts.infradead.org>,
	<linux-mediatek@...ts.infradead.org>, <linux-nvme@...ts.infradead.org>,
	<kvm@...r.kernel.org>, <virtualization@...ts.linux.dev>,
	<linux-mm@...ck.org>, <bpf@...r.kernel.org>, <linux-afs@...ts.infradead.org>,
	<linux-nfs@...r.kernel.org>, <linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH net-next v13 04/14] mm: page_frag: add '_va' suffix to
 page_frag API

On 2024/8/21 0:02, Alexander Duyck wrote:
> On Tue, Aug 20, 2024 at 6:07 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>
>> On 2024/8/19 23:54, Alexander Duyck wrote:
>>
>> ...
>>
>>>>>>
>>>>>> "There are three types of API as proposed in this patchset instead of
>>>>>> two types of API:
>>>>>> 1. page_frag_alloc_va() returns [va].
>>>>>> 2. page_frag_alloc_pg() returns [page, offset].
>>>>>> 3. page_frag_alloc() returns [va] & [page, offset].
>>>>>>
>>>>>> You seemed to miss that we need a third naming for the type 3 API.
>>>>>> Do you see type 3 API as a valid API? if yes, what naming are you
>>>>>> suggesting for it? if no, why it is not a valid API?"
>>>>>
>>>>> I didn't. I just don't see the point in pushing out the existing API
>>>>> to support that. In reality 2 and 3 are redundant. You probably only
>>>>> need 3. Like I mentioned earlier you can essentially just pass a
>>>>
>>>> If the caller just expect [page, offset], do you expect the caller also
>>>> type 3 API, which return both [va] and [page, offset]?
>>>>
>>>> I am not sure if I understand why you think 2 and 3 are redundant here?
>>>> If you think 2 and 3 are redundant here, aren't 1 and 3 also redundant
>>>> as the similar agrument?
>>>
>>> The big difference is the need to return page and offset. Basically to
>>> support returning page and offset you need to pass at least one value
>>> as a pointer so you can store the return there.
>>>
>>> The reason why 3 is just a redundant form of 2 is that you will
>>> normally just be converting from a va to a page and offset so the va
>>> should already be easily accessible.
>>
>> I am assuming that by 'easily accessible', you meant the 'va' can be
>> calculated as below, right?
>>
>> va = encoded_page_address(encoded_va) +
>>                 (page_frag_cache_page_size(encoded_va) - remaining);
>>
>> I guess it is easily accessible, but it is not without some overhead
>> to calculate the 'va' here.
> 
> It is just the encoded_page_address + offset that you have to
> calculate anyway. So the only bit you actually have to do is 2
> instructions, one to mask the encoded_va and then the addition of the
> offset that you provided to the page. As it stands those instruction
> can easily be slipped in while you are working on converting the va to
> a page.

Well, with your suggestions against other optimizations like avoiding
a checking in fast patch and avoid calling virt_to_page(), the overhead
is kind of added up.

And I am really surprised by your above suggestion about deciding the
API for users according to the internal implementation detail here. As
the overhead of calculating 'va' is really depending on the layout of
'struct page_frag_cache' here, what if we change the implementation and
the overhead of calculating 'va' becomes bigger? Do we expect to change
the API for the callers when we change the internal implementation of
page_frag_cache?

> 
> 
>>>
>>>>> page_frag via pointer to the function. With that you could also look
>>>>> at just returning a virtual address as well if you insist on having
>>>>> something that returns all of the above. No point in having 2 and 3 be
>>>>> seperate functions.
>>>>
>>>> Let's be more specific about what are your suggestion here: which way
>>>> is the prefer way to return the virtual address. It seems there are two
>>>> options:
>>>>
>>>> 1. Return the virtual address by function returning as below:
>>>> void *page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio);
>>>>
>>>> 2. Return the virtual address by double pointer as below:
>>>> int page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio,
>>>>                         void **va);
>>>
>>> I was thinking more of option 1. Basically this is a superset of
>>> page_frag_alloc_va that is also returning the page and offset via a
>>> page frag. However instead of bio_vec I would be good with "struct
>>> page_frag *" being the value passed to the function to play the role
>>> of container. Basically the big difference between 1 and 2/3 if I am
>>> not mistaken is the fact that for 1 you pass the size, whereas with
>>> 2/3 you are peeling off the page frag from the larger page frag cache
>>
>> Let's be clear here: The callers just expecting [page, offset] also need
>> to call type 3 API, which return both [va] and [page, offset]? and it
>> is ok to ignore the overhead of calculating the 'va' for those kinds
>> of callers just because we don't want to do the renaming for a existing
>> API and can't come up with good naming for that?
>>
>>> after the fact via a commit type action.
>>
>> Just be clear here, there is no commit type action for some subtype of
>> type 2/3 API.
>>
>> For example, for type 2 API in this patchset, it has below subtypes:
>>
>> subtype 1: it does not need a commit type action, it just return
>>            [page, offset] instead of page_frag_alloc_va() returning [va],
>>            and it does not return the allocated fragsz back to the caller
>>            as page_frag_alloc_va() does not too:
>> struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
>>                                 unsigned int *offset, unsigned int fragsz,
>>                                 gfp_t gfp)
>>
>> subtype 2: it does need a commit type action, and @fragsz is returned to
>>            the caller and caller used that to commit how much fragsz to
>>            commit.
>> struct page *page_frag_alloc_pg_prepare(struct page_frag_cache *nc,
>>                                         unsigned int *offset,
>>                                         unsigned int *fragsz, gfp_t gfp)
>>
>> Do you see subtype 1 as valid API? If no, why?
> 
> Not really, it is just a wrapper for page_frag_alloc that is
> converting the virtual address to a page and offset. They are the same
> data and don't justify the need for two functions. It kind of explains

I am supposing you meant something like below:
struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
				unsigned int *offset, unsigned int fragsz,
				gfp_t gfp)
{
	struct page *page;
	void *va;

	va = page_frag_alloc_va(nc, fragsz, gfp);
	if (!va)
		return NULL;

	page = virt_to_head_page(va);
	*offset = va - page_to_virt(page);

	return page;
}

If yes, I really think you are caring about maintainability too much by
trading off too much performance here by not only recalculating the offset
here, but also sometimes calling virt_to_head_page() unnecessarily.

If no, please share the pseudo code in your mind.

> one of the complaints I had about this code. Supposedly it was
> refactoring and combining several different callers into one, but what
> it is actually doing is fracturing the code path into 3 different
> variants based on little if any actual difference as it is doing
> unnecessary optimization.

I am supposing the 3 different variants meant the below, right?
1. page_frag_alloc_va() returns [va].
2. page_frag_alloc_pg() returns [page, offset].
3. page_frag_alloc() returns [va] & [page, offset].

And there is others 3 different variants for prepare API too:
4. page_frag_alloc_va_prepare() returns [va].
5. page_frag_alloc_pg_prepare() returns [page, offset].
6. page_frag_alloc_prepare() returns [va] & [page, offset].

Side note: I just found the '4. page_frag_alloc_va_prepare()' API is
not used/called currently and can be removed in next revision for this
patchset.

It seems what you really want is 3 & 2 to be a wrapper for 1, and
5 & 6 to be a wrapper for 4?

If yes, too much performance is traded off here as my understanding.
Does't the introducing of __page_frag_cache_reload() already enable the
balance between performance and maintainability as much as possible in
patch 8?

> 
>> If yes, do you also expect the caller to use "struct page_frag *" as the
>> container? If yes, what is the caller expected to do with the size field in
>> "struct page_frag *" from API perspective? Just ignore it?
> 
> It should be populated. You passed a fragsz, so you should populate
> the output fragsz so you can get the truesize in the case of network
> packets. The removal of the page_frag from the other callers is making
> it much harder to review your code anyway. If we keep the page_frag
> there it should reduce the amount of change needed when you replace
> page_frag with the page_frag_cache.

I am not starting to use page_frag as the container yet, but the above
part is something that I am probably agreed with.

> 
> Honestly this is eating up too much of my time. As I said before this
> patch set is too big and it is trying to squeeze in more than it
> really should for a single patch set to be reviewable. Going forward
> please split up the patch set as I had suggested before and address my
> comments. Ideally you would have your first patch just be some
> refactor and cleanup to get the "offset" pointer moving in the
> direction you want. With that we can at least get half of this set
> digested before we start chewing into all this refactor for the
> replacement of page_frag with the page_frag_cache.

I don't really think breaking this patchset into more patchsets without
a newcase is helping to speed up the process here, it might slow down
the process instead, as the different idea about the refactoring and
new API naming is not going to disappear by breaking the patchset, and
the breaking may make the discussion harder without a bigger picture
and context.