Message-ID: <CAKgT0Uc+e3MUb4CK1i7H7F=y-fHTxiGF8zddBFiqFRdbd6ofLg@mail.gmail.com>
Date: Tue, 20 Aug 2024 09:02:57 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Subbaraya Sundeep <sbhatta@...vell.com>, Chuck Lever <chuck.lever@...cle.com>,
Sagi Grimberg <sagi@...mberg.me>, Jeroen de Borst <jeroendb@...gle.com>,
Praveen Kaligineedi <pkaligineedi@...gle.com>, Shailend Chand <shailend@...gle.com>,
Eric Dumazet <edumazet@...gle.com>, Tony Nguyen <anthony.l.nguyen@...el.com>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>, Sunil Goutham <sgoutham@...vell.com>,
Geetha sowjanya <gakula@...vell.com>, hariprasad <hkelam@...vell.com>, Felix Fietkau <nbd@....name>,
Sean Wang <sean.wang@...iatek.com>, Mark Lee <Mark-MC.Lee@...iatek.com>,
Lorenzo Bianconi <lorenzo@...nel.org>, Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, Keith Busch <kbusch@...nel.org>,
Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>, Chaitanya Kulkarni <kch@...dia.com>,
"Michael S. Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
Eugenio Pérez <eperezma@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
David Howells <dhowells@...hat.com>, Marc Dionne <marc.dionne@...istor.com>,
Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>, Olga Kornievskaia <kolga@...app.com>,
Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>,
Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>, Shuah Khan <shuah@...nel.org>,
intel-wired-lan@...ts.osuosl.org, linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org, linux-nvme@...ts.infradead.org,
kvm@...r.kernel.org, virtualization@...ts.linux.dev, linux-mm@...ck.org,
bpf@...r.kernel.org, linux-afs@...ts.infradead.org, linux-nfs@...r.kernel.org,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH net-next v13 04/14] mm: page_frag: add '_va' suffix to
page_frag API
On Tue, Aug 20, 2024 at 6:07 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>
> On 2024/8/19 23:54, Alexander Duyck wrote:
>
> ...
>
> >>>>
> >>>> "There are three types of API as proposed in this patchset instead of
> >>>> two types of API:
> >>>> 1. page_frag_alloc_va() returns [va].
> >>>> 2. page_frag_alloc_pg() returns [page, offset].
> >>>> 3. page_frag_alloc() returns [va] & [page, offset].
> >>>>
> >>>> You seem to have missed that we need a third name for the type 3 API.
> >>>> Do you see the type 3 API as a valid API? If yes, what name are you
> >>>> suggesting for it? If no, why is it not a valid API?"
> >>>
> >>> I didn't. I just don't see the point in pushing out the existing API
> >>> to support that. In reality 2 and 3 are redundant. You probably only
> >>> need 3. Like I mentioned earlier you can essentially just pass a
> >>
> >> If the caller just expects [page, offset], do you expect the caller to
> >> also use the type 3 API, which returns both [va] and [page, offset]?
> >>
> >> I am not sure I understand why you think 2 and 3 are redundant here.
> >> If you think 2 and 3 are redundant, aren't 1 and 3 also redundant
> >> by a similar argument?
> >
> > The big difference is the need to return page and offset. Basically to
> > support returning page and offset you need to pass at least one value
> > as a pointer so you can store the return there.
> >
> > The reason 3 is just a redundant form of 2 is that you will
> > normally just be converting from a va to a page and offset, so the va
> > should already be easily accessible.
>
> I am assuming that by 'easily accessible', you meant the 'va' can be
> calculated as below, right?
>
> va = encoded_page_address(encoded_va) +
> (page_frag_cache_page_size(encoded_va) - remaining);
>
> I guess it is easily accessible, but it is not without some overhead
> to calculate the 'va' here.

It is just encoded_page_address() + offset, which you have to
calculate anyway. So the only extra work is 2 instructions: one to
mask the encoded_va and one to add the offset that you already computed
for the page. As it stands, those instructions can easily be slipped in
while you are converting the va to a page.
> >
> >>> page_frag via pointer to the function. With that you could also look
> >>> at just returning a virtual address as well if you insist on having
> >>> something that returns all of the above. No point in having 2 and 3 be
> >>> separate functions.
> >>
> >> Let's be more specific about your suggestion here: which is the
> >> preferred way to return the virtual address? It seems there are two
> >> options:
> >>
> >> 1. Return the virtual address as the function's return value:
> >> void *page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio);
> >>
> >> 2. Return the virtual address through a double pointer:
> >> int page_frag_alloc_bio(struct page_frag_cache *nc, struct bio_vec *bio,
> >> void **va);
> >
> > I was thinking more of option 1. Basically this is a superset of
> > page_frag_alloc_va that also returns the page and offset via a
> > page frag. However, instead of bio_vec I would be fine with "struct
> > page_frag *" being the value passed to the function to play the role
> > of container. Basically, the big difference between 1 and 2/3, if I am
> > not mistaken, is that for 1 you pass the size, whereas with
> > 2/3 you are peeling off the page frag from the larger page frag cache
>
> Let's be clear here: callers that only expect [page, offset] would also
> need to call the type 3 API, which returns both [va] and [page, offset]?
> And it is OK to ignore the overhead of calculating the 'va' for those
> kinds of callers just because we don't want to rename an existing
> API and can't come up with a good name for the new one?
>
> > after the fact via a commit type action.
>
> Just to be clear, there is no commit type action for some subtypes of
> the type 2/3 API.
>
> For example, the type 2 API in this patchset has the following subtypes:
>
> subtype 1: it does not need a commit type action; it just returns
> [page, offset] where page_frag_alloc_va() returns [va], and, like
> page_frag_alloc_va(), it does not return the allocated fragsz to
> the caller:
> struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
> unsigned int *offset, unsigned int fragsz,
> gfp_t gfp)
>
> subtype 2: it does need a commit type action; @fragsz is returned to
> the caller, and the caller uses it to decide how much of the fragment
> to commit.
> struct page *page_frag_alloc_pg_prepare(struct page_frag_cache *nc,
> unsigned int *offset,
> unsigned int *fragsz, gfp_t gfp)
>
> Do you see subtype 1 as a valid API? If not, why?

Not really, it is just a wrapper for page_frag_alloc that is
converting the virtual address to a page and offset. They are the same
data and don't justify the need for two functions. It kind of explains
one of the complaints I had about this code. Supposedly it was
refactoring and combining several different callers into one, but what
it is actually doing is fracturing the code path into 3 different
variants based on little, if any, actual difference, all for the sake
of unnecessary optimization.
> If yes, do you also expect the caller to use "struct page_frag *" as the
> container? If yes, what is the caller expected to do with the size field in
> "struct page_frag *" from an API perspective? Just ignore it?

It should be populated. You passed in a fragsz, so you should populate
the output size so you can get the truesize in the case of network
packets. The removal of the page_frag from the other callers is making
it much harder to review your code anyway. If we keep the page_frag
there, it should reduce the amount of change needed when you replace
page_frag with the page_frag_cache.

Honestly, this is eating up too much of my time. As I said before, this
patch set is too big and is trying to squeeze in more than it really
should for a single patch set to be reviewable. Going forward, please
split up the patch set as I suggested before and address my comments.
Ideally, your first patch would just be some refactoring and cleanup to
get the "offset" pointer moving in the direction you want. With that, we
can at least get half of this set digested before we start chewing into
all this refactoring for the replacement of page_frag with the
page_frag_cache.