netdev - Re: [PATCH net-next v3 3/3] page_pool: fix IOMMU crash when driver has already unbound

lists.openwall.net
Open Source and information security mailing list archives


Message-ID: <0c146fb8-4c95-4832-941f-dfc3a465cf91@kernel.org>
Date: Fri, 25 Oct 2024 16:07:38 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Toke Høiland-Jørgensen <toke@...hat.com>,
 Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net, kuba@...nel.org,
 pabeni@...hat.com
Cc: zhangkun09@...wei.com, fanghaiqing@...wei.com, liuyonglong@...wei.com,
 Robin Murphy <robin.murphy@....com>,
 Alexander Duyck <alexander.duyck@...il.com>, IOMMU <iommu@...ts.linux.dev>,
 Andrew Morton <akpm@...ux-foundation.org>, Eric Dumazet
 <edumazet@...gle.com>, Ilias Apalodimas <ilias.apalodimas@...aro.org>,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
 kernel-team <kernel-team@...udflare.com>
Subject: Re: [PATCH net-next v3 3/3] page_pool: fix IOMMU crash when driver
 has already unbound




On 25/10/2024 13.16, Toke Høiland-Jørgensen wrote:
> Yunsheng Lin <linyunsheng@...wei.com> writes:
> 
>> On 2024/10/24 22:40, Toke Høiland-Jørgensen wrote:
>>
>> ...
>>
>>>>>
>>>>> I really really dislike this approach!
>>>>>
>>>>> Nacked-by: Jesper Dangaard Brouer <hawk@...nel.org>
>>>>>
>>>>> Having to keep an array to record all the pages, including the ones
>>>>> handed over to the network stack, goes against the very principle
>>>>> behind page_pool. We added members to struct page so that pages could
>>>>> be "outstanding".
>>>>
>>>> Both before and after this patch, "outstanding" pages are supported; the
>>>> difference is how many "outstanding" pages are supported.
>>>>
>>>> The question seems to be: do we really need an unlimited number of
>>>> in-flight pages for page_pool to work, as mentioned in [1]?
>>>>
>>>> 1. https://lore.kernel.org/all/5d9ea7bd-67bb-4a9d-a120-c8f290c31a47@huawei.com/
>>>
>>> Well, yes? Imposing an arbitrary limit on the number of in-flight
>>> packets (especially such a low one as in this series) is a complete
>>> non-starter. Servers have hundreds of gigs of memory these days, and if
>>> someone wants to use that for storing in-flight packets, the kernel
>>> definitely shouldn't impose some (hard-coded!) limit on that.
>>

I agree this limit is a non-starter.
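To make the objection concrete, here is a minimal, hypothetical user-space model of the two bookkeeping styles being debated. It is not the code from this series: the class names, methods, and the fixed `capacity` are assumptions for illustration only. A table that records every in-flight page (so the pages could be found again at driver unbind) must refuse new pages once it is full, whereas plain hold/release counters never refuse but only know *how many* pages are out, not *which* ones.

```python
from typing import List, Optional


class RecordingTracker:
    """Hypothetical: records each in-flight page in a fixed-size table."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity                      # assumed hard-coded bound
        self.slots: List[Optional[object]] = [None] * capacity
        self.free = list(range(capacity))             # indices of empty slots

    def hand_out(self, page: object) -> Optional[int]:
        """Record a page given to the stack; None means the table is full."""
        if not self.free:
            return None                               # the limit is hit here
        idx = self.free.pop()
        self.slots[idx] = page
        return idx

    def give_back(self, idx: int) -> None:
        """Forget a page that has been returned to the pool."""
        self.slots[idx] = None
        self.free.append(idx)

    def recorded_pages(self) -> List[object]:
        """Every page still outstanding, e.g. for cleanup at unbind."""
        return [p for p in self.slots if p is not None]


class CountingTracker:
    """Hypothetical: only counts pages handed out and returned."""

    def __init__(self) -> None:
        self.hold = 0        # pages handed out so far
        self.release = 0     # pages returned so far

    def hand_out(self, page: object) -> None:
        self.hold += 1       # never refuses, so no in-flight limit

    def give_back(self) -> None:
        self.release += 1

    def inflight(self) -> int:
        return self.hold - self.release   # how many, but not which ones
```

In the first model the capacity becomes a hard upper bound on outstanding pages; in the second, outstanding pages cost the pool nothing beyond two counters, but the pool cannot enumerate them.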

>> You and Jesper seem to be suggesting that 'hundreds of gigs of memory'
>> might be needed for in-flight pages. It would be nice to provide more info
>> or reasoning about why 'hundreds of gigs of memory' would be needed here,
>> so that we don't over-design support for recording unlimited in-flight
>> pages in case stalling on driver unbind turns out to be impossible and the
>> in-flight pages do need to be recorded.
> 
> I don't have a concrete example of a use case that will blow the limit
> you are setting (but maybe Jesper does); I am simply objecting to
> arbitrarily imposing any limit at all. It smells a lot of "640k ought
> to be enough for anyone".
> 

As I wrote before, in *production* I'm seeing TCP memory reach 24 GiB
(on machines with 384 GiB of memory). I have attached a Grafana screenshot
to prove what I'm saying.

As my co-worker Mike Freemon has explained to me (with more details in
blog posts [1]), it is no coincidence that the graph has a strange
"ceiling" close to 24 GiB on machines with 384 GiB of total memory. This
is because the TCP network stack goes into a memory "under pressure" state
when 6.25% of total memory is used by the TCP stack, and 6.25% of 384 GiB
is exactly 24 GiB. (Detail: the system stays in that mode until allocated
TCP memory falls below 4.68% of total memory.)

  [1] https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/
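The percentages above follow from the default `net.ipv4.tcp_mem` sizing (pressure threshold at 1/16 of memory, lower threshold at 3/4 of that). The sketch below reproduces that arithmetic under stated simplifying assumptions: 4 KiB pages, total memory standing in for the free-buffer-page count the kernel actually sizes from, and a simplified enter/leave rule for the pressure state. The function and class names are hypothetical. It also expresses the 24 GiB ceiling in 4 KiB pages, which gives a sense of scale for in-flight page counts if that memory were all page_pool pages (an assumption; TCP memory accounting is not limited to page_pool pages).

```python
from typing import Tuple

PAGE_SIZE = 4096   # assumption: 4 KiB pages
GiB = 1 << 30


def tcp_mem_defaults(total_pages: int) -> Tuple[int, int, int]:
    """Approximate default tcp_mem thresholds (low, pressure, high) in pages.

    Simplification: total_pages stands in for the free buffer pages the
    kernel actually sizes these from.
    """
    pressure = total_pages // 16     # enter pressure above 1/16 = 6.25%
    low = pressure // 4 * 3          # leave pressure below 3/64 ~ 4.69%
    high = low * 2                   # hard limit at 3/32 ~ 9.38%
    return low, pressure, high


class TcpMemoryPressure:
    """Simplified hysteresis: enter above `pressure`, leave below `low`."""

    def __init__(self, low: int, pressure: int) -> None:
        self.low = low
        self.pressure = pressure
        self.under_pressure = False

    def update(self, allocated_pages: int) -> bool:
        if allocated_pages > self.pressure:
            self.under_pressure = True
        elif allocated_pages < self.low:
            self.under_pressure = False
        # between low and pressure: state is left unchanged
        return self.under_pressure


total_pages = 384 * GiB // PAGE_SIZE
low, pressure, high = tcp_mem_defaults(total_pages)
# prints: pressure at 24 GiB (6291456 pages), leave below 18 GiB
print(f"pressure at {pressure * PAGE_SIZE / GiB:.0f} GiB "
      f"({pressure} pages), leave below {low * PAGE_SIZE / GiB:.0f} GiB")
```

The gap between the two thresholds is the hysteresis from the "Detail" note: once under pressure, usage anywhere between roughly 18 GiB and 24 GiB keeps the state unchanged.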


>> I guess it is common sense to start with the easy approach until someone
>> complains with a test case and detailed reasoning for why we need to go
>> the hard way, as you and Jesper also prefer waiting over having to record
>> the in-flight pages.
> 
> AFAIU from Jakub's comment on his RFC patch for waiting, he was suggesting
> exactly this: add the wait, and see if the cases where it can stall turn
> out to be problems in practice.

+1

I like Jakub's approach.
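For comparison, the waiting approach can be modeled in a few lines: instead of recording pages, pool teardown blocks until every handed-out page has been returned. This is a hypothetical user-space sketch, not Jakub's RFC; the `WaitingPool` class, its counters, and the `timeout` parameter are assumptions, and the timeout exists only to make a stall observable (a page that is held indefinitely elsewhere keeps the wait from finishing).

```python
import threading


class WaitingPool:
    """Hypothetical model of 'wait for in-flight pages on unbind'."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._hold = 0       # pages handed to the stack
        self._release = 0    # pages returned to the pool

    def hand_out(self) -> None:
        with self._cond:
            self._hold += 1

    def give_back(self) -> None:
        with self._cond:
            self._release += 1
            self._cond.notify_all()   # wake a teardown waiting in destroy()

    def destroy(self, timeout: float) -> bool:
        """Block until nothing is in flight; False means the wait stalled."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._hold == self._release, timeout=timeout)


pool = WaitingPool()
pool.hand_out()
returner = threading.Timer(0.1, pool.give_back)  # page comes back later
returner.start()
print(pool.destroy(timeout=2.0))   # waits ~0.1 s for the page, then True
returner.join()
```

No table and no limit on outstanding pages are needed; the cost is that teardown may wait for as long as some page stays out.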

--Jesper
Download attachment "Screenshot from 2024-10-25 15-47-04.png" of type "image/png" (1137103 bytes)