lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <40618da9.b062.196ca805193.Coremail.00107082@163.com>
Date: Wed, 14 May 2025 00:35:29 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Alan Stern" <stern@...land.harvard.edu>
Cc: mathias.nyman@...el.com, gregkh@...uxfoundation.org, oneukum@...e.com,
	linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] USB: core: add a memory pool to urb for
 host-controller private data


At 2025-05-13 23:37:20, "Alan Stern" <stern@...land.harvard.edu> wrote:
>On Tue, May 13, 2025 at 10:41:45PM +0800, David Wang wrote:
>> 
>> 
>> At 2025-05-13 22:25:50, "Alan Stern" <stern@...land.harvard.edu> wrote:
>> >On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote:
>> >> ---
>> >> Changes:
>> >> 1. Use caller's mem_flags if a larger memory is needed.
>> >> Thanks to Oliver Neukum <oneukum@...e.com>'s review.
>> >> ---
>> >> URB objects have long lifecycle, an urb can be reused between
>> >> enqueue-dequeue loops; The private data needed by some host controller
>> >> has very short lifecycle, the memory is alloced when enqueue, and
>> >> released when dequeue. For example, on a system with xhci, several
>> >> minutes of usage of webcam/keyboard/mouse have memory alloc counts:
>> >>   drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
>> >>   drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
>> >> Memory allocation frequency for host-controller private data can reach
>> >> ~1k/s and above.
>> >> 
>> >> High frequent allocations for host-controller private data can be
>> >> avoided if urb take over the ownership of memory, the memory then shares
>> >> the longer lifecycle with urb objects.
>> >> 
>> >> Add a mempool to urb for hcpriv usage, the mempool only holds one block
>> >> of memory and grows when larger size is requested.
>> >> 
>> >> Signed-off-by: David Wang <00107082@....com>
>> >
>> >It should be possible to do what you want without touching the USB core 
>> >code at all, changing only xhci-hcd.  That's what Mathias is suggesting.
>> >
>> >Instead of having an URB keep ownership of its extra memory after it 
>> >completes, xhci-hcd can put the memory area onto a free list.  Then 
>> >memory areas on the free list can be reused with almost no overhead when 
>> >URBs are enqueued later on.
>> 
>> I have to disagree,  your suggestion has no much difference from requesting memory from
>> system, locks and memory pool managements, all the same are needed, why bother?
>
>There are two differences.  First, xhci-hcd already has its own lock 
>that it acquires when enqueuing or dequeuing URBs, so no additional 
>locks would be needed.  Second, complicated memory pool management isn't 
>necessary; the management can be extremely simple.  (For example, 
>Mathias suggested just reusing the most recently released memory area 
>unless it is too small.)

My patch also just reuse memory,  in a simpler way I would argue....

>
>> The reason I choose URB is that  URB life cycle contains the private data's lifecycle, 
>> and no two HCD can take over the same URB as the same time.
>> 
>> I think the memory pool here is not actually a pool,  or I should say the mempool consists of
>> pool of URBs, and each URB have only "single one" slot of mem pool in it.
>> 
>> 
>> >
>> >This idea can become more elaborate if you maintain separate free lists 
>> >for different devices, or even for different endpoints, or sort the free 
>> >list by the size of the memory areas.  But the basic idea is always the 
>> >same: Don't change usbcore (including struct urb), and make getting and 
>> >releasing the extra memory areas have extremely low overhead.
>> 
>> Why implements a device level memory pool would have extremely low overhead?
>
>Because then getting or releasing memory areas from the pool could be 
>carried out just by manipulating a couple of pointers.

A couple of pointers manipulation? and it would be simpler than a reusable buffer in URB?  I doubt that.
There would be lots of details needs to consider,  detail is devil and that why we prefer simpler solution,
I just don't understand, you seems imply that my patch is not simple, could you elaborate more on it,
or it is just that in my mind, make changes to "usb core" is a big no-no!

>
>Some fraction of the time, URBs are created on demand and destroyed upon 
>completion.  Your approach would not save any time for these URBs, 
>whereas my suggested approach would.  (This fraction probably isn't very 
>large, although I don't know how big it is.)

I am aiming to void extra tons of alloctions for "private data",  not URB allocation.
When I use my USB webcam, I would observer 1k+ allocation per seconds for private data, while
 urb allocation's frequency is already very low,  people have already done the  optimization
I guess.   The memory profiling feature introduced in 6.12 is a very good start for identifying
where improvement could be made for memory behavior. 

>
>> Why  making changes to usb core is bad? The idea in this thread is meant for all kinds of
>> host controllers, xhci is what I have in my system; i think similar changes would benefit other
>> HCs
>
>Those other HC drivers would still require changes.  They could be 
>changed to support my scheme as easily as your scheme.

>
>If I were redesigning Linux's entire USB stack from the beginning, I 
>would change it so that URBs would be dedicated to particular host 
>controllers and endpoint types -- maybe even to particular endpoints.  
>They would contain all the additional memory required for the HCD to use 
>them, all pre-allocated, so that dynamic allocation would not be needed 
>during normal use.  (The gadget subsystem behaves this way already.)
>
>Such a drastic change isn't feasible at this point, although what you 
>are suggesting is a step in that direction.  In the end it comes down 
>to a time/space tradeoff, and it's very difficult to know what the best 
>balance is.
Drastic change? Do you mean make change to USB core? Because counting by
lines of changes in this patch, I feel my patch is quite simple and efficient.
I also don't get your time/space tradeoff here,  are you talking about my patch?
or you were just talking a solution in your mind....
This patch only needs a pointer and a size in URB, and URB object allocated in a slow pace...

>
>I'm not saying that your approach is bad or wrong, just that there are 
>other possibilities to consider.
>
>Alan Stern
>
>Alan Stern


David

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ