[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CABZAGREQwVUfRQkkTTXUYD_Uvkf0Wxa=dj7r_r85vMsTnUn67A@mail.gmail.com>
Date: Tue, 3 Feb 2026 19:03:09 +0800
From: Nick Huang <sef1548@...il.com>
To: David Laight <david.laight.linux@...il.com>
Cc: "Rafael J . Wysocki" <rafael@...nel.org>, Robert Moore <robert.moore@...el.com>,
Len Brown <lenb@...nel.org>, linux-acpi@...r.kernel.org, acpica-devel@...ts.linux.dev,
linux-kernel@...r.kernel.org, paladin@...b.edu.tw, kusogame68@...il.com,
ceyanglab@...il.com, n1136402@...b.edu.tw
Subject: Re: [PATCH 1/2] ACPI: nsrepair2: Replace O(n²) bubble sort with O(n log n) sort_r()
David Laight <david.laight.linux@...il.com> 於 2026年2月2日週一 上午6:48寫道:
>
> On Sun, 1 Feb 2026 13:03:33 +0000
> Nick Huang <sef1548@...il.com> wrote:
>
> > Replace the O(n²) bubble sort implementation in acpi_ns_sort_list()
> > with the kernel's sort_r() function which uses heapsort, providing
> > O(n log n) time complexity.
> >
> > This improves performance for large ACPI package lists while also
> > reducing code complexity by leveraging the existing kernel sort API.
>
> What is the break even size?
> While the heapsort is O(n long n) it is also more complicated.
> There is also the cost of the function call - especially with all the
> mitigations that distro kernels are likely to enable.
>
> For large datasets the d-cache locality of both sorts is particularly horrid.
> It is almost certainly better to allocate an array of index:value pairs
> and sort that.
> For very big datasets you want to sort small sections (that fit in the
> d-cache) and then use merge sorts (also O(n log n)) to combine them.
> (Yes - this is how you sort data with 3 mag-tape drives....)
>
> Oh, in any case, write separate functions for ascending/descending.
>
> David
>
Hi David
Thank you for your reply and the detailed feedback.
I ran KUnit benchmarks to investigate your questions.
=== Break-Even Point ===
I compared bubble sort vs heapsort across various array sizes:
N | Bubble(ns) | Heap(ns) | Faster
----|------------|----------|--------
2 | 61 | 99 | bubble
3 | 63 | 115 | bubble
4 | 78 | 163 | bubble
5 | 96 | 215 | bubble
6 | 119 | 260 | bubble
8 | 177 | 388 | bubble
10 | 257 | 539 | bubble
12 | 415 | 721 | bubble
16 | 726 | 1044 | bubble
20 | 1106 | 1484 | bubble
32 | 2854 | 3091 | bubble
Bubble sort is faster for all tested sizes. The break-even point was not
reached within N=32, which covers all realistic ACPI use cases:
- _ALR: 2-10 entries (ambient light response)
- _CST: 2-8 entries (C-states)
- _PSS: 5-20 entries (P-states)
- _TSS: 2-16 entries (T-states)
As you noted, heapsort has additional overhead from function calls and
mitigations that outweigh its O(n log n) advantage at small N.
=== Separate Ascending/Descending Functions ===
I also tested combined vs separate sort functions as you suggested:
N | Combined(ns) | Separate(ns) | Saved
----|--------------|--------------|------
4 | 84 | 75 | 11%
8 | 179 | 175 | 3%
16 | 737 | 806 | -9%
The results are mixed. Separate functions show marginal improvement at
small N, but the combined function performs better at N=16, possibly
due to instruction cache effects.
=== Conclusion ===
Given these results, replacing bubble sort with heapsort would likely
degrade performance for typical ACPI workloads. The existing bubble
sort implementation appears to be the right choice for this use case.
--
Regards,
Nick Huang
Powered by blists - more mailing lists