Message-ID: <0e45a2f2-6dd5-5a43-c1a0-7520c1ed2675@opensource.wdc.com>
Date:   Tue, 15 Nov 2022 13:28:14 +0900
From:   Damien Le Moal <damien.lemoal@...nsource.wdc.com>
To:     Hyeonggon Yoo <42.hyeyoo@...il.com>
Cc:     Vlastimil Babka <vbabka@...e.cz>, Conor Dooley <conor@...nel.org>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Christoph Lameter <cl@...ux.com>,
        David Rientjes <rientjes@...gle.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Pekka Enberg <penberg@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Rustam Kovhaev <rkovhaev@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Josh Triplett <josh@...htriplett.org>,
        Arnd Bergmann <arnd@...db.de>,
        Russell King <linux@...linux.org.uk>,
        Alexander Shiyan <shc_work@...l.ru>,
        Aaro Koskinen <aaro.koskinen@....fi>,
        Janusz Krzysztofik <jmkrzyszt@...il.com>,
        Tony Lindgren <tony@...mide.com>,
        Yoshinori Sato <ysato@...rs.sourceforge.jp>,
        Rich Felker <dalias@...c.org>, Jonas Bonn <jonas@...thpole.se>,
        Stefan Kristiansson <stefan.kristiansson@...nalahti.fi>,
        Stafford Horne <shorne@...il.com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        openrisc@...ts.librecores.org, linux-riscv@...ts.infradead.org,
        linux-sh@...r.kernel.org,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        Conor.Dooley@...rochip.com, Paul Cercueil <paul@...pouillou.net>
Subject: Re: Deprecating and removing SLOB

On 11/15/22 13:24, Damien Le Moal wrote:
> On 11/14/22 23:47, Hyeonggon Yoo wrote:
>> On Mon, Nov 14, 2022 at 08:35:31PM +0900, Damien Le Moal wrote:
>>> On 11/14/22 18:36, Vlastimil Babka wrote:
>>>> On 11/14/22 06:48, Damien Le Moal wrote:
>>>>> On 11/14/22 10:55, Damien Le Moal wrote:
>>>>>> On 11/12/22 05:46, Conor Dooley wrote:
>>>>>>> On Fri, Nov 11, 2022 at 11:33:30AM +0100, Vlastimil Babka wrote:
>>>>>>>> On 11/8/22 22:44, Pasha Tatashin wrote:
>>>>>>>>> On Tue, Nov 8, 2022 at 10:55 AM Vlastimil Babka <vbabka@...e.cz> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> as we all know, we currently have three slab allocators. As we discussed
>>>>>>>>>> at LPC [1], it is my hope that one of these allocators has a future, and
>>>>>>>>>> two of them do not.
>>>>>>>>>>
>>>>>>>>>> The unsurprising reasons include code maintenance burden, other features
>>>>>>>>>> compatible with only a subset of allocators (or more effort spent on the
>>>>>>>>>> features), blocking API improvements (more on that below), and my
>>>>>>>>>> inability to pronounce SLAB and SLUB in a properly distinguishable way,
>>>>>>>>>> without resorting to spelling out the letters.
>>>>>>>>>>
>>>>>>>>>> I think (but may be proven wrong) that SLOB is the easier target of the
>>>>>>>>>> two to be removed, so I'd like to focus on it first.
>>>>>>>>>>
>>>>>>>>>> I believe SLOB can be removed because:
>>>>>>>>>>
>>>>>>>>>> - AFAIK nobody really uses it? It strives for minimal memory footprint
>>>>>>>>>> by putting all objects together, which has its CPU performance costs
>>>>>>>>>> (locking, lack of percpu caching, searching for free space...). I'm not
>>>>>>>>>> aware of any "tiny linux" deployment that opts for this. For example,
>>>>>>>>>> OpenWRT seems to use SLUB and the devices these days have e.g. 128MB
>>>>>>>>>> RAM, not up to 16 MB anymore. I've heard anecdotes that the SLOB
>>>>>>>>>> performance impact is too much for those who tried. Googling for
>>>>>>>>>> "CONFIG_SLOB=y" yielded nothing useful.
>>>>>>>>>
>>>>>>>>> I am all for removing SLOB.
>>>>>>>>>
>>>>>>>>> There are some devices with configs where SLOB is enabled by default.
>>>>>>>>> Perhaps, the owners/maintainers of those devices/configs should be
>>>>>>>>> included into this thread:
>>>>>>>>>
>>>>>>>>> tatashin@...een:~/x/linux$ git grep SLOB=y
>>>>>>>
>>>>>>>>> arch/riscv/configs/nommu_k210_defconfig:CONFIG_SLOB=y
>>>>>>>>> arch/riscv/configs/nommu_k210_sdcard_defconfig:CONFIG_SLOB=y
>>>>>>>>> arch/riscv/configs/nommu_virt_defconfig:CONFIG_SLOB=y
>>>>>>>
>>>>>>>>
>>>>>>>> Turns out that since SLOB depends on EXPERT, many of those lack it so
>>>>>>>> running make defconfig ends up with SLUB anyway, unless I miss something.
>>>>>>>> Only a subset has both SLOB and EXPERT:
>>>>>>>>
>>>>>>>>> git grep CONFIG_EXPERT `git grep -l "CONFIG_SLOB=y"`
>>>>>>>
>>>>>>>> arch/riscv/configs/nommu_virt_defconfig:CONFIG_EXPERT=y
>>>>>>>
>>>>>>> I suppose there's not really a concern with the virt defconfig, but I
>>>>>>> did check the output of `make nommu_k210_defconfig` and despite not
>>>>>>> having EXPERT it seems to end up with CONFIG_SLOB=y in the generated .config.
>>>>>>>
>>>>>>> I do have a board with a k210 so I checked with s/SLOB/SLUB and it still
>>>>>>> boots etc, but I have no workloads or w/e to run on it.
>>>>>>
>>>>>> I sent a patch to change the k210 defconfig to using SLUB. However...
>>>>
>>>> Thanks!
>>>>
>>>>>> The current default config using SLOB gives about 630 free memory pages
>>>>>> after boot (cat /proc/vmstat). Switching to SLUB, this is down to about
>>>>>> 400 free memory pages (CONFIG_SLUB_CPU_PARTIAL is off).
>>>>
>>>> Thanks for the testing! How much RAM does the system have btw? I found 8MB
>>>> somewhere, is that correct?
>>>
>>> Yep, 8MB, that's it.
>>>
>>>> So 230 pages, that's a ~920 kB difference. Last time we saw a less dramatic
>>>> difference [1]. But that was looking at Slab pages, not free pages. The
>>>> extra overhead could be also in percpu allocations, code etc.
>>>>
>>>>>> This is with a buildroot kernel 5.19 build including a shell and sd-card
>>>>>> boot. With SLUB, I get clean boots and a shell prompt as expected. But I
>>>>>> definitely see more errors with shell commands failing due to allocation
>>>>>> failures for the shell process fork. So as far as the K210 is concerned,
>>>>>> switching to SLUB is not ideal.
>>>>>>
>>>>>> I would not want to hold up kernel mm improvements because of this toy
>>>>>> k210 though, so I am not going to prevent SLOB deprecation. I just wish
>>>>>> SLUB itself used less memory :)
>>>>>
>>>>> Did further tests with kernel 6.0.1:
>>>>> * SLOB: 630 free pages after boot, shell working (occasional shell fork
>>>>> failures happen though)
>>>>> * SLAB: getting order 7 memory allocation failures already at boot
>>>>> (init process). Shell barely working (high frequency of shell command fork
>>>>> failures)
>>>
>>> I forgot to add here that the system was down to about 500 free pages
>>> after boot (again from the shell with "cat /proc/vmstat").
>>>
>>>>> * SLUB: getting order 7 memory allocation failures at boot. I do get a
>>>>> shell prompt but cannot run any shell command that involves forking a new
>>>>> process.
>>>
>>> For both slab and slub, I had cpu partial off, debug off and slab merge
>>> on, as I suspected that would lead to less memory overhead.
>>> I suspected memory fragmentation may be an issue but doing
>>>
>>> echo 3 > /proc/sys/vm/drop_caches
>>>
>>> before trying a shell command did not help much at all (it usually does on
>>> that board with SLOB). Note that this is all with buildroot, so this echo
>>> & redirect always works as it does not cause a shell fork.
>>>
>>>>>
>>>>> So if we want to keep the k210 support functional with a shell, we need
>>>>> slob. If we reduce that board support to only one application started as
>>>>> the init process, then I guess anything is OK.
>>>>
>>>> In [1] it was possible to save some more memory with more tuning. Some of
>>>> that required boot parameters and other code changes. In another reply [2] I
>>>> considered adding something like SLUB_TINY to take care of all that, so
>>>> looks like it would make sense to proceed with that.
>>>
>>> If you want me to test something, let me know.
>>
>> Would you try this please?
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index a24b71041b26..1c36c4b9aaa0 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -4367,9 +4367,7 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
>>  	 * The larger the object size is, the more slabs we want on the partial
>>  	 * list to avoid pounding the page allocator excessively.
>>  	 */
>> -	s->min_partial = min_t(unsigned long, MAX_PARTIAL, ilog2(s->size) / 2);
>> -	s->min_partial = max_t(unsigned long, MIN_PARTIAL, s->min_partial);
>> -
>> +	s->min_partial = 0;
>>  	set_cpu_partial(s);
>>  
>>  #ifdef CONFIG_NUMA
>>
>>
>> and booting with and without boot parameter slub_max_order=0?
> 
> Test notes: I used Linus' 6.1-rc5 as the base. That is the only thing I
> changed in the buildroot default config for the Sipeed MAIX Bit board, booting
> from SD card. The test is: boot, run "cat /proc/vmstat" and record the
> nr_free_pages value. I repeated the boot + cat 3 to 4 times for each case.
> 
> Here are the results:
> 
> 6.1-rc5, SLOB:
>     - 623 free pages
>     - 629 free pages
>     - 629 free pages
> 6.1-rc5, SLUB:
>     - 448 free pages
>     - 448 free pages
>     - 429 free pages
> 6.1-rc5, SLUB + slub_max_order=0:
>     - Init error, shell prompt but no shell command working
>     - Init error, no shell prompt
>     - 508 free pages
>     - Init error, shell prompt but no shell command working
> 6.1-rc5, SLUB + patch:
>     - Init error, shell prompt but no shell command working
>     - 433 free pages
>     - 448 free pages
>     - 423 free pages
> 6.1-rc5, SLUB + slub_max_order=0 + patch:
>     - Init error, no shell prompt
>     - Init error, shell prompt, 499 free pages
>     - Init error, shell prompt but no shell command working
>     - Init error, no shell prompt
> 
> No changes for SLOB results, expected.
> 
> For default SLUB, I did get all clean boots this time and could run the
> cat command. But I do see shell fork failures if I keep running commands.
> 
> For SLUB + slub_max_order=0, I only got one clean boot with 508 free
> pages. The remaining runs failed to give a shell prompt or to allow running
> the cat command. For the clean boot, I do see a higher number of free pages.
> 
> SLUB with the patch was nearly identical to SLUB without the patch.
> 
> And SLUB+patch+slub_max_order=0 again gave a lot of errors and bad boots. I
> could run the cat command only once, giving 499 free pages, so better than
> regular SLUB. But it seems that the memory is more fragmented as
> allocations fail more often.
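
To put the nr_free_pages numbers quoted above on one scale, here is a rough
sketch that averages the successful readings per configuration and converts
the gap to SLOB into kB. It assumes 4 kB pages and only uses the values
listed in the results; the dictionary keys and variable names are mine, and
the single-sample cases are of course not statistically meaningful.

```python
from statistics import mean

PAGE_KB = 4  # assumption: 4 kB pages on this board

# nr_free_pages readings from the results above. Runs that gave no usable
# "cat /proc/vmstat" output are left out; the 499 reading came from a boot
# that still reported an init error.
free_pages = {
    "SLOB": [623, 629, 629],
    "SLUB": [448, 448, 429],
    "SLUB + slub_max_order=0": [508],
    "SLUB + patch": [433, 448, 423],
    "SLUB + slub_max_order=0 + patch": [499],
}

baseline = mean(free_pages["SLOB"])

# Average free pages per configuration and the shortfall against SLOB in kB.
deficit_kb = {}
for name, samples in free_pages.items():
    avg = mean(samples)
    deficit_kb[name] = (baseline - avg) * PAGE_KB
    print(f"{name:34s} avg {avg:6.1f} pages, {deficit_kb[name]:6.1f} kB below SLOB")
```

On these numbers, plain SLUB ends up several hundred kB below SLOB, and the
patch alone does not change that noticeably.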

Note about the last case (SLUB+patch+slub_max_order=0). Here are the
messages I got when the init shell process fork failed:

[    1.217998] nommu: Allocation of length 491520 from process 1 (sh) failed
[    1.224098] active_anon:0 inactive_anon:0 isolated_anon:0
[    1.224098]  active_file:5 inactive_file:12 isolated_file:0
[    1.224098]  unevictable:0 dirty:0 writeback:0
[    1.224098]  slab_reclaimable:38 slab_unreclaimable:459
[    1.224098]  mapped:0 shmem:0 pagetables:0
[    1.224098]  sec_pagetables:0 bounce:0
[    1.224098]  kernel_misc_reclaimable:0
[    1.224098]  free:859 free_pcp:0 free_cma:0
[    1.260419] Node 0 active_anon:0kB inactive_anon:0kB active_file:20kB inactive_file:48kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB kernel_stack:576kB pagetables:0kB sec_pagetables:0kB all_unreclaimable? no
[    1.285147] DMA32 free:3436kB boost:0kB min:312kB low:388kB high:464kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:28kB unevictable:0kB writepending:0kB present:8192kB managed:6240kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[    1.310654] lowmem_reserve[]: 0 0 0
[    1.314089] DMA32: 17*4kB (U) 10*8kB (U) 7*16kB (U) 6*32kB (U) 11*64kB (U) 6*128kB (U) 6*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3460kB
[    1.326883] 33 total pagecache pages
[    1.330420] binfmt_flat: Unable to allocate RAM for process text/data, errno -12
[    1.337858] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
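
The free-list line in the log above suggests why the allocation fails even
though about 3.4 MB is free. The following is a minimal sketch that parses
that line and compares the largest free block with what the request would
need. It assumes 4 kB pages and that the nommu mapping needs a single
physically contiguous block rounded up to a power-of-two number of pages;
the helper names are made up for illustration.

```python
import re

PAGE_KB = 4  # page size in kB, taken from the order-0 "17*4kB" entry

# Buddy free-list line from the log above (timestamp dropped, wrap joined).
FREE_LIST = ("DMA32: 17*4kB (U) 10*8kB (U) 7*16kB (U) 6*32kB (U) 11*64kB (U) "
             "6*128kB (U) 6*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3460kB")


def parse_free_list(line):
    """Return {block_size_kB: free_block_count} for one zone line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def order_for(length_bytes):
    """Smallest buddy order whose block covers length_bytes."""
    pages = -(-length_bytes // (PAGE_KB * 1024))  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


blocks = parse_free_list(FREE_LIST)
total_free_kb = sum(size * count for size, count in blocks.items())
largest_free_kb = max(size for size, count in blocks.items() if count > 0)

request_bytes = 491520  # from the "nommu: Allocation of length" message
needed_order = order_for(request_bytes)
needed_kb = PAGE_KB << needed_order

print(f"total free:         {total_free_kb} kB")
print(f"largest free block: {largest_free_kb} kB")
print(f"request:            {request_bytes // 1024} kB -> order {needed_order} ({needed_kb} kB)")
print("contiguous block available:", largest_free_kb >= needed_kb)
```

Under those assumptions the 480 kB request rounds up to an order 7 (512 kB)
block, while the largest free block is 256 kB, which matches the order 7
failures mentioned earlier in the thread.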


-- 
Damien Le Moal
Western Digital Research
