lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABdmKX1b2GjrUmgTEq+tgwdYyqp_2qhs1G5AHBeKCNSfdbO8Eg@mail.gmail.com>
Date: Thu, 1 Feb 2024 13:02:04 -0800
From: "T.J. Mercier" <tjmercier@...gle.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Shakeel Butt <shakeelb@...gle.com>, 
	Muchun Song <muchun.song@...ux.dev>, Andrew Morton <akpm@...ux-foundation.org>, android-mm@...gle.com, 
	Minchan Kim <minchan@...gle.com>, cgroups@...r.kernel.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm: memcg: Don't periodically flush stats when memcg
 is disabled

On Thu, Feb 1, 2024 at 6:26 AM Michal Koutný <mkoutny@...e.com> wrote:
>
> On Fri, Jan 26, 2024 at 09:19:25PM +0000, "T.J. Mercier" <tjmercier@...gle.com> wrote:
> > The root memcg is onlined even when memcg is disabled. When it's onlined
> > a 2 second periodic stat flush is started, but no stat flushing is
> > required when memcg is disabled because there can be no child memcgs.
> > Most calls to flush memcg stats are avoided when memcg is disabled as a
> > result of the mem_cgroup_disabled check added in 7d7ef0a4686a
> > ("mm: memcg: restore subtree stats flushing"), but the periodic flushing
> > started in mem_cgroup_css_online is not. Skip it.
>
> Have you tried
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6099,6 +6099,9 @@ int __init cgroup_init(void)
>         cgroup_unlock();
>
>         for_each_subsys(ss, ssid) {
> +               if (!cgroup_ssid_enabled(ssid))
> +                       continue;
> +
>                 if (ss->early_init) {
>                         struct cgroup_subsys_state *css =
>                                 init_css_set.subsys[ss->id];
> @@ -6118,9 +6121,6 @@ int __init cgroup_init(void)
>                  * disabled flag and cftype registration needs kmalloc,
>                  * both of which aren't available during early_init.
>                  */
> -               if (!cgroup_ssid_enabled(ssid))
> -                       continue;
> -
>                 if (cgroup1_ssid_disabled(ssid))
>                         pr_info("Disabling %s control group subsystem in v1 mounts\n",
>                                 ss->legacy_name);
> ?
> I'm asking about a try because I'm not sure whether this does not blow
> up due to something missing. But I think disabled controllers would not
> need to be (root-)onlined at all.
>
> Thanks,
> Michal

Hi Michal,

It does blow up, but not how I was expecting. There's a null pointer
dereference inside find_css_set when trying to get a css pointer for
the memory controller, I think because the allocation in
cgroup_init_subsys is skipped:

[    9.591766] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    9.591909] #PF: supervisor read access in kernel mode
[    9.592023] #PF: error_code(0x0000) - not-present page
[    9.592138] PGD 0 P4D 0
[    9.592199] Oops: 0000 [#1] PREEMPT SMP PTI
[    9.592289] CPU: 1 PID: 1 Comm: init Tainted: G            E
6.7.0-mainline-maybe-dirty #1
[    9.592490] Hardware name: emulation qemu-x86/qemu-x86, BIOS
2023.07-rc6-gb8315a3184b2-ab11347395 07/01/2023
[    9.592731] RIP: 0010:find_css_set+0x5c3/0x7a0
[    9.592850] Code: 20 69 cf b2 4e 8b 3c 33 48 c7 c7 1d 3b 41 b2 4c
89 fe e8 10 b0 f7 00 49 8b b4 24 a0 00 00 00 4e 8d 24 2b 49 81 c4 b0
fc ff ff <49> 8b 0f 4c 01 e9 48 c7 c7 c4 c0 46 b2 4c 89 e2 e8 e8 af f7
00 49
[    9.593251] RSP: 0018:ffffb6218000bb90 EFLAGS: 00010087
[    9.593370] RAX: 0000000000000021 RBX: ffff99181044a200 RCX: 4f5e789f089a0c00
[    9.593554] RDX: ffffb6218000ba50 RSI: ffffffffb2448284 RDI: ffffffffb2c91950
[    9.593735] RBP: ffffb6218000bc28 R08: 0000000000000fff R09: ffffffffb2c79950
[    9.593909] R10: 0000000000002ffd R11: 0000000000000004 R12: ffff99181044a2d8
[    9.594102] R13: 0000000000000428 R14: 0000000000000020 R15: 0000000000000000
[    9.594291] FS:  00007f3d2f986fc8(0000) GS:ffff99182bd00000(0000)
knlGS:0000000000000000
[    9.594467] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.594610] CR2: 0000000000000000 CR3: 00000001001d8001 CR4: 0000000000370eb0
[    9.594818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    9.595007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    9.595178] Call Trace:
[    9.595237]  <TASK>
[    9.595297]  ? __die_body+0x5e/0xb0
[    9.595392]  ? __die+0x9e/0xb0
[    9.595481]  ? page_fault_oops+0x35f/0x3d0
[    9.595574]  ? do_user_addr_fault+0x6ab/0x780
[    9.595690]  ? prb_read_valid+0x28/0x60
[    9.595782]  ? exc_page_fault+0x83/0x220
[    9.595874]  ? asm_exc_page_fault+0x2b/0x30
[    9.595966]  ? find_css_set+0x5c3/0x7a0
[    9.596059]  cgroup_migrate_prepare_dst+0x75/0x2b0
[    9.596194]  cgroup_attach_task+0x293/0x450
[    9.596305]  ? cgroup_attach_task+0xb6/0x450
[    9.596449]  __cgroup_procs_write+0xef/0x1a0
[    9.596589]  cgroup_procs_write+0x16/0x30
[    9.596733]  cgroup_file_write+0x9d/0x260
[    9.596840]  kernfs_fop_write_iter+0x145/0x1e0
[    9.596981]  vfs_write+0x276/0x2e0
[    9.597092]  ksys_write+0x73/0xe0
[    9.597198]  __x64_sys_write+0x1a/0x30
[    9.597303]  do_syscall_64+0x5a/0x100
[    9.597430]  entry_SYSCALL_64_after_hwframe+0x6e/0x76


But there are also calls to mem_cgroup_css_from_folio that I think
would cause a different null pointer deref even if we had the css but
no root_mem_cgroup. There seems to be an assumption that we'll have a
memcg to charge against even when the controller is disabled. To me it
looks like that's to simplify the possible combinations of
CONFIG_MEMCG and memcg being boot-time disabled or not.

Best,
T.J.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ