lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <96398f56-79a2-47b0-8d0f-41efaa9dc775@redhat.com>
Date: Fri, 14 Feb 2025 09:20:51 -0500
From: Waiman Long <llong@...hat.com>
To: Jin Guojie <guojie.jin@...il.com>, Waiman Long <llong@...hat.com>
Cc: Michal Koutný <mkoutny@...e.com>,
 cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] cgroup/cpuset: call fmeter_init() when
 cpuset.memory_pressure disabled

On 2/14/25 5:34 AM, Jin Guojie wrote:
> On Wed, Jan 15, 2025 at 11:38 PM Waiman Long <llong@...hat.com> wrote:
>> On 1/15/25 12:05 AM, Jin Guojie wrote:
>>> When running LTP's cpuset_memory_pressure program, the following error occurs:
>>>
>>> (1) Create a cgroup, enable cpuset subsystem, set memory limit, and
>>> then set cpuset_memory_pressure to 1
>>> (2) In this cgroup, create a process to allocate a large amount of
>>> memory and generate pressure counts
>>> (3) Set cpuset_memory_pressure to 0
>>> (4) Check cpuset.memory_pressure: LTP thinks it should be 0, but the
>>> current kernel returns a value of 1, so LTP determines it as FAIL
>>>
>>> V2:
>>> * call fmeter_init() when writing 0 to the memory_pressure_enabled
>>>
>>> Compared with patch v1 [1], this version implements clearer logic.
>>>
>>> [1] https://lore.kernel.org/cgroups/CA+B+MYRNsdKcYxC8kbyzVrdH9fT8c2if5UxGguKep36ZHe6HMQ@mail.gmail.com/T/#u
>>>
>>> Signed-off-by: Jin Guojie <guojie.jin@...il.com>
>>> Suggested-by: Michal Koutný <mkoutny@...e.com>
>>> Suggested-by: Waiman Long <longman@...hat.com>
>>> ---
>>>    kernel/cgroup/cpuset-v1.c | 4 +++-
>>>    kernel/cgroup/cpuset.c    | 2 ++
>>>    2 files changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>>> index 25c1d7b77e2f..7520eb31598a 100644
>>> --- a/kernel/cgroup/cpuset-v1.c
>>> +++ b/kernel/cgroup/cpuset-v1.c
>>> @@ -66,7 +66,6 @@ void fmeter_init(struct fmeter *fmp)
>>>           fmp->cnt = 0;
>>>           fmp->val = 0;
>>>           fmp->time = 0;
>>> -       spin_lock_init(&fmp->lock);
>>>    }
>>>
>>>    /* Internal meter update - process cnt events and update value */
>>> @@ -437,6 +436,9 @@ static int cpuset_write_u64(struct
>>> cgroup_subsys_state *css, struct cftype *cft,
>>>                   break;
>>>           case FILE_MEMORY_PRESSURE_ENABLED:
>>>                   cpuset_memory_pressure_enabled = !!val;
>>> +               if (cpuset_memory_pressure_enabled == 0) {
>>> +                       fmeter_init(&cs->fmeter);
>>> +               }
>> Nit: you don't need parentheses when there is only one statement
>> underneath "if".
>>>                   break;
>>>           case FILE_SPREAD_PAGE:
>>>                   retval = cpuset_update_flag(CS_SPREAD_PAGE, cs, val);
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 0f910c828973..3583c898ff77 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -3378,6 +3378,7 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
>>>
>>>           __set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
>>>           fmeter_init(&cs->fmeter);
>>> +       spin_lock_init(&cs->fmeter.lock);
>>>           cs->relax_domain_level = -1;
>>>           INIT_LIST_HEAD(&cs->remote_sibling);
>>>
>>> @@ -3650,6 +3651,7 @@ int __init cpuset_init(void)
>>>           nodes_setall(top_cpuset.effective_mems);
>>>
>>>           fmeter_init(&top_cpuset.fmeter);
>>> +       spin_lock_init(&top_cpuset.fmeter.lock);
>>>           INIT_LIST_HEAD(&remote_children);
>>>
>>>           BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL));
>>> --
>>> 2.34.1
>>>
>> I just realize that cpuset.memory_pressure_enabled is on root cgroup
>> only and affect a global flag that impact the behavior of all the
>> existing cpusets. Your current patch will clear the memory pressure data
>> in the root cgroup only. The other child cpusets will not be affected
>> and will still show existing data. This inconsistency isn't good.
>>
>> OTOH, I also don't think iterating the whole cpuset hierarchy and
>> clearing all the fmeter data is worth the effort given that cgroup v1 is
>> in maintenance mode. Perhaps just a simple check to return 0 if
>> cpuset.memory_pressure_enabled isn't set like in the v1 patch. I also
>> don't think we need to clear the fmeter data in that case as it will
>> lead to data clearing only on cpusets where cpuset.memory_pressure is
>> read while cpuset.memory_pressure_enabled has been cleared.
>>
>> Cheers,
>> Longman
> Hi Waiman,
>
> I understand that your meaning is to avoid clearing the fmeter data.
> Based on this idea, I made some modifications based on patch v1.
>
> Instead of writing 0 to the member variables of fmeter, we only need
> to return 0 in fmeter_getrate(). The update logic for fmeter data is kept
> unchanged, which minimizes the impact on the kernel's current behavior.
>
> The whole patch is greatly simplified as follows.
> Please see if this idea is correct, thanks!
>
> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
> index 25c1d7b77e2f..14564e91e2b3 100644
> --- a/kernel/cgroup/cpuset-v1.c
> +++ b/kernel/cgroup/cpuset-v1.c
> @@ -108,7 +108,7 @@ static int fmeter_getrate(struct fmeter *fmp)
>          fmeter_update(fmp);
>          val = fmp->val;
>          spin_unlock(&fmp->lock);
> -       return val;
> +       return cpuset_memory_pressure_enabled ? val : 0;
> }

That looks good to me. You can send out a v3 with this simpler change.

Cheers,
Longman


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ