Date:   Fri, 10 Sep 2021 22:29:21 +0800
From:   "brookxu.cn" <brookxu.cn@...il.com>
To:     Michal Koutný <mkoutny@...e.com>
Cc:     Vipin Sharma <vipinsh@...gle.com>, tj@...nel.org,
        lizefan.x@...edance.com, hannes@...xchg.org,
        linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

Thanks for your time.

On 2021/9/10 5:23 PM, Michal Koutný wrote:
> On Fri, Sep 10, 2021 at 01:30:46PM +0800, brookxu <brookxu.cn@...il.com> wrote:
>> I am a bit confused here. For misc_cgroup, a charge can only be rejected when
>> the count hits the limit, but other subsystems may have further reasons.
> 
> Sorry, I wasn't clear about that -- the failures I meant to be counted
> here were only the ones caused by (an ancestor) limit. Maybe there's a
> better name for that.
> 
>> Therefore, when we are rejected, does it mean that we have hit the
>> limit? If so, do we still need to distinguish between max and fail?
>> (for misc_cgroup)
> 
> r
> `- c1
>     `- c2.max
>         `- c3
>            `- c4.max
>               `- task t
>            `- c5
> 
> Assuming c2.max < c4.max, when a task t calls try_charge and it fails
> because of c2.max, then the 'max' event is counted to c2 (telling that
> the limit is perhaps low) and the 'fail' event is counted to c4 (telling
> you where the troubles originated). That is my idea. Although in the
> case of short-lived cgroups, you'd likely only get the hierarchically
> aggregated 'fail' events from c3 or higher with lower (spatial)
> precision.
> What would be the type of information useful for your troubleshooting?
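
To make the proposed accounting concrete, here is a minimal sketch in Python.
It is a toy model, not kernel code: the Cgroup class, its field names and this
try_charge() are hypothetical stand-ins, and the limits (10 for c2, 20 for c4)
are made-up values chosen only so that c2.max < c4.max.

```python
class Cgroup:
    """Toy model of one node in the hierarchy (hypothetical, not kernel code)."""

    def __init__(self, name, parent=None, limit=None):
        self.name = name
        self.parent = parent
        self.limit = limit    # None models "no limit set"
        self.usage = 0
        self.max_events = 0   # charges rejected by this node's own limit
        self.fail_local = 0   # rejected charges that originated in this node
        self.fail = 0         # fail_local plus failures from descendants

    def self_and_ancestors(self):
        node = self
        while node is not None:
            yield node
            node = node.parent


def try_charge(origin, amount):
    """Charge `amount` to origin and all ancestors, or reject and count events."""
    for node in origin.self_and_ancestors():
        if node.limit is not None and node.usage + amount > node.limit:
            node.max_events += 1          # 'max' goes to the limiting node
            origin.fail_local += 1        # 'fail' goes to the originating node
            for up in origin.self_and_ancestors():
                up.fail += 1              # hierarchical aggregation of 'fail'
            return False
    for node in origin.self_and_ancestors():
        node.usage += amount
    return True


# The hierarchy from the diagram above, with assumed limits.
r = Cgroup("r")
c1 = Cgroup("c1", r)
c2 = Cgroup("c2", c1, limit=10)
c3 = Cgroup("c3", c2)
c4 = Cgroup("c4", c3, limit=20)
c5 = Cgroup("c5", c3)

first = try_charge(c4, 8)    # fits under both limits
second = try_charge(c4, 8)   # c4 would be at 16 <= 20, but c2 would exceed 10
```

In this model the second charge is rejected by c2, so the 'max' event lands on
c2 while the local 'fail' event lands on c4, and the aggregated 'fail' count is
visible on c3 and every node above it.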

Through events and events.local, we can already determine which node has
insufficient resources. For example, when a node's 'events' count is large,
we traverse down the hierarchy and use events.local to find the node whose
limit is being hit. The 'fail' counter does not seem to add useful
information here: even when 'fail' is large, we still need events and
events.local to locate the node that is short of resources. I am not sure
what additional details we can learn from the 'fail' counter.
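
The top-down search described here can be sketched roughly as follows. This is
only an illustration in Python: the nested dicts are a hypothetical in-memory
stand-in for the per-cgroup events (hierarchical) and events.local (local)
counters, and find_limited() is a made-up helper, not an existing tool.

```python
def find_limited(node, path=""):
    """Return the paths of nodes whose own limit was hit at least once."""
    here = path + "/" + node["name"]
    if node["events"] == 0:
        return []                 # nothing in this subtree hit a limit
    hits = []
    if node["events_local"] > 0:
        hits.append(here)         # this node's own limit rejected charges
    for child in node.get("children", []):
        hits.extend(find_limited(child, here))
    return hits


# Assumed counter values: c2's limit was hit three times, nothing below it.
tree = {
    "name": "r", "events": 3, "events_local": 0, "children": [
        {"name": "c1", "events": 3, "events_local": 0, "children": [
            {"name": "c2", "events": 3, "events_local": 3, "children": [
                {"name": "c3", "events": 0, "events_local": 0, "children": []},
            ]},
        ]},
    ],
}

limited = find_limited(tree)
```

With these assumed values the search stops at c2, which is the node with
insufficient resources, without consulting any 'fail' counter.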

> 
> Cheers,
> Michal
> 
