Message-ID: <c08662f3-6ae1-4fb5-1c4f-840a70fad035@redhat.com>
Date:   Thu, 4 Mar 2021 18:23:09 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Minchan Kim <minchan@...nel.org>
Cc:     Michal Hocko <mhocko@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>, joaodias@...gle.com
Subject: Re: [PATCH] mm: be more verbose for alloc_contig_range faliures

>> You want to debug something, so you try triggering it and capturing debug
>> data. There are not that many alloc_contig_range() users such that this
>> would really be an issue to isolate ...
> 
> cma_alloc uses alloc_contig_range and cma_alloc has lots of users.
> It is even exported through dmabuf, so any userspace process could
> trigger the allocation on its own. Some of them could tolerate the
> failure, for the rest it could be critical. We shouldn't assume only a
> limited set of in-kernel use cases.

Assume you are debugging allocation failures. You either collect the 
data yourself or ask someone to send you that output. You care about any 
alloc_contig_range() allocation failures that shouldn't happen, don't you?

> 
>>
>> Strictly speaking: any allocation failure on ZONE_MOVABLE or CMA is
>> problematic (putting NORETRY logic and similar aside). So any such
>> page you hit is worth investigating and, therefore, worth getting logged for
>> debugging purposes.
> 
> If you believe every alloc_contig_range failure is problematic

Every one where we should have guarantees, I guess: ZONE_MOVABLE or 
MIGRATE_CMA. On ZONE_NORMAL, there are no guarantees.

> and there is no real example in the world like the one I mentioned above,
> I am happy to put this chunk in to support dynamic debugging.
> Okay?
> 
> +#if defined(CONFIG_DYNAMIC_DEBUG) || \
> +        (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
> +static DEFINE_RATELIMIT_STATE(alloc_contig_ratelimit_state,
> +               DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
> +int alloc_contig_ratelimit(void)
> +{
> +       return __ratelimit(&alloc_contig_ratelimit_state);
> +}
> +

^ do we need ratelimiting with dynamic debugging enabled?
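
To make the question concrete, here is a minimal userspace model of what
interval/burst rate limiting does. It is a simplified sketch, not the
kernel's __ratelimit() implementation; the sketch_* names and the use of
plain "ticks" instead of jiffies are assumptions for illustration only.

```c
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct ratelimit_state. */
struct sketch_ratelimit {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages already allowed in this window */
};

/* Return true if the caller may emit a message at time 'now'. */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, long now)
{
	/* Start a fresh window once the previous one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}

	/* Allow up to 'burst' messages per window, suppress the rest. */
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}
```

With dynamic debugging the dump is already opt-in, so the limiter would
only matter once someone has enabled the branch and failures then arrive
in bursts.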

> +void dump_migrate_failure_pages(struct list_head *page_list)
> +{
> +       DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
> +                       "migrate failure");
> +       if (DYNAMIC_DEBUG_BRANCH(descriptor) &&
> +                       alloc_contig_ratelimit()) {
> +               struct page *page;
> +
> +               WARN(1, "failed callstack");
> +               list_for_each_entry(page, page_list, lru)
> +                       dump_page(page, "migration failure");

Are all pages on the list guaranteed to be problematic, or only the 
first entry? I assume all.

> +       }
> +}
> +#else
> +static inline void dump_migrate_failure_pages(struct list_head *page_list)
> +{
> +}
> +#endif
> +
>   /* [start, end) must belong to a single zone. */
>   static int __alloc_contig_migrate_range(struct compact_control *cc,
>                                          unsigned long start, unsigned long end)
> @@ -8496,6 +8522,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>                                  NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
>          }
>          if (ret < 0) {
> +               dump_migrate_failure_pages(&cc->migratepages);
>                  putback_movable_pages(&cc->migratepages);
>                  return ret;
>          }
> 
> 

If that's the way dynamic debugging is configured/enabled (still have to 
look into it) - yes, that goes in the right direction. As I said 
above, you should dump only where we have some kind of guarantees, I assume.
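
A minimal sketch of that condition, assuming the caller knows the zone and
the migratetype of the range. The enums and the helper name below are
hypothetical stand-ins so the snippet builds in userspace; they are not
the kernel's definitions, and a real patch would derive this information
from whatever the allocation path already has at hand.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's zone and migratetype values. */
enum sketch_zone { SKETCH_ZONE_NORMAL, SKETCH_ZONE_MOVABLE };
enum sketch_migratetype { SKETCH_MIGRATE_MOVABLE, SKETCH_MIGRATE_CMA };

/*
 * Dump failed pages only where migration is expected to succeed:
 * ranges in ZONE_MOVABLE or pageblocks marked MIGRATE_CMA.
 */
static bool sketch_should_dump(enum sketch_zone zone,
			       enum sketch_migratetype mt)
{
	return zone == SKETCH_ZONE_MOVABLE || mt == SKETCH_MIGRATE_CMA;
}
```

A failure on ZONE_NORMAL without MIGRATE_CMA would then stay silent, as
there are no guarantees there.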

-- 
Thanks,

David / dhildenb
