Message-ID: <Yn5QHpc9YlAbP1li@alley>
Date:   Fri, 13 May 2022 14:33:34 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Song Liu <songliubraving@...com>
Cc:     Josh Poimboeuf <jpoimboe@...nel.org>, Rik van Riel <riel@...com>,
        "song@...nel.org" <song@...nel.org>,
        "joe.lawrence@...hat.com" <joe.lawrence@...hat.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
        "live-patching@...r.kernel.org" <live-patching@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jpoimboe@...hat.com" <jpoimboe@...hat.com>
Subject: Re: [RFC] sched,livepatch: call klp_try_switch_task in __cond_resched

On Wed 2022-05-11 16:33:57, Song Liu wrote:
> 
> 
> > On May 11, 2022, at 2:24 AM, Petr Mladek <pmladek@...e.com> wrote:
> > 
> > On Tue 2022-05-10 17:33:31, Josh Poimboeuf wrote:
> >> On Tue, May 10, 2022 at 11:57:04PM +0000, Song Liu wrote:
> >>>> If it's a real bug, we should fix it everywhere, not just for Facebook.
> >>>> Otherwise CONFIG_PREEMPT and/or non-x86 arches become second-class
> >>>> citizens.
> >>> 
> >>> I think "is it a real bug?" is the top question for me. So maybe we 
> >>> should take a step back.
> >>> 
> >>> The behavior we see is: A busy kernel thread blocks klp transition 
> >>> for more than a minute. But the transition eventually succeeded after 
> >>> < 10 retries on most systems. The kernel thread is well-behaved, as 
> >>> it calls cond_resched() at a reasonable frequency, so this is not a 
> >>> deadlock. 
> >>> 
> >>> If I understand Petr correctly, this behavior is expected, and thus 
> >>> is not a bug or issue for the livepatch subsystem. This is different
> >>> to our original expectation, but if this is what we agree on, we 
> >>> will look into ways to incorporate long wait time for patch 
> >>> transition in our automations. 
> >> 
> >> That's how we've traditionally looked at it, though apparently Red Hat
> >> and SUSE have implemented different ideas of what a long wait time is.
> >> 
> >> In practice, one minute has always been enough for all of kpatch's users
> >> -- AFAIK, everybody except SUSE -- up until now.
> > 
> > I am actually surprised that nobody met the problem yet. There are
> > "only" 60 attempts to transition the pending tasks.
> 
> Maybe we should consider increasing the frequency at which we retry? Say,
> to 10 times per second? I guess this would resolve most of the failures
> we are seeing in the current case.

My concern is that klp_try_complete_transition() checks all processes
under read_lock(&tasklist_lock). Retrying more often might create some
contention on this lock. I am not sure if this lock is fair. It might
slow down or block writers (creating/deleting tasks).

Best Regards,
Petr