Message-ID: <20140214103618.GA8584@dhcp-26-207.brq.redhat.com>
Date:	Fri, 14 Feb 2014 11:36:18 +0100
From:	Alexander Gordeev <agordeev@...hat.com>
To:	Jens Axboe <axboe@...nel.dk>
Cc:	Kent Overstreet <kmo@...erainc.com>,
	Christoph Hellwig <hch@...radead.org>,
	Shaohua Li <shli@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [patch 1/2]percpu_ida: fix a live lock

On Mon, Feb 10, 2014 at 04:06:27PM -0700, Jens Axboe wrote:
> It obviously all depends on the access pattern. X threads for X tags
> would work perfectly well with per-cpu tagging, if they are doing
> sync IO. And similarly, 8 threads each having low queue depth would
> be fine. However, it all falls apart pretty quickly if threads*qd >
> tag space.

What about introducing two modes: gentle and aggressive?

Gentle would be what is implemented now, with occasional switches into
aggressive mode and back (more about this below).

In aggressive mode percpu_ida_alloc() would ignore the nr_cpus/2 threshold
and rob remote caches of tags whenever available. Further, percpu_ida_free()
would always wake waiters, even under the percpu_max_size limit.
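
To make the difference between the two modes concrete, here is a rough
userspace sketch of the two decisions that change. It is not the kernel
code: enum ida_mode, may_steal(), should_wake() and the steal_threshold
parameter are hypothetical names, and the gentle-mode wake rule (wake only
on the first tag or on a full local cache) is an assumption about the
current behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode flag; nothing like it exists in percpu_ida today. */
enum ida_mode { IDA_GENTLE, IDA_AGGRESSIVE };

/*
 * Allocation side: may this CPU rob remote caches?
 * cpus_with_tags  - number of CPUs currently holding cached tags
 * steal_threshold - stand-in for the existing threshold check
 */
static bool may_steal(enum ida_mode mode, unsigned cpus_with_tags,
                      unsigned steal_threshold)
{
    if (cpus_with_tags == 0)
        return false;                 /* nothing to rob anywhere */
    if (mode == IDA_AGGRESSIVE)
        return true;                  /* ignore the threshold */
    return cpus_with_tags > steal_threshold;
}

/*
 * Free side: should waiters be woken after a tag is returned?
 * nr_free is the local cache fill level after the free.
 */
static bool should_wake(enum ida_mode mode, unsigned nr_free,
                        unsigned percpu_max_size)
{
    assert(nr_free >= 1 && nr_free <= percpu_max_size);
    if (mode == IDA_AGGRESSIVE)
        return true;                  /* always wake */
    /* Assumed gentle rule: first tag in the cache, or cache is full. */
    return nr_free == 1 || nr_free == percpu_max_size;
}
```

With a threshold of 4, may_steal(IDA_GENTLE, 1, 4) refuses to steal while
may_steal(IDA_AGGRESSIVE, 1, 4) goes ahead, which is the whole point of
the aggressive mode.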

In essence, aggressive mode would be used to overcome tag space
fragmentation (a garbage collector, if you want), but not only that. Some
classes of devices would initialize it constantly - that would be the case
for e.g. libata, paired with the TASK_RUNNING condition.

The most challenging question here is when to switch from gentle mode to
aggressive and back. I am thinking about a timeout that indicates a
reasonable average time for an IO to complete:

  * Once a CPU has failed to obtain a tag, cpus_have_tags is not empty and
    the timeout has expired, aggressive mode is enforced, meaning there is
    no IO activity on other CPUs and tags are out there;

  * Once no tags have been requested and the timeout has expired,
    aggressive mode is switched back to gentle;

  * Once a second tag is returned to a local cache, aggressive mode is
    switched back to gentle, meaning no threads were/are in the waitqueue,
    hungry for tags;
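
The three rules above can be sketched as a tiny state machine. This is
only a userspace model under my own assumptions: struct ida_mode_state and
the on_*() helpers are hypothetical, time is an abstract tick counter, and
the timeout in the first rule is assumed to be measured from the moment
the CPU started waiting for a tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum ida_mode { IDA_GENTLE, IDA_AGGRESSIVE };

struct ida_mode_state {
    enum ida_mode mode;
    uint64_t timeout;       /* expected IO completion time, in ticks */
    uint64_t last_request;  /* tick of the most recent tag request */
};

static bool expired(const struct ida_mode_state *s, uint64_t since,
                    uint64_t now)
{
    assert(now >= since);
    return now - since >= s->timeout;
}

/* Rule 1: a CPU has been failing to get a tag since wait_start. */
static void on_alloc_failed(struct ida_mode_state *s, uint64_t wait_start,
                            uint64_t now, bool remote_tags)
{
    s->last_request = now;
    if (remote_tags && expired(s, wait_start, now))
        s->mode = IDA_AGGRESSIVE;
}

/* Rule 2: nobody asked for a tag for a whole timeout period. */
static void on_idle_check(struct ida_mode_state *s, uint64_t now)
{
    if (expired(s, s->last_request, now))
        s->mode = IDA_GENTLE;
}

/* Rule 3: a second tag piled up locally, so nobody was waiting for it. */
static void on_tag_freed(struct ida_mode_state *s, unsigned local_free_after)
{
    if (local_free_after >= 2)
        s->mode = IDA_GENTLE;
}
```

A real implementation would also have to decide where this state lives
(per pool, presumably) and how it is protected; the sketch ignores both.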

The value of the timeout is calculated based on recent IO activity,
probably with some hints from the device driver. The good thing here is
that occasional unnecessary switches to aggressive mode would not hurt
overall performance.
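
One possible way to derive such a timeout, purely as an illustration: keep
a running average of IO completion times and use a multiple of it, bounded
from below by a driver hint. The struct io_timeout type, the 1/8 weight,
the factor of two and the hint_ns field are all assumptions of this
sketch, not something the current code has.

```c
#include <assert.h>
#include <stdint.h>

struct io_timeout {
    uint64_t avg_ns;   /* running average of IO completion time */
    uint64_t hint_ns;  /* driver-provided lower bound, 0 if none */
};

/* Fold one observed completion latency into the running average. */
static void io_timeout_sample(struct io_timeout *t, uint64_t latency_ns)
{
    assert(latency_ns > 0);   /* avg_ns == 0 means "no sample yet" */
    if (t->avg_ns == 0)
        t->avg_ns = latency_ns;
    else
        t->avg_ns = t->avg_ns - t->avg_ns / 8 + latency_ns / 8;
}

/* Timeout to use for the mode switch: twice the average, or the hint. */
static uint64_t io_timeout_value(const struct io_timeout *t)
{
    uint64_t v = 2 * t->avg_ns;

    return v > t->hint_ns ? v : t->hint_ns;
}
```

Since a too-short timeout only causes an unnecessary switch to aggressive
mode, the estimate does not need to be precise.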

Quite complicated, huh? :)

-- 
Regards,
Alexander Gordeev
agordeev@...hat.com