[cpp-threads] Proposing a layered Thread API

Sat Sep 2 00:10:48 BST 2006

Hi,

After reviewing some implementations from Howard, Peter and others, and 
recalling some opinions from the Redmond meeting, I'm wondering whether 
a layered Thread/Task API could make everyone happy.

Some of the proposals are:

-> Some want a reference-counted, concurrently joinable future. Futures 
referring to the same asynchronous execution can be joined from several 
threads at the same time and each one gets a copy.

-> Some want a generic future, one that is independent of the 
executor. This allows a single future type whether the function is 
executed by a thread pool or just a simple OS thread. This makes the 
task system extensible with user-written executors.

-> Some want an easy, efficient implementation, and they don't want 
to pay for all that reference-counted, multiple-join overhead.

I think that we can get all of those using a layered approach. This 
approach is not fully implemented, but I think it is implementable. 
These are the levels:

------------------------------------------------------------
------------------------------------------------------------
Level 0: Direct OS thread management
------------------------------------------------------------
------------------------------------------------------------

-> thread<T>: A handle for the operating system thread, created by a
thread factory that can store settings (pthread_attr_t) and launch
threads.

-> Creating a thread<T> means creating an OS thread and joining it
means waiting for the OS thread termination.

-> The return value is _moved_ to the caller.

Example:

typedef std::vector<char> file_data_t;

file_data_t read_big_file(const char *file);

thread<file_data_t> t = launch_thread(bind(read_big_file, "myfile"));

//The whole vector is moved to the target, and no memory
//allocation is needed to obtain it
file_data_t data = t();
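
To make the "return value is moved" point concrete, here is a rough 
sketch of such a handle written against today's std::thread (C++17). 
The names os_thread, launch_thread and make_big_buffer are hypothetical 
stand-ins chosen for illustration, not Howard's implementation, and the 
sketch ignores exceptions thrown by the function:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical level-0 handle: owns exactly one OS thread plus the
// heap slot where that thread stores its result.
template <class T>
class os_thread {
    std::unique_ptr<std::optional<T> > slot_;  // declared first: must exist
    std::thread worker_;                       // before the thread starts
public:
    template <class F>
    explicit os_thread(F f)
        : slot_(new std::optional<T>()),
          worker_([slot = slot_.get(), fn = std::move(f)]() mutable {
              slot->emplace(fn());  // sketch: an exception here terminates
          }) {}

    os_thread(os_thread&&) = default;  // movable-only handle

    ~os_thread() { if (worker_.joinable()) worker_.join(); }

    // Joining means waiting for the OS thread; the result is then moved
    // (not copied) to the caller.
    T operator()() {
        worker_.join();
        return std::move(**slot_);
    }
};

// Hypothetical factory mirroring launch_thread() from the example above.
template <class F>
auto launch_thread(F f) -> os_thread<decltype(f())> {
    return os_thread<decltype(f())>(std::move(f));
}

typedef std::vector<char> file_data_t;

// Stand-in for read_big_file(): builds a buffer instead of reading a file.
inline file_data_t make_big_buffer(std::size_t n) {
    return file_data_t(n, 'x');
}
```

With this, "file_data_t data = launch_thread(...)();" hands the vector's 
buffer to the caller without copying its elements.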

This level 0 thread is a _must_ because:

-> Offers portable basic operating system thread control. This is why we 
have so many C and C++ portable runtimes in many libraries (Apache, 
Mozilla, ACE, Qt...). So it's clear that some C++ programmers find this 
essential.

-> Many times we need an OS thread and we do care whether the function
is being executed in another thread or not.

Example: I want to launch a GUI in another thread. That means that I 
need all POSIX thread guarantees (IO, signals, completion ports, 
asynchronous IO, etc...). I don't want the function to be blocked 
because a thread pool considers that there are too many active threads. 
I want to create a new OS thread and that's all. A generic future<T> is 
not the answer, because I want to make sure that when I call "join", 
that effectively means pthread_join().

-> Exceptions: Exceptions thrown by the thread can be propagated to the 
caller using Beman's virtual functions in std::exception:

virtual unique_ptr<std::exception> clone() = 0;
virtual void throw_self() = 0;

Note that we don't need to use full-clone semantics because we are going 
to destroy the launched exception anyway. We can use move semantics, so 
that we create a new exception but using the move constructor. When 
throwing self, instead of throwing a copy, we can throw a moved version.

virtual unique_ptr<std::exception> move_clone() = 0;
virtual void throw_moved_self() = 0;

-> Implementability: Already implemented by Howard.
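
The move_clone()/throw_moved_self() idea can be sketched as follows. 
Since std::exception has no such members, the sketch assumes a 
hypothetical base class movable_exception that carries them; file_error 
and run_and_rethrow are likewise illustrative names only:

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <string>
#include <thread>
#include <utility>

// Hypothetical base class carrying the two proposed virtual functions.
struct movable_exception : std::exception {
    virtual std::unique_ptr<movable_exception> move_clone() = 0;
    virtual void throw_moved_self() = 0;
};

struct file_error : movable_exception {
    std::string path;
    explicit file_error(std::string p) : path(std::move(p)) {}
    const char* what() const noexcept override { return "file_error"; }

    // Builds the clone with the move constructor: the original exception
    // is about to be destroyed anyway, so its contents can be stolen.
    std::unique_ptr<movable_exception> move_clone() override {
        return std::unique_ptr<movable_exception>(
            new file_error(std::move(*this)));
    }

    // Throws a moved version of *this instead of a copy.
    void throw_moved_self() override { throw file_error(std::move(*this)); }
};

// Runs f on a new thread. If f throws a movable_exception, it is
// move-cloned in the worker and rethrown in the caller after the join.
template <class F>
void run_and_rethrow(F f) {
    std::unique_ptr<movable_exception> pending;
    std::thread worker([&] {
        try { f(); }
        catch (movable_exception& e) { pending = e.move_clone(); }
    });
    worker.join();
    if (pending) pending->throw_moved_self();
}
```

The caller catches the dynamic type (file_error here) even though the 
exception crossed the thread boundary through a base-class pointer.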

------------------------------------------------------------
------------------------------------------------------------
Level 1: Asynchronous task handle: aka cheap future.
------------------------------------------------------------
------------------------------------------------------------

-> task<T>: A handle for an asynchronous task that will be executed in 
an executor (this can be a thread for each function, a thread pool, or
synchronous execution...).

-> This requires a standard interface for the implementation class, so 
that we can plug more executors into the framework.

-> task<T> is a unique_ptr for an asynchronous task and can be joined 
only once. This means that the return value is moved.

-> Movability is even more critical than in thread<T>: we might be 
executing the task in an efficient thread pool or just synchronously, 
and since we are not creating a thread for each task, the copy/mutex 
overhead might be noticeable. The thread pool can be built on the 
thread<void> abstraction, so that we can define our own portable 
executors.

-> We can just define an implementation interface so that users can 
easily create their own executors:

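template<class T>
class task_impl_interface
{
   public:
   //Virtual destructor: implementations are destroyed through
   //a pointer to this interface
   virtual ~task_impl_interface() {}
   virtual T join() = 0;
   virtual void request_cancel() = 0;
   //....
};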

//The task is just a lightweight, movable-only holder
//of the implementation. Forwards all operations to
//the implementation.

template<class T>
class task
{
   //....
   std::unique_ptr<task_impl_interface<T> > result_;

   public:

   //unique_ptr is movable-only, so ownership must be moved in
   explicit task(std::unique_ptr<task_impl_interface<T> > impl)
      :   result_(std::move(impl))
   {}

   T operator()()
   {  return result_->join();   }

   void request_cancel()
   {  result_->request_cancel();   }

  // ...
};

Deriving from this interface we can have different executors: one thread 
per function, a thread pool, synchronous... This virtual interface is 
not the only approach. Peter's approach of registering a function object 
is also valid. The idea is to have a single task<T> type for any 
executor. The virtual interface is just one way to get that, but Peter's 
approach also has many advantages, so we should study which is better.
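
As a small illustration of "one task type, many executors", the sketch 
below plugs two implementations behind the virtual interface: one that 
runs the function synchronously and one that uses a thread per function. 
It assumes C++17; sync_impl, thread_impl and create_task_sync are 
hypothetical names, and cancellation and exceptions are left out:

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <thread>
#include <utility>

template <class T>
class task_impl_interface {
public:
    virtual ~task_impl_interface() {}
    virtual T join() = 0;
};

// Hypothetical executor 1: runs the function immediately in the caller.
template <class T>
class sync_impl : public task_impl_interface<T> {
    T value_;
public:
    template <class F>
    explicit sync_impl(F f) : value_(f()) {}
    T join() override { return std::move(value_); }
};

// Hypothetical executor 2: one OS thread per function.
template <class T>
class thread_impl : public task_impl_interface<T> {
    std::optional<T> value_;  // declared before worker_ on purpose
    std::thread worker_;
public:
    template <class F>
    explicit thread_impl(F f)
        : worker_([this, fn = std::move(f)]() mutable {
              value_.emplace(fn());
          }) {}
    ~thread_impl() override { if (worker_.joinable()) worker_.join(); }
    T join() override {
        worker_.join();
        return std::move(*value_);
    }
};

// Lightweight movable-only holder; the same type for every executor.
template <class T>
class task {
    std::unique_ptr<task_impl_interface<T> > impl_;
public:
    explicit task(std::unique_ptr<task_impl_interface<T> > impl)
        : impl_(std::move(impl)) {}
    T operator()() { return impl_->join(); }
};

template <class F>
auto create_task_sync(F f) -> task<decltype(f())> {
    typedef decltype(f()) T;
    return task<T>(std::unique_ptr<task_impl_interface<T> >(
        new sync_impl<T>(std::move(f))));
}

template <class F>
auto create_task_in_a_thread(F f) -> task<decltype(f())> {
    typedef decltype(f()) T;
    return task<T>(std::unique_ptr<task_impl_interface<T> >(
        new thread_impl<T>(std::move(f))));
}
```

Both factories return the same task<int> type for an int-returning 
function, so callers never see which executor ran it.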

Use cases:

task<T> t1 = create_task_in_a_thread(f);
task<T> t2 = create_task_in_a_thread_pool(f);
task<T> t3 = create_task_in_my_own_executor(f);

std::vector<task<T> > pending;
pending.push_back(std::move(t1));
pending.push_back(std::move(t2));
pending.push_back(std::move(t3));

//Task is movable only and can be placed in containers.
//task<T> tcopy = pending[0];  //compilation error: task is not copyable
task<T> tnew(std::move(pending[0]));  //Ok, task moved from vector to tnew

//Task is a one-shot function call and moves the return value:
//T(T &&) is called
T a = tnew();

//This throws, because the task is empty
pending[0]();
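
The use cases above can be tried out with a stripped-down stand-in for 
task<T>. In this sketch the "executor" simply runs the function lazily 
at join time, and the exception type for an empty task 
(std::logic_error) is an assumption, since the proposal does not name 
one:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// Stand-in for task<T>: movable-only, one-shot.
template <class T>
class task {
    std::unique_ptr<std::function<T()> > fn_;
public:
    explicit task(std::function<T()> f)
        : fn_(new std::function<T()>(std::move(f))) {}

    bool empty() const { return !fn_; }

    // One-shot call: the task is emptied before the result is returned.
    T operator()() {
        if (!fn_) throw std::logic_error("empty task");  // assumed type
        std::unique_ptr<std::function<T()> > taken(std::move(fn_));
        return (*taken)();
    }
};

// The unique_ptr member makes the copy constructor unavailable.
static_assert(!std::is_copy_constructible<task<int> >::value,
              "task is movable-only");

// Moves every task out of the container and joins it exactly once.
inline int join_all(std::vector<task<int> >& pending) {
    int sum = 0;
    for (std::size_t i = 0; i < pending.size(); ++i) {
        task<int> t(std::move(pending[i]));  // pending[i] is now empty
        sum += t();
    }
    return sum;
}
```

After join_all() every element left in the vector is empty, so calling 
one of them again throws.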

-> The main reason for task<T> is that many times we don't share 
ownership of a future between threads and we don't want to pay 
for a copy and a mutex lock on each join. The mutex lock would be needed 
IMHO in a concurrently joinable future because the copy constructor of a 
generic type does not need to be thread-safe at all: it might modify 
mutable members in the source object.

-> This scheme of a unique joiner is pretty widespread: we will surely 
have a "master" thread launching asynchronous operations, storing them 
in a container and waiting until one of them ends. A 
reference-counted, concurrently joinable future is only needed if a 
group of threads launches asynchronous operations, passes them to other 
threads and operates on the same future at the same time. With 
task<T> I can pass (using std::move) a task from one thread to another, 
but only call operator() once, and only from one thread.

-> Exceptions: The same as level 0.

-> Implementability: Easy. Peter has already implemented this approach.

------------------------------------------------------------
------------------------------------------------------------
Level 2: Future. Reference-counted value
------------------------------------------------------------
------------------------------------------------------------

-> future<T>: Basically the same as task<T> but can be freely copied 
between threads, and you get a copy for each join operation. The copy 
operation should be executed while holding a mutex to guarantee that the 
value is correctly copied when multiple threads are calling join().

-> I'm still thinking about whether we can reuse the same executors as 
task<T> and/or convert a task<T> into a future<T>, so that a user can 
design an executor returning task<T> that can also be used to obtain 
futures.

-> Imagine that future<T> can be constructed from task<T> using move 
semantics:

template<class T>
class future
{
   public:
   //This future takes over the task and
   //takes control of the operation
   future(task<T> &&t);
};

I see future<T> as the shared_ptr equivalent of an asynchronous task.
Converting from task<T> to future<T> can be seen as a conversion from
unique_ptr to shared_ptr:

unique_ptr<T> task( new T);

shared_ptr<T> future (task.release());

task<T> is emptied and morphs into a reference-counted, concurrently 
joinable, full-powered future<T>.

-> Implementability: I haven't tested this, but I don't see any big 
problem. The future holds a shared_ptr/intrusive_ptr to the task 
implementation and holds a lock when calling join(). The real join 
value is first moved to local storage and then copied for each join 
request. The same can be done with exceptions.
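
An untested-in-production sketch of that implementation strategy, 
assuming C++17: the future holds a shared_ptr to a state that owns the 
task, a mutex and local storage for the value. The task<T> here is a 
minimal stand-in that runs its function at join time; the member names 
and demo_two_joiners are hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <optional>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Minimal stand-in for the level-1 task<T> (movable-only, one-shot).
template <class T>
class task {
    std::unique_ptr<std::function<T()> > fn_;
public:
    explicit task(std::function<T()> f)
        : fn_(new std::function<T()>(std::move(f))) {}
    bool empty() const { return !fn_; }
    T operator()() {
        if (!fn_) throw std::logic_error("empty task");
        std::unique_ptr<std::function<T()> > taken(std::move(fn_));
        return (*taken)();
    }
};

// Reference-counted, concurrently joinable future built from a task.
template <class T>
class future {
    struct state {
        std::mutex mtx;
        task<T> source;
        std::optional<T> value;  // local storage for the real join value
        explicit state(task<T>&& t) : source(std::move(t)) {}
    };
    std::shared_ptr<state> state_;
public:
    // Takes over the task, like shared_ptr taking over a unique_ptr.
    explicit future(task<T>&& t)
        : state_(std::make_shared<state>(std::move(t))) {}

    // The first join moves the task's result into local storage; every
    // join then returns a copy, made while holding the lock.
    T join() const {
        std::lock_guard<std::mutex> lock(state_->mtx);
        if (!state_->value) state_->value.emplace(state_->source());
        return *state_->value;
    }
};

// Usage: two threads join copies of the same future concurrently.
inline bool demo_two_joiners() {
    int runs = 0;
    task<std::string> t([&runs] { ++runs; return std::string("done"); });
    future<std::string> f1(std::move(t));  // t is emptied
    future<std::string> f2 = f1;           // reference-counted copy
    std::string a, b;
    std::thread th1([&] { a = f1.join(); });
    std::thread th2([&] { b = f2.join(); });
    th1.join();
    th2.join();
    return t.empty() && runs == 1 && a == "done" && b == "done";
}
```

The lock here serializes both the first (real) join and the later 
copies, which is the overhead task<T> is meant to avoid.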

Thoughts? Does a three-level approach seem right? Is it too complicated? 
Too hard to implement?

Regards,

Ion