Saturday, January 12, 2013

Device Interface for Chaining and Flexible Data Pumping

While thinking about a proper interface for the MAD MP3 library, I arrived at the following generalization: a way of chaining programmatic devices.

There are three types of devices: data producers, processors and consumers. You chain them together to produce some output. Input and output devices can be network sockets, files or audio devices, for instance. Decoders are examples of processors. Suppose you get MP3 input from a file or the network, decode it and send it to the audio card. Input devices have a read method, and output devices have a write method. It is therefore reasonable to have both on intermediate devices. Modern audio libraries are sophisticated enough that, I believe, you could use one to process almost any data through its plug-in mechanism. I propose something simpler.

The MAD library provides you with an MP3 decoder. Before using it, you initialize the library by giving it input, output, error and other callbacks. To decode a file, I suppose, you use a thread to call the decode() method, which invokes the input callback to get data and the output callback to write the result. This is not convenient for chaining. Suppose we read the data from another decoder or encoder of a similar kind. The only way to run both would be to use two pipelined threads. However, there is no reason to add complexity and load the processor with multiple contexts and insidious synchronization. There is no need for more threads (virtual processors) on a computer than it has real processors.

An interface that is convenient for chaining, and that can also be driven by a single thread, would be:
input.read -> filter1 -> filter2 -> ... -> output.write

The filters have read and write methods. You pump the data by spinning your thread at one of the links, reading from its input side and writing to its output side. For instance:

// prepare the chain
createChain();

// choose a link to drive
in = input;
out = in.next;

// pump until the input reports end of data (a read of 0 bytes)
int size;
while ((size = in.read(buf, BUF_SIZE)) != 0)
    out.write(buf, size);

Writing into an intermediate node pushes data into it. The node does some processing (it would be useless otherwise) and writes the result into the next node. Reading from an intermediate node pulls data through it: the node invokes the read method of its predecessor in the chain and processes what it gets. In this uniform way, a single thread can drive any data chain. For this reason, and for simplicity, the interface is blocking. I call the nodes 'devices'. All communication devices already implement the read and write methods.
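
To make the push and pull directions concrete, here is a minimal Java sketch of such a device. Everything in it (the Device interface, ArraySource, CollectingSink, UpperCaseFilter, Pump) is a hypothetical name invented for illustration, and the "processing" is a toy upper-casing of ASCII letters rather than real decoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical interface: every node can be read from and written to.
interface Device {
    int read(byte[] buf, int len);    // pull up to len bytes; 0 means end of data
    void write(byte[] buf, int len);  // push len bytes into the node
}

// A producer backed by an in-memory array (stands in for a file or socket).
class ArraySource implements Device {
    private final byte[] data;
    private int pos = 0;
    ArraySource(byte[] data) { this.data = data; }
    public int read(byte[] buf, int len) {
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, buf, 0, n);
        pos += n;
        return n;
    }
    public void write(byte[] buf, int len) {
        throw new UnsupportedOperationException("source is read-only");
    }
}

// A consumer that collects what is written to it (stands in for an audio device).
class CollectingSink implements Device {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    public int read(byte[] buf, int len) {
        throw new UnsupportedOperationException("sink is write-only");
    }
    public void write(byte[] buf, int len) { out.write(buf, 0, len); }
    String text() { return new String(out.toByteArray(), StandardCharsets.US_ASCII); }
}

// An intermediate node: the same processing is applied in both directions.
class UpperCaseFilter implements Device {
    Device prev;  // predecessor, used when the filter is read from
    Device next;  // successor, used when the filter is written to
    private static void process(byte[] buf, int len) {
        for (int i = 0; i < len; i++)
            if (buf[i] >= 'a' && buf[i] <= 'z') buf[i] -= 32;
    }
    public int read(byte[] buf, int len) {   // pull: ask the predecessor, then process
        int n = prev.read(buf, len);
        process(buf, n);
        return n;
    }
    public void write(byte[] buf, int len) { // push: process, then hand to the successor
        byte[] copy = Arrays.copyOf(buf, len);
        process(copy, len);
        next.write(copy, len);
    }
}

// The pumping loop from the post, driven by the calling thread.
class Pump {
    static void run(Device in, Device out) {
        byte[] buf = new byte[4096];
        int size;
        while ((size = in.read(buf, buf.length)) != 0)
            out.write(buf, size);
    }
}
```

With a chain source -> filter -> sink, Pump.run(filter, sink) drives the link after the filter (data is pulled through it), while Pump.run(source, filter) drives the link before it (data is pushed through it). In this sketch both calls leave the same bytes in the sink.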

Nothing prevents you from driving different nodes of the pipeline with more threads to take advantage of multiprocessor computers. The architecture can be further streamlined by hiding all the chain implementation code in an I/O library. The user just builds a chain of the required devices and invokes its pump() method, which exits on EOF at the input.
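
As a rough sketch of what such a library-level chain could look like, assuming Java and its standard InputStream and OutputStream as the end devices: the Chain class and its add() method are made-up names, and each processing stage is modeled simply as a function from one byte chunk to another.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical helper that hides the pumping loop from the user.
class Chain {
    private final InputStream input;
    private final OutputStream output;
    private final List<UnaryOperator<byte[]>> stages = new ArrayList<>();

    Chain(InputStream input, OutputStream output) {
        this.input = input;
        this.output = output;
    }

    // Appends a processing stage; returns this so calls can be chained.
    Chain add(UnaryOperator<byte[]> stage) {
        stages.add(stage);
        return this;
    }

    // Blocks the calling thread until the input reports EOF.
    // Returns the number of bytes written to the output.
    long pump() {
        byte[] buf = new byte[4096];
        long total = 0;
        try {
            int n;
            while ((n = input.read(buf)) != -1) {
                byte[] chunk = Arrays.copyOf(buf, n);
                for (UnaryOperator<byte[]> stage : stages)
                    chunk = stage.apply(chunk);
                output.write(chunk);
                total += chunk.length;
            }
            output.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

The user code then reduces to something like new Chain(in, out).add(decoder).pump(). A real decoder would need to keep state between chunks, which this sketch leaves to the stage itself.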

This approach is also universal, since it is not tied to a specific language or platform. I admit there may be more convenient interface standards, and I am going to investigate this. For instance, Java allows uniform chaining with its FilterInputStream and FilterOutputStream classes. This is good because it is a standard: a uniform interface simplifies life and saves loads of useless work. You read data from an input chain and feed it into an output chain. You are constrained to a single point for pumping the data. A read on the input stream calls the underlying stream, which calls its own underlying stream, and so on. This approach is simple but limited. As a simple example, consider a buffered stream. In Java, you have two classes, BufferedInputStream and BufferedOutputStream, depending on whether you need to buffer the input or the output. And you need two classes for everything. Otherwise, you may be unable to build a chain in some situations. For instance, you may be forced to buffer output before feeding it to an MP3 decoder, while the decoder exists only as an input stream.
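
The "two classes for everything" point can be illustrated with the standard java.io classes. In the sketch below, UpperCaseInputStream, UpperCaseOutputStream and JavaChainDemo are invented names and the transformation is a toy one; the point is only that the same processing has to be written once for the pull side and once for the push side.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Pull side: transforms bytes as they are read from the wrapped stream.
class UpperCaseInputStream extends FilterInputStream {
    UpperCaseInputStream(InputStream in) { super(in); }
    @Override public int read() throws IOException {
        int b = super.read();
        return (b >= 'a' && b <= 'z') ? b - 32 : b;
    }
    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = off; i < off + n; i++)  // loop body is skipped when n is -1
            if (buf[i] >= 'a' && buf[i] <= 'z') buf[i] -= 32;
        return n;
    }
}

// Push side: the same transformation needs a second class.
class UpperCaseOutputStream extends FilterOutputStream {
    UpperCaseOutputStream(OutputStream out) { super(out); }
    @Override public void write(int b) throws IOException {
        super.write((b >= 'a' && b <= 'z') ? b - 32 : b);
    }
}

class JavaChainDemo {
    // Input chain: each read() calls the stream underneath it.
    static String pull(String text) {
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = new UpperCaseInputStream(
                new BufferedInputStream(new ByteArrayInputStream(raw)))) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) result.write(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(result.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Output chain: each write() calls the stream underneath it.
    static String push(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new UpperCaseOutputStream(new BufferedOutputStream(sink))) {
            out.write(text.getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

If only UpperCaseInputStream existed, the chain in push() could not be built, which is the situation described above for the MP3 decoder.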

My approach: a filter object with both an input and an output interface. You build a chain by connecting the output of the preceding node to the input of the next one.

filter = new Filter();
filter.setOutput(someNode.input);

Of course, there should be convenience constructors, like in Java, allowing one-statement chaining:
Filter (predecessor.out) -- creates a filter whose input is linked to the predecessor's output
Filter (follower.in) -- creates a filter whose output is linked to the follower's input
Filter (predecessor.out, follower.in) -- creates a completely linked filter
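
One way these constructors could be expressed in Java is sketched below. It assumes that the input and output ends are separate types (here the hypothetical InputPort and OutputPort), since otherwise the two single-argument constructors could not be told apart. The name parameter is added only to make the example easier to follow.

```java
// Hypothetical port types; distinct classes allow the constructors to be overloaded.
final class InputPort {
    final Filter owner;
    OutputPort source;  // the output this input is fed from, or null
    InputPort(Filter owner) { this.owner = owner; }
}

final class OutputPort {
    final Filter owner;
    InputPort target;   // the input this output feeds, or null
    OutputPort(Filter owner) { this.owner = owner; }
}

class Filter {
    final String name;
    final InputPort in = new InputPort(this);
    final OutputPort out = new OutputPort(this);

    // Unlinked filter.
    Filter(String name) { this.name = name; }

    // Input linked to the predecessor's output.
    Filter(String name, OutputPort predecessorOut) {
        this(name);
        setInput(predecessorOut);
    }

    // Output linked to the follower's input.
    Filter(String name, InputPort followerIn) {
        this(name);
        setOutput(followerIn);
    }

    // Completely linked filter.
    Filter(String name, OutputPort predecessorOut, InputPort followerIn) {
        this(name);
        setInput(predecessorOut);
        setOutput(followerIn);
    }

    void setInput(OutputPort predecessorOut) {
        predecessorOut.target = in;
        in.source = predecessorOut;
    }

    void setOutput(InputPort followerIn) {
        out.target = followerIn;
        followerIn.source = out;
    }

    Filter next()     { return out.target == null ? null : out.target.owner; }
    Filter previous() { return in.source == null ? null : in.source.owner; }
}
```

For example, given unlinked filters a and c, the single statement new Filter("b", a.out, c.in) inserts a new filter between them.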


My approach enforces the existence of both the input and the output interfaces on the plug-ins. On the other hand, it is not hard to create an output stream given an input stream, and vice versa. So when you have a decoder from which you can only read decoded PCM, you can easily create an output stream that decodes the data written to it and forwards the result to the underlying stream. Am I wrong?
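
A possible Java sketch of such an adapter follows. It rests on a strong assumption, stated in the code as well: every written chunk can be decoded on its own, so a fresh decoder can be created per write. A real MP3 decoder keeps state across frames, so this is only the simplest case. DecodingOutputStream, ToyDecoder and AdapterDemo are invented names, and the "decoder" just subtracts 1 from every byte.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Stand-in for a pull-only decoder: you can only read decoded data from it.
class ToyDecoder extends FilterInputStream {
    ToyDecoder(InputStream in) { super(in); }
    @Override public int read() throws IOException {
        int b = super.read();
        return b == -1 ? -1 : (b - 1) & 0xFF;
    }
    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = off; i < off + n; i++) buf[i] -= 1;  // skipped when n is -1
        return n;
    }
}

// Hypothetical adapter: gives a pull-only decoder a write interface.
class DecodingOutputStream extends OutputStream {
    private final Function<InputStream, InputStream> decoderFactory;
    private final OutputStream next;

    DecodingOutputStream(Function<InputStream, InputStream> decoderFactory, OutputStream next) {
        this.decoderFactory = decoderFactory;
        this.next = next;
    }

    @Override public void write(int b) throws IOException {
        write(new byte[] { (byte) b }, 0, 1);
    }

    @Override public void write(byte[] buf, int off, int len) throws IOException {
        // ASSUMPTION: the chunk is decodable in isolation (no state between writes).
        InputStream decoded = decoderFactory.apply(new ByteArrayInputStream(buf, off, len));
        byte[] tmp = new byte[4096];
        int n;
        while ((n = decoded.read(tmp)) != -1)
            next.write(tmp, 0, n);  // forward decoded bytes to the underlying stream
    }

    @Override public void flush() throws IOException { next.flush(); }
    @Override public void close() throws IOException { next.close(); }
}

class AdapterDemo {
    static String decodeViaWrite(String encoded) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new DecodingOutputStream(ToyDecoder::new, sink)) {
            out.write(encoded.getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

When the decoder does carry state between writes, the adapter would instead have to buffer the written bytes and feed one long-lived decoder, which is noticeably more work than this sketch.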


In this discussion, processor, filter and intermediate node are synonyms.
