Physical architectures execute applications. Computer engineering is the art of building machines that match those applications. Such machines are efficient: they provide high performance at low cost and power consumption. When a computer system is built for a concrete application, there is an opportunity to optimize the machine for that application.
For some applications, high performance is still paramount: the faster, the better. Others, e.g. many embedded applications, are instead real-time: given a fixed timing constraint, cost and energy consumption must be minimized. Many slow tasks can be implemented by a single universal processor. But such a processor emulates a computational model instead of executing it directly. Migrating many custom logic circuits into a CPU plus cheap memory makes the computer inexpensive. But why would it also be more energy efficient?
Hardwired logic can be represented by its datapath graph. All the nodes compute at every clock cycle, so the throughput is enormous, and the design can be pipelined for even more. In contrast, a universal processor evaluates one node per clock cycle, saving intermediate results in memory. This means that, in addition to the datapath computing nodes contained in the ALU, the processor includes complex instruction-decoding and result-handling logic. Even though the processor can still be smaller than a collection of dedicated circuits, that per-instruction overhead should make it less energy efficient at execution time. So is everything right here? This question kept cycling in my head.
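The contrast above can be sketched in code. This is a toy model, not a real simulator: the expression, the instruction format, and the "one node per cycle" accounting are all illustrative assumptions. The hardwired version evaluates the whole dataflow graph in one pass, like combinational logic settling within a cycle; the processor version walks the same graph one instruction at a time, paying decode and writeback steps around every ALU operation.

```python
# Toy comparison (illustrative model, not a real simulator):
# both versions evaluate the same dataflow graph, y = (a + b) * (c + d).

def hardwired(a, b, c, d):
    """A dedicated circuit: every graph node 'fires' in one pass,
    like combinational logic settling within a single clock cycle."""
    return (a + b) * (c + d)

def processor(program, regs):
    """A toy universal processor: one graph node per cycle, with
    decode and writeback steps wrapped around each ALU operation."""
    cycles = 0
    for op, dst, src1, src2 in program:
        # fetch/decode: overhead the hardwired version does not have
        fn = {"add": lambda x, y: x + y,
              "mul": lambda x, y: x * y}[op]
        # execute: the single datapath node used this cycle
        result = fn(regs[src1], regs[src2])
        # writeback: intermediate-result handling via registers/memory
        regs[dst] = result
        cycles += 1
    return regs, cycles

# The same graph flattened into three sequential instructions.
program = [("add", "t0", "a", "b"),
           ("add", "t1", "c", "d"),
           ("mul", "y",  "t0", "t1")]

regs, cycles = processor(program, {"a": 1, "b": 2, "c": 3, "d": 4})
assert regs["y"] == hardwired(1, 2, 3, 4)  # same answer: 21
print(cycles)  # 3 cycles instead of one pass, plus decode work on each
```

The point of the sketch is the bookkeeping: the processor reaches the same answer, but it touches decode logic and register writeback three times for a graph the dedicated circuit evaluates in a single settle.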
I stated that general-purpose processors are used for slow tasks. Obviously, a dedicated circuit capable of responding within a few clock cycles will sit idle most of the time if it is needed only at a very low rate. Then it suddenly struck me what this means: a dedicated datapath would waste energy most of the time. This is how the universal processor, less efficient per operation but always doing useful work, beats it!
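This duty-cycle argument reduces to back-of-envelope arithmetic. All the numbers below are invented for illustration, not measurements of any real chip: the dedicated circuit is assumed cheap per operation but leaking some idle power every cycle, while the processor pays a heavy per-operation price yet spends its remaining cycles on other tasks (or asleep), so that idle time is not charged to this task at all.

```python
# Back-of-envelope duty-cycle comparison. Every constant here
# (e_op, p_idle) is a made-up illustrative unit, not a measurement.

def dedicated_energy(ops, total_cycles, e_op=1.0, p_idle=0.2):
    """Dedicated circuit: cheap active work, but idle leakage
    accrues over every cycle it sits waiting for input."""
    return ops * e_op + (total_cycles - ops) * p_idle

def processor_energy(ops, e_op=5.0):
    """Universal processor: expensive per operation (decode,
    writeback), but it only spends energy on this task when
    actually running it; other cycles serve other tasks."""
    return ops * e_op

# A task that fires once every 1000 cycles:
ops, window = 1, 1000
print(dedicated_energy(ops, window))  # 200.8 units: leakage dominates
print(processor_energy(ops))          # 5.0 units: overhead only when used
```

Under these assumed numbers the dedicated circuit spends roughly forty times more energy on a rare task, even though each of its operations is five times cheaper: at low utilization, the idle term swamps the active one.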