Thursday, October 6, 2011

Serializing optimizer for massive parallelism

Carry chain is not always better than parallelism: I show that XST recognizes a vector comparator in logic. After comparing the corresponding bits, we must AND-reduce the per-bit results to see whether any of them is 0 (a mismatch). XST performs this reduction in series (a cascade of 32 AND gates for a 65-bit comparator). Experts explain that this is faster because it uses the dedicated FPGA carry logic. Yet my experiment shows that a tree (log2 depth) is faster. To keep XST from extracting the comparator, the logic had to be packed into LUTs.
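As a rough illustration of the two reduction shapes, here is a minimal Python sketch. It is a software model only, not HDL and not the code from the experiment, and the function names (`xnor_bits`, `and_chain`, `and_tree`) are made up for this example. It counts levels of 2-input AND gates, which is an assumption of the model: on a real FPGA one LUT covers several inputs and a carry-chain stage is much faster than a LUT level, so these counts show the structure, not the timing.

```python
def xnor_bits(a, b, width):
    """Per-bit equality of two integers: 1 where the bits match, 0 where they differ."""
    return [1 - (((a >> i) ^ (b >> i)) & 1) for i in range(width)]


def and_chain(bits):
    """Serial AND-reduction (cascade). Returns (result, number of gates in series)."""
    acc = bits[0]
    depth = 0
    for bit in bits[1:]:
        acc &= bit      # each gate waits for the previous one
        depth += 1
    return acc, depth


def and_tree(bits):
    """Balanced AND-reduction. Returns (result, number of gate levels)."""
    level = list(bits)
    depth = 0
    while len(level) > 1:
        # AND neighbouring pairs; all pairs of one level work in parallel
        nxt = [level[i] & level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # odd element passes through to the next level
        level = nxt
        depth += 1
    return level[0], depth


if __name__ == "__main__":
    eq = xnor_bits(0xDEADBEEF, 0xDEADBEEF, 64)
    print("chain:", and_chain(eq))
    print("tree: ", and_tree(eq))
```

Both functions compute the same equality flag; only the depth differs, growing linearly with the width for the chain and logarithmically for the tree. Whether the tree actually wins in hardware depends on how fast a carry stage is compared with a LUT level plus routing, which is what the experiment measures.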
