by earl, 5982 days ago
Luiz André Barroso, writing in ACM Queue: "[..] server-class workloads are known to exhibit poor instruction-level parallelism [..]. Our index-serving application, for example, retires on average only one instruction every two CPU cycles on modern processors, badly underutilizing the multiple issue slots and functional units available. This is caused by the use of data structures that are too large for on-chip caches, and a data-dependent control flow that exposes the pipeline to large DRAM latencies."
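Put in numbers, retiring one instruction every two cycles is an IPC (instructions per cycle) of 0.5, or a CPI of 2. The short Python sketch below works through that arithmetic. Only the "one instruction every two cycles" rate comes from the quote; the 4-wide issue width and the cycle count are assumptions chosen for illustration, not figures from Barroso's article.

```python
# Only the "one instruction every two cycles" rate is from the quote.
# ISSUE_WIDTH and the cycle count are illustrative assumptions.
ISSUE_WIDTH = 4  # assumed: instructions the core could issue per cycle


def ipc(instructions_retired: int, cycles: int) -> float:
    """Instructions retired per clock cycle."""
    return instructions_retired / cycles


def slot_utilization(ipc_value: float, issue_width: int) -> float:
    """Fraction of the available issue slots that retire useful work."""
    return ipc_value / issue_width


cycles = 1_000_000
retired = cycles // 2  # one instruction every two cycles
measured = ipc(retired, cycles)

print(f"IPC = {measured}")       # IPC = 0.5
print(f"CPI = {1 / measured}")   # CPI = 2.0
print(f"slot utilization = {slot_utilization(measured, ISSUE_WIDTH):.1%}")  # 12.5%
```

Under that assumed 4-wide design, seven of every eight issue slots go unused, which is what "badly underutilizing" amounts to.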

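And as noted elsewhere, Sun's shiny new Niagara (UltraSPARC T1) design seems to fit Barroso's argument well: instead of trying to extract more instruction-level parallelism from a single thread, it runs four hardware threads on each of its eight simple cores and keeps the pipelines busy by switching to another thread while one is stalled on memory.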
