Reduced Complexity Many-Core
Start date: 01.07.2014
Funded by: Universität Augsburg
Local head of project: Prof. Dr. Theo Ungerer, Prof. Dr. Sebastian Altmeyer
Local scientists: Dr. Martin Frieb, Dr. Alexander Stegmeier
Abstract
One way to reduce the overall power consumption of a processor is to avoid components that consume a lot of power. Speculative components such as branch predictors and caches are examples of such expensive modules. Speculation also harms the predictability of timing behaviour, because it increases the pessimism of worst-case timing estimates. Omitting speculative components therefore serves both goals, so low-power design and real-time capability can be combined easily.
Forgoing speculation severely reduces processor performance. The low single-thread performance can be compensated by increasing the number of processor cores and concurrent threads. Hence we develop a Reduced Complexity Many-Core (RC/MC) with small cores and a simple interconnection network that achieves high throughput through massive parallelism.
Many-Core Timing Analysis
The scalability of shared-memory multicore processors is limited. In static timing analysis in particular, interference between cores leads to pessimistic worst-case assumptions and overestimation. For example, worst-case memory access latencies increase with the number of cores. To overcome this, RC/MC prohibits interfering memory accesses and isolates the cores: there is only core-local memory and no global shared memory. The only way to communicate with other cores is fine-grained message passing (FGMP) over a predictable network-on-chip.
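To illustrate why the worst-case memory latency grows with the core count, the following minimal sketch compares a shared memory with core-local memory. It assumes, purely for illustration, a round-robin arbitrated shared memory in which every access takes the same number of cycles; the arbitration scheme and the function names are assumptions and are not taken from the project.

```python
def shared_memory_wc_latency(num_cores: int, access_cycles: int) -> int:
    """Worst-case latency of one access to a shared memory.

    Assumption (hypothetical model): round-robin arbitration, so in the
    worst case a request waits for one access of every other core and
    is then served itself.
    """
    return num_cores * access_cycles


def local_memory_wc_latency(access_cycles: int) -> int:
    """Worst-case latency of one access to core-local memory.

    No other core can interfere, so the bound is independent of the
    number of cores.
    """
    return access_cycles


if __name__ == "__main__":
    for cores in (1, 4, 16, 64):
        print(cores,
              shared_memory_wc_latency(cores, access_cycles=10),
              local_memory_wc_latency(access_cycles=10))
```

Under this model the shared-memory bound grows linearly with the number of cores, while the core-local bound stays constant.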
A standard programming model for message-based communication is the Message Passing Interface (MPI). It provides an application-independent interface for standard communication operations (e.g. broadcast, gather). It relies on efficient communication patterns with deterministic behaviour. By applying these known structures, we aim to provide a worst-case execution time (WCET) analysis for communication that can be reused across applications, provided the communication is executed on the same underlying platform.
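As an illustration of how a deterministic communication pattern lends itself to a reusable bound, the sketch below builds the schedule of a binomial-tree broadcast and derives a simple upper bound from it. The binomial tree is one common broadcast pattern and is chosen here as an assumption; the function names and the platform parameter `per_message_wcet` are hypothetical and do not describe the project's actual analysis.

```python
def binomial_broadcast_schedule(num_cores: int, root: int = 0):
    """Return the rounds of a binomial-tree broadcast.

    Each round is a list of (sender, receiver) pairs. In every round,
    each core that already holds the data forwards it to one new core,
    so the number of informed cores doubles per round.
    """
    rounds = []
    informed = 1  # cores with relative rank 0..informed-1 hold the data
    while informed < num_cores:
        sends = []
        for rel in range(informed):
            peer = rel + informed
            if peer < num_cores:
                # Map relative ranks back to absolute core ids.
                sends.append(((rel + root) % num_cores,
                              (peer + root) % num_cores))
        rounds.append(sends)
        informed *= 2
    return rounds


def broadcast_wcet_bound(num_cores: int, per_message_wcet: int) -> int:
    """Upper bound on broadcast time under two assumptions: the sends of
    one round run in parallel, and per_message_wcet bounds any single
    send on the platform, including network interference."""
    return len(binomial_broadcast_schedule(num_cores)) * per_message_wcet


if __name__ == "__main__":
    for r, sends in enumerate(binomial_broadcast_schedule(8)):
        print("round", r, sends)
    print("bound:", broadcast_wcet_bound(8, per_message_wcet=10))
```

Because the schedule depends only on the number of cores and the root, the bound can be reused by any application that performs the same broadcast on the same platform.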
PaterNoster Network-on-Chip
State-of-the-art networks-on-chip (NoCs) provide high throughput and low latency by sending data packets through a mesh topology, using virtual channels and wormhole flow control. The downside of this technology is high area and energy consumption due to many buffers, large crossbars and complex arbitration logic within the routers.
The PaterNoster approach simplifies the hardware by sending only small messages that can be transported in one clock cycle. In addition to reducing area, the simple routing algorithm (XY routing in a 2D torus) increases predictability, enabling a tighter timing analysis.
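XY routing in a 2D torus can be sketched as follows: a message first travels along the X dimension until it reaches the destination column, then along the Y dimension. The sketch assumes, purely for illustration, unidirectional links with wrap-around (the text above does not state the link direction), and the function names are hypothetical.

```python
def xy_route(src, dst, width, height):
    """Return the nodes visited after leaving src, ending at dst.

    Assumption: links are unidirectional, so a message only moves in
    the +X direction and then in the +Y direction, wrapping around at
    the torus edges.
    """
    x, y = src
    dst_x, dst_y = dst
    path = []
    while x != dst_x:          # first resolve the X dimension
        x = (x + 1) % width
        path.append((x, y))
    while y != dst_y:          # then resolve the Y dimension
        y = (y + 1) % height
        path.append((x, y))
    return path


def hop_count(src, dst, width, height):
    """Number of hops of the route above, computed without walking it."""
    return (dst[0] - src[0]) % width + (dst[1] - src[1]) % height


def worst_case_hops(width, height):
    """Longest possible route between any two nodes under this model."""
    return (width - 1) + (height - 1)


if __name__ == "__main__":
    print(xy_route((3, 0), (1, 2), width=4, height=4))
    print(hop_count((3, 0), (1, 2), width=4, height=4))
    print(worst_case_hops(4, 4))
```

Since the route is fully determined by source and destination, the hop count is known statically, which is what makes such a routing scheme convenient for timing analysis.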