Latest 32-bit RISC architecture for automotive expands functionality

Michael Krämer, Renesas  1/21/2011 12:40 PM EST

In the 15 years since its launch, the Renesas V850 architecture has become one of the dominant architectures in automotive electronics. This Product How-To describes the features, including a SIMD coprocessor, incorporated into the latest variant, the V850E2H.

All V850 products are upward compatible, so today’s sophisticated devices can still execute the same instructions as their predecessors. The architecture has been continually improved through extensions to the instruction set and today delivers up to 2.6 Dhrystone MIPS/MHz. Integrating several of these processor cores on a single chip increases performance further, doubling or even quadrupling the available computing power.
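As a rough worked example of what these figures mean, the sketch below multiplies the quoted DMIPS/MHz rating by a clock frequency and a core count. The 200 MHz clock is a hypothetical value chosen only for illustration, and the linear scaling with core count is an idealizing assumption; contention for shared memory and buses usually keeps real multicore gains below this.

```python
# Back-of-the-envelope Dhrystone throughput estimate.
# Assumptions (hypothetical, not from the article): a 200 MHz clock and
# perfectly linear scaling across cores.

DMIPS_PER_MHZ = 2.6  # upper figure quoted for the V850 architecture


def peak_dmips(clock_mhz: float, cores: int = 1,
               dmips_per_mhz: float = DMIPS_PER_MHZ) -> float:
    """Idealized peak Dhrystone MIPS for `cores` identical cores."""
    return dmips_per_mhz * clock_mhz * cores


for n in (1, 2, 4):
    print(f"{n} core(s) at 200 MHz: {peak_dmips(200, n):.0f} DMIPS")
```

Under these assumptions the loop reports 520, 1040 and 2080 DMIPS; treat the multicore numbers as upper bounds.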
The V850 architecture
All V850 variants are based on a 32-bit Harvard architecture: the CPU registers and the ALU are 32 bits wide, and two separate 32-bit buses are provided internally, one for instruction fetches and the other for data accesses. Ideally, an instruction can therefore be executed in every CPU cycle while data is read or written at the same time.
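The benefit of separate instruction and data buses can be illustrated with a deliberately simplified cycle-count model. It assumes, hypothetically, that every instruction fetch and every data access takes exactly one bus cycle, and it ignores caches, wait states and pipeline effects; it is a sketch of the principle, not a model of any V850 pipeline.

```python
# Simplified bus-cycle model: Harvard (separate buses) vs. one shared bus.
# Each program entry is True if that instruction also accesses data.


def cycles_harvard(program):
    """Fetch and data access overlap, so each instruction costs one cycle."""
    return len(program)


def cycles_shared_bus(program):
    """Fetch and data access must take turns on a single bus."""
    return sum(2 if uses_data else 1 for uses_data in program)


# Hypothetical instruction mix: load, add, add, store
program = [True, False, False, True]
print(cycles_harvard(program), cycles_shared_bus(program))  # 4 6
```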
Because CPU clock rates have risen far faster than memory access times have fallen, measures are needed to keep the memory from slowing down the CPU. For this reason, the Flash memory bus is 128 bits wide on all but the very lowest-cost devices. Since V850 instructions are 16, 32 or 48 bits long, a single bus access can fetch up to eight instructions. In addition, most product derivatives implement instruction caches to reduce the number of accesses to the relatively slow Flash memory.
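The "up to eight instructions" figure follows directly from the bus and instruction widths. The sketch below works through that arithmetic; the `fetches_needed` helper is a hypothetical illustration that assumes a contiguous, 128-bit-aligned instruction stream with no branches and no cache.

```python
import math

FETCH_BITS = 128                   # Flash bus width quoted in the article
INSTRUCTION_WIDTHS = (16, 32, 48)  # V850 instruction lengths in bits


def max_per_fetch(width_bits: int) -> int:
    """Complete instructions of a single width that fit in one fetch."""
    return FETCH_BITS // width_bits


def fetches_needed(stream_widths) -> int:
    """Fetches for a contiguous, aligned instruction stream.
    Instructions may straddle a line boundary, so only total bits matter."""
    return math.ceil(sum(stream_widths) / FETCH_BITS)


print({w: max_per_fetch(w) for w in INSTRUCTION_WIDTHS})  # {16: 8, 32: 4, 48: 2}
print(fetches_needed([16, 16, 32, 48, 16]))               # 1 (exactly 128 bits)
```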
The register set comprises thirty-two 32-bit general-purpose registers. The instruction set is largely symmetrical, so every instruction can be applied to every register. Special-purpose roles such as stack pointer, link pointer and parameter-passing registers are assigned by the software development tools, not by the architecture. One exception is register r0, whose content is always zero, as in many RISC architectures.
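A register file with a hardwired zero register is easy to model in software. The toy class below (its name and methods are hypothetical) discards writes to r0 and always returns 0 when r0 is read; it illustrates the programmer-visible behavior only and says nothing about how the V850 implements it in hardware.

```python
class RegisterFile:
    """Toy model of 32 general-purpose 32-bit registers, r0 fixed at zero."""

    NUM_REGS = 32
    MASK32 = 0xFFFF_FFFF

    def __init__(self):
        self._regs = [0] * self.NUM_REGS

    def read(self, index: int) -> int:
        # r0 always reads as zero, whatever was "written" to it
        return 0 if index == 0 else self._regs[index]

    def write(self, index: int, value: int) -> None:
        if index != 0:  # writes to r0 are silently discarded
            self._regs[index] = value & self.MASK32  # keep 32 bits


rf = RegisterFile()
rf.write(0, 1234)  # ignored
rf.write(5, -1)    # stored as 32-bit two's complement
print(rf.read(0), hex(rf.read(5)))  # 0 0xffffffff
```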

Extensions to the V850E2H architecture

All the functions mentioned above are also available in the new V850E2H architecture. The major new functions include a SIMD coprocessor and branch prediction; the SIMD coprocessor is described briefly below. (A more detailed version of this article is available courtesy of Automotive Designline Europe.)

SIMD architecture
SIMD stands for “single instruction, multiple data”: a single instruction processes several operands at once. This makes the unit particularly well suited to digital signal processing, which relies mostly on simple, basic operations such as multiplication and addition but has to execute them very frequently and very quickly.

The SIMD unit has access to 32 dedicated 64-bit vector registers. A single instruction therefore processes a 64-bit vector divided into either two 32-bit or four 16-bit operands.
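To show what "four 16-bit operands in one 64-bit vector" means, the sketch below emulates a lane-wise addition in software. The helper names are hypothetical and do not correspond to V850E2H instruction mnemonics; the point is that a carry out of one 16-bit lane wraps around instead of spilling into the neighboring lane, as it would in a plain 64-bit addition.

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1


def split_lanes(vec64: int) -> list[int]:
    """Unpack a 64-bit vector into four 16-bit lanes (lane 0 = lowest bits)."""
    return [(vec64 >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def join_lanes(lanes) -> int:
    """Pack 16-bit lanes back into a 64-bit vector, truncating each lane."""
    vec = 0
    for i, lane in enumerate(lanes):
        vec |= (lane & LANE_MASK) << (LANE_BITS * i)
    return vec


def add_4x16(a: int, b: int) -> int:
    """Lane-wise wraparound add of two 4 x 16-bit vectors."""
    return join_lanes(x + y for x, y in zip(split_lanes(a), split_lanes(b)))


print(hex(add_4x16(0x0001_0002_0003_0004, 0x0010_0020_0030_0040)))
# 0x11002200330044
print(hex(add_4x16(0x0000_0000_0000_FFFF, 0x0000_0000_0000_0001)))
# 0x0  (the lane wraps; no carry into lane 1)
```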

The SIMD unit has full access to the CPU’s data bus, so it can load its registers from memory and write the results back there. Addressing modes that are very useful for filter calculations, such as modulo addressing and automatic address incrementing, make this access efficient.
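Modulo addressing is what lets a filter treat its delay line as a ring buffer. The sketch below emulates this in software with a finite impulse response (FIR) filter whose sample index wraps modulo the filter length; the class name and coefficients are hypothetical, and fixed-point scaling and saturation are not modeled.

```python
class CircularFIR:
    """FIR filter whose delay line is a circular buffer indexed modulo N,
    a software stand-in for hardware modulo addressing."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.n = len(self.coeffs)
        self.delay = [0] * self.n
        self.pos = 0  # index of the newest sample

    def step(self, sample):
        # Advance the write pointer with wraparound instead of shifting data
        self.pos = (self.pos + 1) % self.n
        self.delay[self.pos] = sample
        acc = 0
        for k, c in enumerate(self.coeffs):
            # multiply-accumulate: c[k] * x[n - k], index wrapped modulo N
            acc += c * self.delay[(self.pos - k) % self.n]
        return acc


fir = CircularFIR([1, 2, 3])
print([fir.step(x) for x in [1, 0, 0, 0, 5]])  # [1, 2, 3, 0, 5]
```

Feeding a unit impulse returns the coefficients themselves, which is a quick sanity check that the wrapped indexing is correct.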

The SIMD unit also supports bit-reverse addressing, which is required for the fast Fourier transform (FFT).
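Bit-reverse addressing generates the index sequence that an in-place radix-2 FFT uses to reorder its data. The sketch below computes that sequence in software; the function names are hypothetical, and a hardware address generator would produce the same indices without the explicit loop.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index` (e.g. 0b001 -> 0b100)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reverse_permute(data):
    """Reorder data into bit-reversed order, as required before an
    in-place radix-2 FFT. len(data) must be a power of two."""
    n = len(data)
    bits = n.bit_length() - 1
    if n == 0 or 1 << bits != n:
        raise ValueError("length must be a power of two")
    return [data[bit_reverse(i, bits)] for i in range(n)]


print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```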
The SIMD instruction set includes the indispensable multiply-and-accumulate (MAC) instructions, data type conversions, and instructions that determine maximum and minimum values. Filter and FFT calculations with complex numbers are also supported.
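A complex multiply-accumulate, the central operation of complex filters and FFT butterflies, needs four real multiplications per step. The sketch below spells that out with integer (re, im) pairs to mimic fixed-point data; the function names are hypothetical, and overflow and saturation handling are not modeled.

```python
def complex_mac(acc, a, b):
    """Return acc + a * b for complex values given as (re, im) integer pairs."""
    acc_re, acc_im = acc
    a_re, a_im = a
    b_re, b_im = b
    # (a_re + j*a_im)(b_re + j*b_im): four multiplies, then accumulate
    return (acc_re + a_re * b_re - a_im * b_im,
            acc_im + a_re * b_im + a_im * b_re)


def complex_dot(xs, ys):
    """Sum of products of two complex sequences via repeated MACs."""
    acc = (0, 0)
    for x, y in zip(xs, ys):
        acc = complex_mac(acc, x, y)
    return acc


print(complex_dot([(1, 2), (2, 0)], [(3, 4), (0, 1)]))  # (-5, 12)
```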
The future: Multicore
To support scalability from the low-cost to the premium segment, Renesas has already announced its first derivative with virtual CPU cores and hardware threading. Although this is basically a single-core CPU, it has several register files. In each clock cycle, hardware connects one of these register files to the execution unit. In this way, each thread runs in its own hardware context, so no software scheduler is needed.

A scheduling table defines which thread, if any, uses the execution unit and for how long. Although this architecture includes only a single execution unit, from the software’s perspective each thread appears to have its own CPU.
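The effect of a scheduling table can be sketched as a loop in which each clock cycle selects one register file. The table format below (a list of thread numbers, with `None` marking an idle slot) is a hypothetical illustration, not the Renesas table layout, and each thread's hardware context is reduced to a program counter that advances only in that thread's slots.

```python
IDLE = None  # table slot in which no thread uses the execution unit


def run_schedule(table, cycles, num_threads):
    """Replay the table cyclically for `cycles` clock cycles.
    Returns per-thread instruction counts and the number of idle cycles."""
    pcs = [0] * num_threads  # one program counter per register file
    idle = 0
    for cycle in range(cycles):
        thread = table[cycle % len(table)]
        if thread is IDLE:
            idle += 1
        else:
            pcs[thread] += 1  # execute one instruction in that context
    return pcs, idle


table = [0, 1, 0, 2, IDLE]  # thread 0 receives 40 % of the cycles
print(*run_schedule(table, 10, 3))  # [4, 2, 2] 2
```

Giving a thread more slots in the table is how this model expresses "for how long"; no thread ever touches another thread's program counter.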

This virtualization allows a multicore software design to scale down to devices with a single physical core.
Graduate physicist Michael Krämer studied physics at the Darmstadt Technical University. He has worked in Renesas Electronics’ Automotive Business Unit since 2006 and is a member of the global CPU Core Working Team.