VLIW (very long instruction word) architecture is an incredibly efficient processor design. The ability to instruct multiple execution units in parallel, completely independently, means that utilisation of functional units within the design can be maximised. However, it is not without its disadvantages. For example, memory is expensive on Application Specific Integrated Circuits (ASICs) and normally takes up the majority of the silicon area on the chip. Program memory is like a limited natural resource on an ASIC, and wasting it is the equivalent of silicon pollution. For VLIW systems program memory is particularly precious because every instruction word takes up the same large amount of memory, even if it’s only doing something as simple as a single addition.

Whilst it is possible to write very efficient VLIW code that instructs the hardware directly, it is easy to end up with restrictions on the data flow. For example, accelerating a function may depend on the data coming from I/O and some constants coming from memory, so that they can be loaded simultaneously. These constraints on data flow result in the code becoming tightly coupled to its specific use-case and therefore inflexible. If the same function is needed elsewhere, but this time cannot obey the constraints, then another copy is required, one which is only marginally different from the original. In an ideal world, code to perform a specific function would only be instantiated once, and could be reconfigured and reused multiple times.

Within Cambridge Consultants we have created the Sapphyre technology. Sapphyre is a development platform for the rapid design of custom VLIW cores that can be used for silicon IP blocks, FPGA solutions and custom ASICs. As a result, we regularly have to find ways for our functions to become more common and use program memory in a more silicon-environmentally friendly way. Here are a couple of scenarios in which code has a propensity to become too tightly coupled, and the way that we’ve solved this on our Sapphyre platform:

Memory addressing within a common function

Often memory addressing (say to access data from a buffer) is either fixed addressing or calculated as an offset from a base address. Either case applies constraints on the data flow of the function. Sapphyre cores include indexer hardware: parallel execution units that can be used to calculate sequences of numbers or addresses, independent of the function data flow or register space. They can have a stride to step through memory at different intervals and have modes for both linear and circular buffering. The indexer hardware allows a common function to operate on vastly different memory layouts using the same code, by simply configuring the indexers in advance.
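To make the idea concrete, here is a minimal software sketch of an indexer, written in Python. It is an illustration only and not the Sapphyre hardware or its API: the Indexer class, its fields and the dot_product function are all hypothetical names, and treating a non-zero length as "circular mode" is an assumption made for brevity.

```python
from dataclasses import dataclass


@dataclass
class Indexer:
    """Hypothetical software model of an address indexer (not the Sapphyre API)."""
    base: int          # first address of the buffer
    stride: int = 1    # step between successive accesses
    length: int = 0    # buffer length for circular mode; 0 selects linear mode
    offset: int = 0    # current position relative to base

    def next(self) -> int:
        """Return the current address, then advance by the stride."""
        addr = self.base + self.offset
        self.offset += self.stride
        if self.length:                 # circular mode: wrap inside the buffer
            self.offset %= self.length
        return addr


def dot_product(mem, idx_a, idx_b, n):
    """Common function: it never computes an address itself."""
    total = 0
    for _ in range(n):
        total += mem[idx_a.next()] * mem[idx_b.next()]
    return total


mem = list(range(32))  # toy memory where mem[i] == i

# Layout 1: two contiguous vectors starting at addresses 0 and 8.
contiguous = dot_product(mem, Indexer(base=0), Indexer(base=8), 4)

# Layout 2: every second sample from address 0, multiplied against
# a two-entry circular buffer at address 16.
strided = dot_product(mem, Indexer(base=0, stride=2),
                      Indexer(base=16, length=2), 4)
```

The point of the sketch is that dot_product is identical in both calls. Only the indexer configuration, set up before the call, changes which memory it touches.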

Multiple cores accessing same memory

When a common function has state that exceeds the register space it will need to use memory as temporary storage. Often fixed addresses are used for temporary scratch space, to avoid the need for indexing methods just for temporary storage. However, in multi-core Sapphyre designs two or more cores often have access to the same memory, and if they run shared program code then there is a risk of collision.

In Sapphyre we solve this by extended addressing for private memory. On an ASIC, if you attempt to address data memory outside of the physical memory the address would wrap. In Sapphyre cores that wrapping is manipulated differently for each core. This allows us to allocate a Private Memory Space which each core can use to store data with certainty that it won’t be overwritten by another core. The Sapphyre extended addressing can then be used by common code to allow the same numerical address to be mapped to core-specific Private Memory Space in the same RAM. In this way common code can run on multiple cores and access the same memory without ever having address collisions.
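The effect can be sketched in a few lines of Python. This is a toy model under stated assumptions rather than a description of the real address decoder: the sizes, the translate function and the choice to place each core's private region at the top of the RAM are all hypothetical.

```python
RAM_SIZE = 64       # physical words in the shared RAM (illustrative value)
PRIVATE_SIZE = 8    # words reserved per core (illustrative value)
NUM_CORES = 2


def translate(core_id: int, addr: int) -> int:
    """Map a program address to a physical address for one core.

    Addresses inside the physical range pass straight through. Addresses
    beyond it are "extended" and wrap into a region chosen per core
    (assumed placement: stacked downwards from the top of the RAM).
    """
    if not 0 <= core_id < NUM_CORES:
        raise ValueError("unknown core")
    if addr < RAM_SIZE:
        return addr
    offset = addr - RAM_SIZE
    if offset >= PRIVATE_SIZE:
        raise ValueError("address outside the private memory space")
    private_base = RAM_SIZE - (core_id + 1) * PRIVATE_SIZE
    return private_base + offset


SCRATCH = RAM_SIZE  # the same numerical address, used by every core


def common_function(ram, core_id, value):
    """Shared code: stores to a fixed scratch address and reads it back."""
    ram[translate(core_id, SCRATCH)] = value
    return ram[translate(core_id, SCRATCH)]


ram = [0] * RAM_SIZE
common_function(ram, core_id=0, value=11)
common_function(ram, core_id=1, value=22)
```

Both cores execute the same store to the same numerical address, yet the two values end up in different physical words of the one RAM, so neither overwrites the other.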

Cores with access to multiple memory ports

Some of the Sapphyre cores we have designed are powerful enough to perform multiple computations each cycle. This needs to be balanced with the associated RAM bandwidth, so those cores have access to multiple RAM ports. The RAM ports are specified in the VLIW instruction but Sapphyre cores include aliasing modes which can redirect the instructions to use different RAM ports. Configuring the aliasing mode external to a common function allows inputs, outputs and state to be flexible between memories without changing the code.
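As a rough illustration, port aliasing can be modelled as a lookup table that sits between the port number written in the instruction and the RAM that is actually accessed. Everything in this Python sketch is hypothetical, including the Core class, the mode names and the two-port arrangement; it only shows the principle.

```python
# Each aliasing mode maps the port named in the instruction to a physical RAM.
# The mode names and mappings are invented for this example.
ALIAS_MODES = {
    "direct": {0: 0, 1: 1},
    "swapped": {0: 1, 1: 0},
}


class Core:
    """Toy model of a core with two RAM ports and a configurable alias mode."""

    def __init__(self, rams):
        self.rams = rams
        self.alias = ALIAS_MODES["direct"]

    def set_alias_mode(self, name):
        self.alias = ALIAS_MODES[name]

    def load(self, port, addr):
        return self.rams[self.alias[port]][addr]

    def store(self, port, addr, value):
        self.rams[self.alias[port]][addr] = value


def scale_by_two(core, n):
    """Common code: always reads port 0 and writes port 1."""
    for i in range(n):
        core.store(1, i, 2 * core.load(0, i))


ram_a = [1, 2, 3, 4]
ram_b = [0, 0, 0, 0]
core = Core([ram_a, ram_b])

scale_by_two(core, 4)            # direct mode: reads ram_a, writes ram_b
core.set_alias_mode("swapped")
scale_by_two(core, 4)            # swapped mode: reads ram_b, writes ram_a
```

The second call runs exactly the same code as the first, but because the alias mode was changed outside the function, its input and output memories are exchanged.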

We have found that the key to reusable code is in understanding the interaction between data and the VLIW modules, and building in flexibility to memory addressing at the hardware level. This means that the same code can be used in a multitude of different ways by pre-configuring the hardware. As a result, our common code is even more common, we save program memory and our software is much kinder to the environment.

In the next blog in this Sapphyre series, part 6, we will show just how real emulators can be.

Matthew Taylor
Senior DSP engineer