… and it looks nothing like today’s large ASICs
The current state of the art
For years, large ASICs like the ones used in network processing, supercomputing and high-end personal computing have looked remarkably alike. The figure below is a fairly typical floorplan of such an ASIC. Having taped out over a dozen of these chips a year, I find it interesting that the interfaces have changed, processors have become faster and memory data rates have increased, yet the basic floorplan remains similar.
Conventional networking ASICs are memory hungry. They have embedded memory, such as 1T memories, as well as large interfaces to external memory. These large memory interfaces significantly increase the die size because of the periphery needed to fit them. Access to more external memory is generally limited by the most advanced JEDEC standard and by the width of the IP available to support it. Alas, although these external memories may have high data rates and ever-growing capacity, they have two drawbacks: the high power needed to drive these nets and the cost of interfacing to these memories. Yes, I did imply that existing memory solutions may be more expensive than a 2.5D implementation. Once you consider that the package size is usually driven up by these high-pin-count memory interfaces, and that the PCB layer count is also driven by the memory interface on the PCB, it is plausible that at the system level it is less expensive to go down the 2.5D route. If not for these massive memory interfaces, large networking systems that are in the 32-PCB-layer range might need only six PCB layers.
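The system-level argument above can be laid out as a simple sum of cost contributors. The following is a minimal sketch, not a costing method from this article: the `SystemCost` class and every number in it are hypothetical placeholders in arbitrary cost units, except for the 32-layer and six-layer PCB counts, which come from the paragraph above. The result depends entirely on those placeholders, so treat it as a way to structure the comparison rather than as evidence for either side.

```python
from dataclasses import dataclass


@dataclass
class SystemCost:
    """Per-unit cost contributors named in the text (hypothetical model)."""
    silicon: float             # die(s); for 2.5D also interposer and memory stack
    package: float             # tends to grow with pin count
    pcb_layers: int            # board layer count
    cost_per_pcb_layer: float  # assumed linear cost per layer (a simplification)
    external_memory: float     # discrete memory devices on the board

    def total(self) -> float:
        return (self.silicon + self.package
                + self.pcb_layers * self.cost_per_pcb_layer
                + self.external_memory)


# Placeholder values in arbitrary cost units, chosen only to show the structure.
monolithic = SystemCost(silicon=100.0, package=40.0, pcb_layers=32,
                        cost_per_pcb_layer=5.0, external_memory=60.0)
two_point_five_d = SystemCost(silicon=180.0, package=20.0, pcb_layers=6,
                              cost_per_pcb_layer=5.0, external_memory=0.0)

print(f"monolithic: {monolithic.total():.1f}")
print(f"2.5D:       {two_point_five_d.total():.1f}")
```

With real quotes substituted for the placeholders, the same sum shows how much of the system cost sits in the package and the board rather than in the silicon.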
Embedded memories can address the need for low-latency memory in an ASIC. There are several choices, but the most common embedded memory is 1T memory. In a large ASIC, this can take up the lion’s share of the area along with the processors, but it will not come close to the capacity of an external memory. The largest 1T memories in use are just over 100 Mb. This is several orders of magnitude less than external memory, but it serves the purpose of providing very low-latency memory.
As nodes shrink, the maximum SerDes data rates increase as well, yet the SerDes occupy a similar amount of space on a die. I would have thought that we would end up using fewer SerDes as data rates increased, but that guess was certainly wrong. Putting higher-data-rate SerDes in the same package as very large memories is a challenge, but not for the reasons many would assume. The tricky part is routing hundreds of these higher-speed SerDes (10 Gbps to 28 Gbps) in large packages, because the transmission lines are longer and run on lossy dielectrics. While we can engineer the most elegant transitions from bump to trace to via to ball, etc., we still need to work within the bounds of the materials available to make a robust large package. The properties that make dielectrics mechanically robust generally also give them higher loss tangents, so they do not perform as well at these higher frequencies.
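To get a feel for why the loss tangent and trace length matter at these data rates, here is a rough sketch using the textbook dielectric-loss approximation for a TEM line, alpha_d [dB/m] = 27.3 * f * sqrt(eps_r) * tan_delta / c. It assumes NRZ signaling (so the Nyquist frequency is half the bit rate) and ignores conductor loss, reflections and crosstalk. The function name, the 30 mm trace length, the permittivity and the two loss tangents are illustrative assumptions, not data for any particular package material.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def dielectric_loss_db(bit_rate_gbps: float, length_mm: float,
                       eps_r: float, tan_delta: float) -> float:
    """Approximate dielectric loss of a trace at the NRZ Nyquist frequency.

    Uses alpha_d [dB/m] = 27.3 * f * sqrt(eps_r) * tan_delta / c and
    ignores conductor loss, reflections and crosstalk.
    """
    f_nyquist_hz = bit_rate_gbps * 1e9 / 2.0  # NRZ: Nyquist is half the bit rate
    alpha_db_per_m = 27.3 * f_nyquist_hz * math.sqrt(eps_r) * tan_delta / C0
    return alpha_db_per_m * length_mm / 1000.0


# Hypothetical dielectrics and trace length, for illustration only.
for name, tan_delta in [("lower-loss", 0.005), ("higher-loss", 0.020)]:
    for rate in (10.0, 28.0):
        loss = dielectric_loss_db(rate, length_mm=30.0, eps_r=3.5,
                                  tan_delta=tan_delta)
        print(f"{name:11s} {rate:4.0f} Gbps: {loss:.2f} dB")
```

In this approximation the loss scales linearly with frequency, trace length and loss tangent, which is why longer routes on a mechanically robust but lossier dielectric become harder to close as data rates rise.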
The future as I see it
We have covered some of the difficulties associated with the current architecture, which is a monolithic solution as seen above. The future, as seen below, is completely different. This future 2.5D solution may appear to be more complex, but its elegance lies in the simplicity it brings to the architecture. The custom ASIC seen in the image below has not only partitioned some functions out of the monolithic ASIC, but it has also brought in memory that would otherwise have been outside of the package.
The image above shows what a new ASIC architecture may look like. There are several key benefits to this implementation:
- The package is not shown, but it would be much smaller than it would have been with a monolithic die. The reason is that the external memory interface no longer needs package pins, because the memory is already inside the package.
- The die itself can be considerably smaller if most of the embedded memory can be moved to the adjacent memory stack.
- The SerDes can be removed and replaced with a tile that takes a highly parallel interface and multiplexes it to several high-speed channels.
- If a processor is needed, it would be best located on top of the ASIC because these processor architectures work best when vertically stacked, rather than placed side by side.
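The SerDes tile idea in the list above comes down to simple bandwidth arithmetic: the aggregate rate of the wide parallel interface has to fit into a number of serial channels. The sketch below is only an illustration; the `channels_needed` helper, the 512-wire width and the 2 Gbps per-wire rate are hypothetical, and encoding overhead and protocol framing are ignored.

```python
import math


def channels_needed(parallel_wires: int, wire_rate_gbps: float,
                    channel_rate_gbps: float) -> int:
    """Number of serial channels needed to carry a wide parallel interface.

    Ignores encoding overhead and any protocol framing.
    """
    aggregate_gbps = parallel_wires * wire_rate_gbps
    return math.ceil(aggregate_gbps / channel_rate_gbps)


# Hypothetical interface: 512 interposer wires at 2 Gbps each.
for channel_rate in (10.0, 28.0):
    n = channels_needed(512, 2.0, channel_rate)
    print(f"{channel_rate:.0f} Gbps channels needed: {n}")
```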
Once these portions of the die are removed, there is little need to use the latest process node. The final benefit is therefore a lower barrier to entry for a new device, because this ASIC can likely be done in a legacy node that has a much lower foundry NRE. Assuming that the other tiles mounted on the interposer already exist, it is possible that the unit system cost of this 2.5D implementation is lower than that of the monolithic die solution.
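The barrier-to-entry point can be illustrated by spreading the one-time NRE over the production volume. This is a minimal sketch under stated assumptions: the `cost_per_unit` helper and all figures are hypothetical, in arbitrary cost units, and do not come from any foundry. The 2.5D unit cost is deliberately set slightly higher than the monolithic one as a pessimistic case, even though the text argues it may well be lower.

```python
def cost_per_unit(nre: float, unit_cost: float, volume: int) -> float:
    """Effective per-unit cost once the one-time NRE is spread over the volume."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre / volume + unit_cost


# Hypothetical figures in arbitrary cost units; not taken from any foundry.
advanced_node = {"nre": 10_000_000.0, "unit_cost": 300.0}  # monolithic die
legacy_node = {"nre": 2_000_000.0, "unit_cost": 350.0}     # ASIC tile plus existing tiles

for volume in (10_000, 100_000, 1_000_000):
    mono = cost_per_unit(volume=volume, **advanced_node)
    tiled = cost_per_unit(volume=volume, **legacy_node)
    print(f"{volume:>9,d} units: monolithic {mono:8.2f}, 2.5D {tiled:8.2f}")
```

At low volumes the NRE term dominates, which is where a legacy-node ASIC would help a new device most; at high volumes the comparison is decided by the unit cost.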
When both performance and cost benefit from an implementation, it is no longer a matter of whether the solution will arrive, but when. The future has become clearer. The challenges of the monolithic die architecture, together with the emerging packaging capabilities for tying these partitioned elements together, may provide the environment needed to propel our industry into another dimension in system architecture.