AI accelerator chip startup Mythic has launched its first product, a 35-TOPS “high-end edge” accelerator whose analog compute-in-memory architecture enables low power consumption and low cost, alongside low latency and deterministic behaviour.
The M1108 uses Mythic’s analog compute-in-memory technique based on 40-nm Flash memory cells. It is aimed at edge applications such as power-over-Ethernet security cameras that need to run sophisticated AI models within a power budget. Another likely application is video analytics boxes that need to accelerate multiple AI models on high-resolution footage while managing power and heat.
Mythic’s AI accelerator chip is capable of a substantial 35 TOPS. Compared with the market leader, Nvidia’s Xavier AGX, which clocks in at 32 TOPS, Mythic says it can process more frames per second on both ResNet-50 (batch size 1) and YOLOv3-608 (batch size 1 video feed). (According to figures from the respective companies, the Xavier AGX can process 656 fps on ResNet-50 and 32 fps on YOLOv3, while Mythic’s M1108 can do 870 fps and 60 fps on the same networks.)
Mythic’s chip also compares favourably with the Xavier AGX in other dimensions. The M1108’s typical power consumption is just 4 W (compared with the Xavier AGX’s 30 W), and Mythic expects the M1108 to compete well on price, since its 40-nm silicon does not require a cutting-edge process node.
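Taken together, the vendor-quoted throughput and power figures above imply a sizeable gap in efficiency. The following sketch simply combines those published numbers into a rough frames-per-watt comparison; it assumes the quoted fps and typical-power figures are directly comparable, which vendor benchmarks often are not.

```python
# Rough performance-per-watt comparison from the vendor-reported figures
# quoted in the article (batch size 1, typical power consumption).
chips = {
    "Mythic M1108":      {"resnet50_fps": 870, "power_w": 4},
    "Nvidia Xavier AGX": {"resnet50_fps": 656, "power_w": 30},
}

for name, c in chips.items():
    fps_per_watt = c["resnet50_fps"] / c["power_w"]
    print(f"{name}: {fps_per_watt:.1f} ResNet-50 fps/W")
# Mythic M1108: 217.5 ResNet-50 fps/W
# Nvidia Xavier AGX: 21.9 ResNet-50 fps/W
```

By this crude measure the M1108 delivers roughly ten times the ResNet-50 throughput per watt, which is the trade-off Mythic is emphasising for power-constrained edge deployments.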
Speaking at the CogX festival in June 2019, Mythic CEO Mike Henry said the company was planning to release samples around the end of that year. That puts today’s M1108 launch almost a year behind schedule. In an interview this week with EE Times, Henry put the delay down to the “Herculean” effort behind the scenes to build a software flow for the chip.
“We decided to hold off on formally launching [the M1108] until we could show highly competitive benchmark numbers,” Henry said. “Building the whole software ecosystem with a flexible compiler on a tile-based architecture and getting everything to work was a huge effort. We were showing off simpler [applications], such as keyword spotting, quite early, but the really big, powerful networks getting into the hundreds of frames per second and sub-10-millisecond latency, it’s a tremendous amount of work on the software to get to that point.”
The AI accelerator chip market has become increasingly competitive during this time. Has the delay cost the company customers?
Henry says not. Seeing other chips launch affirmed Mythic’s conviction that it has a unique balance of performance, power and cost, he said. Mythic’s view is that many other “edge” AI chips are really more like server-class products with 10-15 W power requirements, or very low-end microwatt devices, leaving the tougher “high-end edge” segment relatively empty of competition.
Mythic’s AI accelerator chip uses 108 compute tiles that rely on analog compute-in-memory techniques. In each tile, Mythic’s analog compute engine (ACE) sits alongside a digital SIMD vector engine, a 32-bit RISC-V processor, a network-on-chip (NoC) router and some local SRAM. The engine supports the equivalent of INT4, INT8 and INT16 operations, and its overall capacity is 113 million weights, enough to run several separate complex AI models simultaneously. Since computation is performed inside the Flash memory itself, external DRAM is not required.
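The 113-million-weight figure can be put in perspective with a back-of-the-envelope check against a well-known network. The parameter count for ResNet-50 used below (roughly 25.6 million) is a commonly cited figure, not one from the article.

```python
# Back-of-the-envelope check: how many ResNet-50-sized models fit in the
# M1108's on-chip weight capacity? Assumes weights stored at one Flash
# cell per weight, as implied by the compute-in-memory design.
CHIP_CAPACITY_WEIGHTS = 113_000_000  # per the article
RESNET50_WEIGHTS = 25_600_000        # approximate, commonly cited figure

models_that_fit = CHIP_CAPACITY_WEIGHTS // RESNET50_WEIGHTS
print(models_that_fit)  # 4
```

Four concurrent ResNet-50-class models is consistent with the article’s claim that the chip can run several separate complex AI models simultaneously without external DRAM.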
Dataflow is engineered ahead of runtime by the compiler. Since each tile contains its own resources, such as SRAM, there is no contention, so the chip operates in a deterministic way. Parameters such as latency are known at compile time.
Henry and Mythic’s senior VP of product and business development, Tim Vehling, were at pains to point out that their chip uses a “true” compute-in-memory approach.
“We do the full store and compute all on the same memory storage unit, the same Flash transistor,” said Vehling. “Some other analog architectures might do analog computation, but they’re still using digital memory for parameter storage. So they’re still doing some kind of a data fetch. They’re not going to get the same density and the same speed [as Mythic’s] true compute-in-memory.”
Henry points to Mythic’s high-performance ADC technology as one of the key performance differentiators versus other analog compute offerings. The company’s ADC technology is a key factor in allowing Mythic to scale up to the number of tiles needed to hit 35 TOPS.
Aside from raw performance, Mythic’s other key differentiator versus other analog approaches is accuracy.
“We’re hitting high enough levels of accuracy that we can show modern neural networks, like ResNet-50, that many have said would never run in the analog domain in a million years,” Henry said, adding that many analog compute chip companies don’t talk about the accuracy they can achieve on complex neural networks. “I still am very skeptical that anyone other than us is going to be showing modern neural networks running close to INT8 accuracy in the analog domain. I think that’s always going to be our unique edge,” he said.
Mythic’s high-end edge market includes target applications such as security cameras, which Henry described as a “hotly contested” but high-volume market that is quickly adopting AI accelerators. This is something of a sweet spot for Mythic, since these cameras typically use power over Ethernet, so the power budget is tight, with compact designs and no active cooling. The application demands real-time processing with low latency, and volumes are high.
Another key market is edge boxes for video analytics – perhaps processing video streams from an entire system of cameras. Power and cost constraints are still tight, Henry said, and these customers want to run multiple neural networks simultaneously.
The M1108’s nomenclature (first generation, 108-tile) implies that further generations of product, and/or different tile configurations, are in the pipeline. The company expects a family of products will be built using the same 40-nm compute tile as the M1108. This particular part was built with the security camera and video analytics “sweet spot” in mind, plus the maximum amount of space available on an M.2 card, Vehling said (the M1108 comes in a 19 × 19-mm package), adding that products with both smaller and larger numbers of tiles are technically possible, but he declined to confirm whether both were on the company’s roadmap. He did say, however, that multi-chip PCIe cards are definitely on the roadmap.
Samples of the M1108 are available now, alongside single-chip M.2 and PCIe cards.