Our SMU-N product is a processor solution that runs large AI models directly on the device by significantly reducing data traffic between the processor and memory.
Generative AI models such as large language models (LLMs) have a very large number of parameters. In a conventional processor architecture, the trained model is stored in NAND flash, copied to DRAM, and then processed, which requires a large amount of data traffic. Moving data in this way creates a significant performance bottleneck (the "memory wall" problem), consumes considerable power, and demands a large DRAM capacity. The SMU-N processor solution, called Compute-near-Flash, significantly reduces data traffic between the processor and DRAM while also reducing the required DRAM capacity. The advanced SMU-N+ processor solution, integrated into the NAND and called Compute-in-Flash, further reduces data traffic between the processor and the NAND where the trained AI model is stored persistently.
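To give a rough sense of the scale of this traffic, the sketch below estimates the weight data moved in a conventional NAND-to-DRAM-to-processor pipeline during text generation. The model size, weight precision, and token count are illustrative assumptions for this example only, not SMU-N or SMU-N+ specifications.

```python
# Back-of-envelope estimate of weight traffic in a conventional
# NAND -> DRAM -> processor pipeline.
# All numbers below are illustrative assumptions.

PARAMS = 7e9          # assumed model size: 7 billion parameters
BYTES_PER_PARAM = 2   # assumed 16-bit (2-byte) weights
TOKENS = 100          # assumed number of generated tokens

# Model footprint that must be copied from NAND and held in DRAM.
model_bytes = PARAMS * BYTES_PER_PARAM

# In autoregressive generation, every token typically reads all weights,
# so processor <-> DRAM weight traffic grows with the token count.
dram_traffic = model_bytes * TOKENS

print(f"Model footprint in DRAM: {model_bytes / 1e9:.1f} GB")
print(f"Weight traffic for {TOKENS} tokens: {dram_traffic / 1e12:.1f} TB")
```

Under these assumptions, a 14 GB model generates roughly 1.4 TB of weight traffic for just 100 tokens, which illustrates why moving computation near or into the flash, instead of moving the full weight set into DRAM, is where the traffic and DRAM-capacity savings come from.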