Leveraging AI agents and efficient LLM inference on COP-ARM
Victor Jakubiuk, Head of AI at Ampere, shares insights into the techniques Ampere's AI software engineers use to optimize inference engines such as llama.cpp for excellent LLM inference performance on Ampere's Cloud Native Processors. The team behind Ampere Optimized AI Frameworks (AIO) delivered massive performance improvements over vanilla llama.cpp, making it possible to run LLMs solely on Ampere CPUs, without the need for dedicated accelerators.
The workshop will include setting up a demo of an AI agent running on a small-parameter Llama 3.2 model.
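As a rough idea of what such a setup can look like, here is a minimal sketch using llama.cpp's Python bindings (llama-cpp-python) to serve a small Llama 3.2 model on CPU. The model file name, thread count, and prompts are illustrative placeholders, not the workshop's actual configuration.

```python
# Minimal sketch: run a small quantized Llama 3.2 model on CPU via
# llama.cpp's Python bindings. Paths and parameters are placeholders.
from llama_cpp import Llama

# Load a quantized GGUF build of a small Llama 3.2 model (e.g. a 3B
# instruct variant); n_threads can be tuned to the available core count.
llm = Llama(
    model_path="Llama-3.2-3B-Instruct-Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,
    n_threads=8,
)

# A single agent-style turn: give the model a role and a task.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": "Summarize today's tasks."},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```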
Victor Jakubiuk, Head of AI, Ampere Computing