Leveraging AI agents and efficient LLM inference on COP-ARM
Victor Jakubiuk, Head of AI at Ampere, shares insights into the techniques Ampere's AI software engineers use to optimize inference engines such as llama.cpp for excellent LLM inference performance on Ampere's Cloud Native Processors. The team behind Ampere Optimized AI Frameworks (AIO) delivered massive performance improvements over vanilla llama.cpp, making it possible to run LLMs solely on Ampere CPUs, without the need for dedicated accelerators.
The workshop will include setting up a demo of an AI agent running on a small-parameter Llama 3.2 model.
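As a rough idea of what such a setup can look like, here is a minimal sketch using llama.cpp's Python bindings (llama-cpp-python) to serve a small Llama 3.2 model on CPU. The model file name, thread count, and prompts are illustrative placeholders, not the workshop's actual configuration.

```python
# Minimal sketch: run a small quantized Llama 3.2 model on CPU via
# llama.cpp's Python bindings. Paths and parameters are placeholders.
from llama_cpp import Llama

# Load a quantized GGUF build of a small Llama 3.2 model (e.g. a 3B
# instruct variant); n_threads can be tuned to the available core count.
llm = Llama(
    model_path="Llama-3.2-3B-Instruct-Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,
    n_threads=8,
)

# A single agent-style turn: give the model a role and a task.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": "Summarize today's tasks."},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```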
Victor Jakubiuk, Head of AI, Ampere Computing