In this webinar, we examine inference benchmarking of prominent open-source Large Language Models, including Llama 2 13B and 70B. We ran these benchmarks across a diverse range of compute shapes available in Oracle Cloud Infrastructure (OCI), spanning Intel, AMD, and Arm CPUs as well as NVIDIA GPUs.

A core aspect of our discussion will center on two crucial metrics: tokens per second (throughput) and the corresponding latency, which are pivotal in evaluating LLM inference performance. These metrics not only provide insight into the efficiency and speed of model inference but also serve as key indicators for optimization.
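As a rough illustration of how these two metrics are typically measured, the sketch below times a generation call and derives throughput and latency from it. The `generate_fn` callable is a hypothetical stand-in for whatever inference entry point a given model serving stack exposes; it is assumed to return the number of tokens it produced.

```python
import time

def benchmark_generation(generate_fn, prompt, n_runs=3):
    """Measure average throughput (tokens/s) and latency (s) for a
    text-generation callable.

    `generate_fn` is a placeholder: it is assumed to take a prompt and
    return the number of tokens generated.
    """
    throughputs, latencies = [], []
    for _ in range(n_runs):
        start = time.perf_counter()
        n_tokens = generate_fn(prompt)          # run one inference pass
        elapsed = time.perf_counter() - start   # wall-clock latency
        latencies.append(elapsed)
        throughputs.append(n_tokens / elapsed)  # tokens per second
    return {
        "tokens_per_second": sum(throughputs) / n_runs,
        "latency_seconds": sum(latencies) / n_runs,
    }
```

Averaging over several runs, as above, smooths out warm-up effects such as first-call model loading and cache population, which otherwise skew single-shot numbers on any of the hardware platforms discussed.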

Throughout the webinar, we will guide the audience through a comprehensive journey, covering the various stages of optimization for these LLMs. This includes an in-depth look at the unique challenges and solutions associated with each hardware platform. Our step-by-step process will highlight practical strategies and tweaks that can significantly enhance the performance of these models.

Local ODSC chapter in NYC, USA

Instructor's Bio

Dr. Sanjay Basu 

Senior Director – AI/ML at Oracle Cloud Engineering

Dr. Sanjay Basu is an industry-recognized subject matter expert in Artificial Intelligence, Machine Learning, and Quantum Computing. He holds two Master's degrees, in computer science and in systems design, and a PhD in Organizational Behaviour and Applied Neuroscience. He is currently pursuing a second PhD in AI, with a research focus on Retentive Networks. Dr. Basu is also the author and editor of the Ethics in AI collection and the author of books on Web 3. View his latest blogs here.



    ON-DEMAND WEBINAR: "Inference Benchmarking of Prominent Open-Source Large Language Models (LLMs)"
