Get this course for free with Premium Ai+ subscription

Description

Building applications powered by large language models (LLMs) is relatively straightforward. Creating performant applications that meet high standards of accuracy and reliability, however, can be tricky. One of the primary challenges faced by LLM applications is mitigating hallucinations. To address this issue, retrieval-augmented generation (RAG) has become a widely adopted approach, enhancing LLM performance by augmenting the model's knowledge with additional sources of information such as documents and other auxiliary data sets, which can be represented in one of several different formats. One of the most common is a vector database. However, recent work has shown that the accuracy of RAG can be significantly improved by representing the data in a graph format such as a knowledge graph (KG).
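The retrieval step described above can be sketched in a few lines. This is a minimal illustration only: the `embed` function here is a stand-in (real systems use a trained embedding model), and the prompt template is hypothetical, not a specific framework's API.

```python
# Minimal RAG sketch: retrieve the documents closest to a query and
# use them as grounding context for the LLM's prompt.

def embed(text: str) -> list[float]:
    """Stand-in embedding; a real system would call an embedding model."""
    return [float(ord(c)) for c in text[:8]]

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents whose embeddings lie closest to the query's."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    q = embed(query)
    return sorted(documents, key=lambda d: dist(embed(d), q))[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Ground the answer in retrieved context to reduce hallucination."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The key idea is that the model answers from retrieved context rather than from its parametric memory alone; a knowledge graph replaces the flat document store with typed entities and relationships.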

Despite their potential, many KG implementations fail to deliver optimal performance due to difficulties in resolving multiple entities from the source information into a single entity – a technique called entity resolution (ER). LLMs relying on RAG assume that the data they are accessing is in some way unique, so having different nodes for things like "My Company Inc." versus "My Company, Inc." can decrease accuracy. This appears to be a straightforward case that could be resolved with basic string matching and regex, but as the data gets more complicated, this is no longer possible. For example, consider more complex variations such as "Liz Smith," "Elizabeth Conner-Smith," and "Dr. L. Conner-Smith," which would be very difficult to resolve with simple regex.
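The limits of the regex approach are easy to demonstrate. The sketch below uses a simple normalization function (an assumption for illustration, not a technique from the talk): it collapses the punctuation variants of the company name into one key, but leaves the three person-name variants as three distinct strings.

```python
import re

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    return re.sub(r"\s+", " ", name).strip()

# Punctuation-only variants collapse to the same key...
assert normalize("My Company Inc.") == normalize("My Company, Inc.")

# ...but richer name variations remain distinct, so string matching
# alone cannot resolve them to one entity.
print(normalize("Liz Smith"))               # liz smith
print(normalize("Elizabeth Conner-Smith"))  # elizabeth connersmith
print(normalize("Dr. L. Conner-Smith"))     # dr l connersmith
```

Resolving "Liz" to "Elizabeth" or matching an initial to a full name requires evidence beyond the strings themselves, which is what dedicated ER techniques provide.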

It is possible to use more sophisticated ER techniques that incorporate multiple disparate data sources to obtain more accurate entities. The result is an entity-resolved knowledge graph (ERKG), in which duplicate entities such as those shown above are collapsed into a single entity while all information on that entity is preserved. In this way ERKGs enhance both basic graph queries and LLM-driven applications by consolidating and clarifying relationships within the data. This talk will showcase the transformative impact of ER on KGs, using real-world data to highlight improvements in both graph data science tasks and LLM accuracy. Attendees will gain practical insights into implementing ERKGs, demonstrating the significant advantages of applying ER to KGs in RAG systems.
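The collapse-while-preserving step can be sketched as follows. The records, attribute names, and the trivial `resolver` below are hypothetical placeholders; in practice the resolver would be a dedicated ER engine drawing on multiple data sources.

```python
from collections import defaultdict

def resolve_entities(records: list[dict], resolver) -> dict:
    """Group records by the canonical ID returned by `resolver`,
    merging every record's attributes onto a single entity."""
    entities = defaultdict(dict)
    for record in records:
        canonical_id = resolver(record)
        entities[canonical_id].update(record["attributes"])
    return dict(entities)

# Hypothetical records an ER engine would map to one canonical person.
records = [
    {"name": "Liz Smith", "attributes": {"email": "liz@example.com"}},
    {"name": "Elizabeth Conner-Smith", "attributes": {"employer": "My Company Inc."}},
    {"name": "Dr. L. Conner-Smith", "attributes": {"title": "Dr."}},
]
resolver = lambda record: "person-001"  # stand-in for a real ER engine
merged = resolve_entities(records, resolver)
# merged["person-001"] now carries email, employer, and title together
```

Because all three records land on one node, a graph query (or a RAG retrieval) over the ERKG sees a single person with a complete set of attributes instead of three partial duplicates.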


Instructor's Bio

Dr. Clair Sullivan

Founder and CEO at Clair Sullivan & Associates, LLC

Dr. Clair Sullivan is the Founder and CEO of Clair Sullivan and Associates, specializing in data science and generative AI consulting. She earned a Ph.D. in nuclear engineering from the University of Michigan in 2002 and began her career at Los Alamos National Laboratory, focusing on signal processing for spectroscopic data. After four years in federal government roles, she joined academia as an assistant professor at the University of Illinois, researching machine learning for sensor networks. She later transitioned to industry roles, including machine learning engineer at GitHub, Graph Data Science Advocate at Neo4j, and Director of Data Science at Vail Resorts. Dr. Sullivan has authored numerous publications and received the DARPA Young Faculty Award in 2014 and the ANS Mary J. Oestmann Award in 2015.

Webinar

  • 1

    Talk "Entity-Resolved Knowledge Graphs: Taking your Retrieval-Augmented Generation to the Next Level"

Unlock Premium Features with a Subscription

  • Live Training:

    Full access to all live workshops and training sessions.

  • 20+ Expert-Led Workshops:

    Dive deep into AI Agents, RAG, and the latest LLMs.

  • ODSC Conference Discounts:

    Receive extra discounts to attend ODSC conferences.