Generative AI

This article will summarize the learnings from the following sources:

What is a prompt?

  • An input or query given to a large language model (LLM)
  • Elements of a prompt: Instruction + Context + Input/Question (see the sketch after this list)
  • Different models may require different prompts
  • Ask for structured output (JSON, HTML)
  • To get better responses, define rules, give advice, ask the model not to “hallucinate”, or give examples
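
As a minimal illustration of these elements, the sketch below assembles a prompt from an instruction, context, and an input/question. The `build_prompt` helper and its wording are illustrative assumptions, not part of any specific API.

```python
def build_prompt(instruction: str, context: str, question: str) -> str:
    """Assemble a prompt from the three elements: instruction + context + input/question.

    Illustrative helper only; real applications would adapt the wording per model.
    """
    return (
        f"{instruction}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer in JSON with the keys 'answer' and 'sources'. "
        "If the context does not contain the answer, say so instead of guessing."
    )

prompt = build_prompt(
    instruction="You are a support assistant. Answer only from the given context.",
    context="Our return policy allows returns within 30 days of purchase.",
    question="Can I return a product after six weeks?",
)
print(prompt)
```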

Design and refine prompts to optimize LLM responses

Zero-shot Prompting
  • Ask without providing any example
Few-Shot Prompting
  • Provide input-output examples
Prompt Chaining
  • Break tasks into subtasks, not all at once
Chain-of-Thought (CoT) Prompting
  • “Let’s think step by step”, as a human would
  • Have the model return the intermediate steps as sub-results (see the prompt sketch after this list)
  • Some models have this built in
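
The sketch below shows example prompts for these techniques. `call_llm` is a hypothetical stand-in for any chat-completion client, not a real API.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM endpoint; returns a placeholder here."""
    return f"<model response to a {len(prompt)}-character prompt>"

# Zero-shot: ask directly, without examples.
zero_shot = "Classify the sentiment of this review as positive or negative:\n'Great battery, poor camera.'"

# Few-shot: prepend input-output examples before the actual input.
few_shot = (
    "Review: 'Loved it!' -> positive\n"
    "Review: 'Broke after two days.' -> negative\n"
    "Review: 'Great battery, poor camera.' ->"
)

# Chain-of-thought: ask for intermediate steps as sub-results.
chain_of_thought = (
    "A store sells pens at 3 for 2 EUR. How much do 12 pens cost?\n"
    "Let's think step by step and show each sub-result before the final answer."
)

for prompt in (zero_shot, few_shot, chain_of_thought):
    print(call_llm(prompt))
```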

Retrieval-Augmented Generation (RAG)

  • A pattern that can make large language models more effective by leveraging custom data
  • Do not depend solely on static training data; instead, include external and up-to-date knowledge data
  • Reduces the risk of hallucinations; responses can include quotes from the knowledge data
  • Enables domain-specific answers
  • Prevents hallucination by passing context yourself = factual recall (details tbd)
  • Analogy: taking an exam with open notes

Example use case: increase the accuracy of chatbots by providing context


Main concepts of RAG Workflow

  • Index context -> Make it searchable
  • Store -> Keep the indexed chunks and their embeddings in a vector store
  • Retrieval -> Fetch the chunks most relevant to a query
  • Filtering & Reranking -> Refine the retrieved results before generation (sketched below)
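
A minimal sketch of these steps, assuming a hypothetical `embed` function and a plain Python list standing in for a vector store; no real embedding model or vector database is involved.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding model: maps text to a fixed-size vector (random stub)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(384)

vector_store: list[tuple[str, np.ndarray]] = []  # stand-in for a real vector store

def index(chunks: list[str]) -> None:
    """Index & store: embed each chunk and keep it so it becomes searchable."""
    for chunk in chunks:
        vector_store.append((chunk, embed(chunk)))

def retrieve(query: str, top_k: int = 3) -> list[str]:
    """Retrieval: return the chunks whose embeddings are closest to the query."""
    q = embed(query)
    scored = [
        (float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))), chunk)
        for chunk, v in vector_store
    ]
    scored.sort(reverse=True)
    return [chunk for _, chunk in scored[:top_k]]

def filter_and_rerank(query: str, candidates: list[str]) -> list[str]:
    """Filtering & reranking: a second, finer scoring pass (stubbed here)."""
    return candidates  # a cross-encoder or business rules would refine the order

index(["Returns are accepted within 30 days.", "Shipping takes 3-5 days."])
query = "How long do I have to return an item?"
print(filter_and_rerank(query, retrieve(query)))
```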

Chunking

Chunking breaks down large texts into smaller, manageable parts.

Why is chunking needed?

  • It helps retrieve relevant information efficiently for content generation.
  • It is the first step in the RAG workflow: preparing the knowledge data
  • It prevents “lost in the middle” and makes it possible to apply data governance to the knowledge data (garbage in, garbage out)
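
A simple sketch of fixed-size chunking with overlap (chunk strategies are discussed in more detail below). Sizes are counted in words here to keep the example self-contained; real pipelines usually count tokens with the embedding model's tokenizer.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size chunks of `chunk_size` words, overlapping by `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks

document = "word " * 500          # placeholder document
print(len(chunk_text(document)))  # number of chunks produced
```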

 

Data/Document Extraction

  • Make documents searchable
  • Split documents into chunks -> Embed chunks with a model -> Store them in vector store
  • Chunk strategies:
    • Context-aware chunking: chunk by sentence/paragraph/section
    • Fixed-size chunking: divide by a specific number of tokens
    • Summary + Summary with Metadata
    • Consider chunk overlap & windowed summarisation
  • How to chunk depends on how users will ask questions; is there a query history?
  • Other challenges: text mixed with images, text-image dependencies, irregular placement of text, color in the document carrying contextual meaning, charts, multi-column layouts with hierarchical information where order is relevant
  • Embeddings are numerical representations (vectors) of text
  • Embeddings help convert documents into searchable vectors that are stored in a vector store
  • Similar texts have embeddings that are close in the vector space, higher similarity scores indicate more relevant documents for the query


Embedding model:

  • Models that transform textual data into numerical vectors
    • Input: Text data like queries or documents
    • Output: Fixed-size numerical vectors
  • Two types of models:
    • Pre-trained: General-purpose models like OpenAI’s embeddings
    • Fine-tuned: Custom-trained on specific datasets to capture domain-specific nuances (e.g. medical, legal, financial)
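
A short sketch of using a pre-trained embedding model. It assumes the `sentence-transformers` package and the `all-MiniLM-L6-v2` model, which maps each text to a fixed-size 384-dimensional vector; similar texts end up close in the vector space.

```python
# Assumes: pip install sentence-transformers
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # pre-trained, general-purpose model

texts = [
    "How do I reset my password?",             # query
    "Steps to recover a forgotten password",   # similar document
    "Quarterly revenue grew by 12 percent",    # unrelated document
]
vectors = model.encode(texts)  # one fixed-size vector per input text
print(vectors.shape)           # (3, 384) for this model

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similar texts are close in the vector space -> higher similarity score
print(cosine(vectors[0], vectors[1]))  # higher: query vs. related document
print(cosine(vectors[0], vectors[2]))  # lower: query vs. unrelated document
```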


Which factors are relevant when choosing an embedding model:

  • Data / Text properties (Vocabulary size, some models handle more diverse words)
  • Technical capabilities of the model: maximum chunk length / max tokens, multi-language support and, mainly, context window limitation
  • Training data of the model (best fit if it is similar to the knowledge context)
  • Ensure similar embedding space for both queries and documents

TBD