Generative AI

Introduction
Please note that this article is still under construction. The next topics to be added are:
  • Theory part: vector databases & search
  • Practical example
This article will first summarize the learnings from the following sources:

Prompts
  • A prompt is an input or query given to a large language model (LLM)
  • Elements of a prompt: Instruction + Context + Input/Question
  • Different models may require different prompts
  • Ask for structured output (JSON, HTML)
  • To get better responses, define rules, give advice, ask the model not to “hallucinate”, or provide examples
  • Advice from Microsoft for better prompts
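
A minimal sketch of such a prompt, combining instruction, context and input and asking for structured output (all texts below are made up for illustration):

    # Assemble a prompt from Instruction + Context + Input/Question
    instruction = "You are a support assistant. Answer only from the provided context."
    context = "Order 1234 was shipped on 2024-05-02 via DHL."  # hypothetical context snippet
    question = "When was order 1234 shipped?"

    prompt = (
        f"{instruction}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Respond as JSON with the keys 'answer' and 'source'."  # ask for structured output
    )
    print(prompt)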

Design and refine prompts to optimize LLM responses using strategies like the following (a short example sketch follows the list):

Zero-shot Prompting
  • Ask without providing any example or context
  • Relies completely on the model's training
Few-Shot Prompting
  • Provide (input-output) examples
  • Helps the model to understand the context
Prompt Chaining
  • Break tasks into subtasks / into smaller, sequential prompts
  • The output of one prompt becomes, at least in part, the input for the next
Chain-of-Thought (CoT) Prompting
  • “Let’s think step by step”: make the model reason like humans who “think out loud”
  • Return the intermediate reasoning steps as sub-results
  • Some models have this capability built in
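
As a small illustration of two of these techniques, the sketch below builds a few-shot prompt from made-up (input, output) examples and appends a chain-of-thought instruction; all texts and labels are assumptions for demonstration only:

    # Few-shot prompting: provide (input -> output) examples before the real input
    examples = [
        ("The delivery was late and the package was damaged.", "negative"),
        ("Great service, my question was answered within minutes.", "positive"),
    ]
    query = "The product works, but setup took far too long."

    few_shot_prompt = "Classify the sentiment of each review.\n\n"
    for text, label in examples:
        few_shot_prompt += f"Review: {text}\nSentiment: {label}\n\n"
    few_shot_prompt += f"Review: {query}\nSentiment:"

    # Chain-of-Thought prompting: additionally ask the model to "think out loud"
    cot_prompt = few_shot_prompt + "\nLet's think step by step before giving the final label."
    print(cot_prompt)
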
Retrieval-Augmented Generation (RAG)
Definition
  • Use custom data to enhance model performance and quality of response
  • Avoid relying solely on static training data; incorporate external, up-to-date knowledge sources
  • Enable domain-specific and accurate responses
  • Prevent hallucinations by including quotes or citations from reliable knowledge bases
  • Ensure factual recall: the model retrieves and uses precise information from the supplied data
  • Analogy: Take an exam with open notes
  • Example use case: increase the accuracy of chatbots by providing context
Central concepts of the RAG workflow
  • Indexing: Make data searchable, using vector databases and indexing algorithms
  • Storage: Organize and maintain the context data as embeddings in a vector database
  • Retrieval: Fetch the most relevant information using similarity search
  • Reranking: Refine the results to prioritize the most contextually relevant data
  • Injection: Combine the retrieved information with the user query before sending it to the LLM (see the sketch below)
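
The toy sketch below walks through indexing, retrieval and injection with an in-memory list as the "vector store"; the embed() function is only a stand-in for a real embedding model (it does not capture semantic similarity), and reranking is left out:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Stand-in for a real embedding model: a pseudo-random unit vector derived from the text
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        vec = rng.normal(size=128)
        return vec / np.linalg.norm(vec)

    # Indexing & storage: embed the knowledge chunks and keep them in a simple store
    chunks = [
        "Order 1234 was shipped on 2024-05-02 via DHL.",
        "Returns are possible within 30 days of delivery.",
    ]
    store = [(chunk, embed(chunk)) for chunk in chunks]

    # Retrieval: embed the query and fetch the most similar chunk (cosine similarity)
    query = "When was my order shipped?"
    q_vec = embed(query)
    best_chunk, _ = max(store, key=lambda item: float(np.dot(q_vec, item[1])))

    # Injection: combine the retrieved context with the user query before calling the LLM
    prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {query}"
    print(prompt)
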
Chunking
Chunking breaks down large texts into smaller, manageable parts.
Why is chunking needed?
  • It helps retrieve relevant information efficiently for content generation
  • It is the first step in the RAG workflow: preparing the knowledge data
  • It prevents the “lost in the middle” problem and makes it possible to apply data governance to the knowledge data
Data/Document Extraction
  • Make documents searchable
  • Split documents into chunks -> embed the chunks with a model -> store them in a vector database
  • Chunk strategies (a short sketch follows this list):
    • Context-aware chunking: chunk by sentence/paragraph/section
    • Fixed-size chunking: divide by a specific number of tokens
    • Summary with Metadata
    • Consider chunk overlap & windowed summarisation
  • The right chunking strategy depends on how users will ask questions; is there a query history to learn from?
  • Other challenges: text mixed with images, text-image dependencies, irregular placement of text, color in the document carrying meaning, charts, and multi-column layouts with hierarchical information where order matters
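
A short sketch of the two simplest strategies from the list above, using whitespace-separated words as a stand-in for real model tokens:

    def fixed_size_chunks(text: str, size: int = 200, overlap: int = 20) -> list[str]:
        # Fixed-size chunking: windows of `size` tokens with some overlap,
        # so that information at chunk boundaries is not lost
        tokens = text.split()
        step = max(size - overlap, 1)  # guard against overlap >= size
        return [" ".join(tokens[start:start + size]) for start in range(0, len(tokens), step)]

    def paragraph_chunks(text: str) -> list[str]:
        # Context-aware chunking: split along existing structure (here: blank lines)
        return [part.strip() for part in text.split("\n\n") if part.strip()]
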
Embeddings
  • Embeddings are numerical representations (vectors) of text
  • Embeddings convert documents into searchable vectors that can be stored in a vector database
  • Similar texts have embeddings that are close in the vector space; higher similarity scores indicate more relevant documents for a query

Embedding model:
  • Models that transform textual data into numerical vectors
  • Input: Text data such as queries or documents
  • Output: Fixed-size numerical vectors
  • Two types of models:
    • Pre-trained: General-purpose models like OpenAI’s embeddings
    • Fine-tuned: Custom-trained on specific datasets to capture domain-specific nuances (e.g. medical, legal, financial)
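
A minimal sketch using the sentence-transformers library as one example of a general-purpose, pre-trained embedding model (the model name and texts are assumptions for illustration):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example pre-trained model

    docs = ["The invoice is due within 14 days.", "Our office dog is named Bello."]
    query = "When do I have to pay the invoice?"

    doc_vecs = model.encode(docs)    # fixed-size numerical vectors, one per document
    query_vec = model.encode(query)

    # Cosine similarity: the semantically closer document gets the higher score
    print(util.cos_sim(query_vec, doc_vecs))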

Which factors are relevant when choosing an embedding model:
  • Data / text properties (vocabulary size; some models handle a more diverse vocabulary)
  • Technical capabilities of the model: maximum chunk length / max tokens, multi-language support and, most importantly, the context window limitation
  • Training data of the model (the best fit is training data similar to the knowledge context)
  • Ensure that queries and documents share a similar embedding space (typically by using the same model for both)
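
Some of the technical factors can be inspected directly on the model; a brief sketch, again assuming sentence-transformers and the same example model as above:

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    print(model.max_seq_length)                      # maximum input length in tokens
    print(model.get_sentence_embedding_dimension())  # size of the output vectors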

tbd