Tech

Generative AI

This article summarizes learnings on prompt engineering and retrieval-augmented generation (RAG).

Prompt

  • An input or query given to a large language model (LLM)
  • Elements of a prompt: Instruction + Context + Input/Question
  • Different models may require different prompts
  • Ask for structured output (JSON, HTML)
  • To get better responses: define rules, give guidance, ask the model not to “hallucinate”, or provide examples (see the sketch below)
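
A minimal sketch of how these elements combine into one prompt string; the rules, the JSON schema, and the order data are invented for illustration, and the actual model call is left out:

```python
# Minimal sketch: assembling a prompt from Instruction + Context + Input.
# The rules, the JSON schema, and the order data are invented for illustration.

instruction = (
    "You are a support assistant. Answer ONLY from the context below. "
    "If the answer is not in the context, say 'I don't know' instead of guessing. "
    'Respond as JSON: {"answer": "...", "quote": "..."}.'
)
context = "Order #123 was shipped on 2024-05-02 via DHL."
question = "When was order #123 shipped?"

prompt = f"{instruction}\n\nContext:\n{context}\n\nQuestion:\n{question}"
print(prompt)  # this string is what gets sent to the LLM of your choice
```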

Prompt Engineering

Design and refine prompts to optimize LLM responses.

Zero-shot Prompting
  • Ask the model to perform a task without providing any examples
Few-Shot Prompting
  • Provide a few input-output examples that demonstrate the task (contrasted with zero-shot in the sketch below)
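
A minimal contrast between the two styles, using a made-up sentiment-classification task:

```python
# Zero-shot: describe the task, give no examples.
zero_shot = (
    "Classify the sentiment of this review as positive or negative:\n"
    "'Battery died after two days.'"
)

# Few-shot: the same task, preceded by input-output examples that show
# the expected behavior and output format.
few_shot = (
    "Classify the sentiment as positive or negative.\n"
    "Review: 'Love it, works perfectly.' -> positive\n"
    "Review: 'Arrived broken, waste of money.' -> negative\n"
    "Review: 'Battery died after two days.' ->"
)
```
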
Prompt Chaining
  • Break a task into subtasks and chain the prompts, feeding each result into the next, instead of asking for everything at once (see the sketch below)
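
A sketch of chaining, assuming a hypothetical `ask` helper that stands in for whichever LLM client you actually use:

```python
# Prompt chaining sketch: each subtask gets its own prompt, and the output of
# one step becomes the input of the next.

def ask(prompt: str) -> str:
    """Hypothetical placeholder: replace the body with a real call to your LLM provider."""
    print(f"--- prompt ---\n{prompt}\n")
    return "(model output)"

document = "...long source text..."

# Step 1: condense the raw document.
summary = ask(f"Summarize the following text in five bullet points:\n{document}")

# Step 2: operate on the intermediate result, not on the raw document.
actions = ask(f"From this summary, list concrete action items:\n{summary}")
```
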
Chain-of-Thought (CoT) Prompting
  • Prompt the model to “think step by step”, the way humans would
  • Ask for the intermediate steps as sub-results
  • Some models have this behavior built in
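
A minimal sketch; the only trick is the appended “think step by step” instruction, and the question is made up:

```python
# Chain-of-thought sketch: the appended instruction nudges the model to write
# out intermediate reasoning steps before committing to a final answer.
question = "A train leaves at 9:40 and arrives at 12:05. How long is the trip?"
cot_prompt = (
    f"{question}\n"
    "Let's think step by step, then give the final answer on its own line."
)
```
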
Retrieval-Augmented Generation (RAG)

  • Goal:
    • A pattern that can improve the responses of large language models by leveraging custom data
    • Do not depend solely on static training data; instead, include external and up-to-date knowledge data
    • Reduce the risk of hallucinations by including quotes from the knowledge data
    • Enable domain-specific answers
    • Prevent hallucination by passing context yourself = factual recall (details tbd)
    • Analogy: taking an exam with open notes
  • Example use case: increase the accuracy of chatbots by providing context
  • Main concepts of the RAG workflow (a toy end-to-end sketch follows after this list):
    • 1) Index context -> make it searchable
    • 2) Store -> there are specialised vector databases
    • 3) Retrieval? -> tbd
    • 4) Filtering & reranking
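
A toy end-to-end sketch of these steps; the bag-of-words “embedding” and the in-memory list are deliberately naive stand-ins for a real embedding model and a vector database:

```python
# Toy RAG sketch: index -> store -> retrieve -> rerank -> build the prompt.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for an embedding model: word counts instead of dense vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1) Index + 2) Store: embed each chunk and keep the vector next to its text.
chunks = [
    "Refunds are possible within 30 days of purchase.",
    "Shipping to the EU takes 3 to 5 business days.",
]
store = [(embed(c), c) for c in chunks]

# 3) Retrieve + 4) Rerank: score every chunk against the query, best first.
query = "How long does shipping to the EU take?"
q = embed(query)
ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)

# Pass the best chunk as context, so the answer can quote the knowledge data.
prompt = f"Answer using only this context:\n{ranked[0][1]}\n\nQuestion: {query}"
```

In a real pipeline, retrieval (approximate nearest-neighbour search in the vector store) and reranking (a second, more precise scoring pass) are separate stages; the single sort above collapses them for brevity.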

Chunking

Chunking breaks down large texts into smaller, manageable parts.

Why is chunking needed?

  • It helps retrieve relevant information efficiently for content generation.
  • It is the first step in the RAG workflow: preparing the knowledge data.
  • It prevents the “lost in the middle” problem, where a model overlooks content buried in the middle of a long context.
  • It makes data governance applicable to the knowledge data: garbage in, garbage out.

Data/Document Extraction

  • Make documents searchable
  • Split documents into chunks -> embed the chunks with an embedding model -> store them in a vector store
  • Chunking strategies (see the sketch after this list):
    • Context-aware chunking: chunk by sentence/paragraph/section
    • Fixed-size chunking: divide by a specific number of tokens
    • Summary + summary with metadata
    • Consider chunk overlap & windowed summarisation
  • How to chunk depends on how users will ask questions; is there a query history to learn from?
  • Other challenges: text mixed with images, text-image dependencies, irregular placement of text, colors in the document that carry meaning, charts, and multi-column layouts with hierarchical information where reading order is relevant
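
A minimal sketch of the first two chunking strategies; “tokens” are approximated here by whitespace-separated words, whereas a real pipeline would use the model's tokenizer:

```python
import re

def fixed_size_chunks(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    # Fixed-size chunking with overlap, so a sentence cut at a chunk boundary
    # still appears intact in the neighbouring chunk.
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def sentence_chunks(text: str, max_sentences: int = 5) -> list[str]:
    # Context-aware chunking: split on sentence boundaries, then group.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]

print(sentence_chunks("First sentence. Second one. Third one.", max_sentences=2))
# ['First sentence. Second one.', 'Third one.']
```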