Generative AI

Introduction
Please note that this article is still under construction. The next topics to be added are:
  • Theory part: vector databases & search
  • Practical example
This article will first summarize the learnings from the following sources:

Prompts
  • A prompt is an input or query given to a large language model (LLM)
  • Elements of a prompt: Instruction + Context + Input/Question
  • Different models may require different prompts
  • Ask for structured output (JSON, HTML)
  • To get better responses, define rules, give advice, ask the model not to “hallucinate”, or provide examples
  • Advice from Microsoft for better prompts
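
A minimal sketch of such a prompt, combining instruction, context and input and asking for structured output (all texts below are made up for illustration):

    # Assemble a prompt from Instruction + Context + Input/Question
    instruction = "You are a support assistant. Answer only from the provided context."
    context = "Order 1234 was shipped on 2024-05-02 via DHL."  # hypothetical context snippet
    question = "When was order 1234 shipped?"

    prompt = (
        f"{instruction}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Respond as JSON with the keys 'answer' and 'source'."  # ask for structured output
    )
    print(prompt)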

Design and refine prompts to optimize LLM responses using strategies like the following (a short example sketch follows the list):

Zero-shot Prompting
  • Ask without providing any example or context
  • Relies completely on the model's training
Few-Shot Prompting
  • Provide (input-output) examples
  • Helps the model to understand the context
Prompt Chaining
  • Break tasks into subtasks / into smaller, sequential prompts
  • The output of one prompt becomes, at least in part, the input for the next
Chain-of-Thought (CoT) Prompting
  • “Let’s think step by step”: make the model reason like humans who “think out loud”
  • Return the intermediate reasoning steps as sub-results
  • Some models have this capability built in
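
As a small illustration of two of these techniques, the sketch below builds a few-shot prompt from made-up (input, output) examples and appends a chain-of-thought instruction; all texts and labels are assumptions for demonstration only:

    # Few-shot prompting: provide (input -> output) examples before the real input
    examples = [
        ("The delivery was late and the package was damaged.", "negative"),
        ("Great service, my question was answered within minutes.", "positive"),
    ]
    query = "The product works, but setup took far too long."

    few_shot_prompt = "Classify the sentiment of each review.\n\n"
    for text, label in examples:
        few_shot_prompt += f"Review: {text}\nSentiment: {label}\n\n"
    few_shot_prompt += f"Review: {query}\nSentiment:"

    # Chain-of-Thought prompting: additionally ask the model to "think out loud"
    cot_prompt = few_shot_prompt + "\nLet's think step by step before giving the final label."
    print(cot_prompt)
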
Retrieval-Augmented Generation (RAG)
Definition
  • Use custom data to enhance model performance and quality of response
  • Avoid relying solely on static training data; incorporate external, up-to-date knowledge sources
  • Enable domain-specific and accurate responses
  • Prevent hallucinations by including quotes or citations from reliable knowledge bases
  • Ensure factual recall: the model retrieves and uses precise information from the supplied data
  • Analogy: Take an exam with open notes
  • Example use case: increase the accuracy of chatbots by providing context
Central concepts of the RAG workflow
  • Indexing: Make data searchable, using vector databases and indexing algorithms
  • Storage: Organize and maintain the context data as embeddings in a vector database
  • Retrieval: Fetch the most relevant information using similarity search
  • Reranking: Refine the results to prioritize the most contextually relevant data
  • Injection: Combine the retrieved information with the user query before sending it to the LLM (see the sketch below)
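
The toy sketch below walks through indexing, retrieval and injection with an in-memory list as the "vector store"; the embed() function is only a stand-in for a real embedding model (it does not capture semantic similarity), and reranking is left out:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Stand-in for a real embedding model: a pseudo-random unit vector derived from the text
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        vec = rng.normal(size=128)
        return vec / np.linalg.norm(vec)

    # Indexing & storage: embed the knowledge chunks and keep them in a simple store
    chunks = [
        "Order 1234 was shipped on 2024-05-02 via DHL.",
        "Returns are possible within 30 days of delivery.",
    ]
    store = [(chunk, embed(chunk)) for chunk in chunks]

    # Retrieval: embed the query and fetch the most similar chunk (cosine similarity)
    query = "When was my order shipped?"
    q_vec = embed(query)
    best_chunk, _ = max(store, key=lambda item: float(np.dot(q_vec, item[1])))

    # Injection: combine the retrieved context with the user query before calling the LLM
    prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {query}"
    print(prompt)
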
Chunking
Chunking breaks down large texts into smaller, manageable parts.
Why is chunking needed?
  • It helps retrieve relevant information efficiently for content generation
  • It is the first step in the RAG workflow: preparing the knowledge data
  • It prevents the “lost in the middle” problem and makes it possible to apply data governance to the knowledge data
Data/Document Extraction
  • Make documents searchable
  • Split documents into chunks -> embed the chunks with a model -> store them in a vector database
  • Chunk strategies (a short sketch follows this list):
    • Context-aware chunking: chunk by sentence/paragraph/section
    • Fixed-size chunking: divide by a specific number of tokens
    • Summary with Metadata
    • Consider chunk overlap & windowed summarisation
  • The right chunking strategy depends on how users will ask questions; is there a query history to learn from?
  • Other challenges: text mixed with images, text-image dependencies, irregular placement of text, color in the document carrying meaning, charts, and multi-column layouts with hierarchical information where order matters
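
A short sketch of the two simplest strategies from the list above, using whitespace-separated words as a stand-in for real model tokens:

    def fixed_size_chunks(text: str, size: int = 200, overlap: int = 20) -> list[str]:
        # Fixed-size chunking: windows of `size` tokens with some overlap,
        # so that information at chunk boundaries is not lost
        tokens = text.split()
        step = max(size - overlap, 1)  # guard against overlap >= size
        return [" ".join(tokens[start:start + size]) for start in range(0, len(tokens), step)]

    def paragraph_chunks(text: str) -> list[str]:
        # Context-aware chunking: split along existing structure (here: blank lines)
        return [part.strip() for part in text.split("\n\n") if part.strip()]
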
Embeddings
  • Embeddings are numerical representations (vectors) of text
  • Embeddings convert documents into searchable vectors that can be stored in a vector database
  • Similar texts have embeddings that are close in the vector space; higher similarity scores indicate more relevant documents for a query

Embedding model:
  • Models that transform textual data into numerical vectors
  • Input: Text data such as queries or documents
  • Output: Fixed-size numerical vectors
  • Two types of models:
    • Pre-trained: General-purpose models like OpenAI’s embeddings
    • Fine-tuned: Custom-trained on specific datasets to capture domain-specific nuances (e.g. medical, legal, financial)
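
A minimal sketch using the sentence-transformers library as one example of a general-purpose, pre-trained embedding model (the model name and texts are assumptions for illustration):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example pre-trained model

    docs = ["The invoice is due within 14 days.", "Our office dog is named Bello."]
    query = "When do I have to pay the invoice?"

    doc_vecs = model.encode(docs)    # fixed-size numerical vectors, one per document
    query_vec = model.encode(query)

    # Cosine similarity: the semantically closer document gets the higher score
    print(util.cos_sim(query_vec, doc_vecs))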

Which factors are relevant when choosing an embedding model:
  • Data / text properties (vocabulary size; some models handle a more diverse vocabulary)
  • Technical capabilities of the model: maximum chunk length / max tokens, multi-language support and, most importantly, the context window limitation
  • Training data of the model (the best fit is training data similar to the knowledge context)
  • Ensure that queries and documents share a similar embedding space (typically by using the same model for both)
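
Some of the technical factors can be inspected directly on the model; a brief sketch, again assuming sentence-transformers and the same example model as above:

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    print(model.max_seq_length)                      # maximum input length in tokens
    print(model.get_sentence_embedding_dimension())  # size of the output vectors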

tbd