The Evolution of Generative AI

Key Terms in AI and Machine Learning
Generative AI
Author

Jan Kirenz

Published

October 18, 2024

Generative AI is a subset of artificial intelligence that focuses on creating new content based on patterns learned from existing data. Unlike traditional AI systems that are designed to analyze or classify existing information, generative AI has the capability to produce original text, images, audio, and even code.

Generative AI is like a creative apprentice that learns from vast amounts of existing content and then uses that knowledge to produce new, original work.

To fully appreciate generative AI, it’s essential to understand its place within the broader field of artificial intelligence. Let’s break down the progression from AI to machine learning, deep learning, and finally, generative AI.

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Helvetica, Arial, sans-serif', 'fontSize': '14px', 'primaryColor': '#007AFF', 'primaryTextColor': '#333', 'primaryBorderColor': '#007AFF', 'lineColor': '#000000', 'secondaryColor': '#F5F5F7', 'tertiaryColor': '#FFFFFF'}}}%%

flowchart TD
    A[Artificial Intelligence] --> B[Machine Learning]
    B --> C[Deep Learning]
    C --> D[Generative AI]

    A --> |Broad field| E[("Theory and methods to build<br>machines that think and act<br>like humans")]
    B --> |Subset of AI| F[("Ability to learn without<br>explicit programming")]
    C --> |Subset of ML| G[("Uses artificial neural networks<br>to process complex patterns")]
    D --> |Subset of DL| H[("Creates new content based on<br>learned patterns")]

    classDef default fill:#F5F5F7,stroke:#007AFF,stroke-width:2px,color:#333,rx:5,ry:5;
    classDef highlight fill:#007AFF,stroke:#007AFF,stroke-width:2px,color:#FFF,rx:5,ry:5;
    class D highlight;

The Evolution of AI to Generative AI

Artificial Intelligence

AI is a broad discipline within computer science that deals with creating intelligent agents or systems capable of reasoning, learning, and acting autonomously. It encompasses various approaches to mimicking human-like intelligence in machines.

AI is not a singular technology but rather a multifaceted field that combines various approaches and techniques to create intelligent machines capable of mimicking human-like cognitive abilities.

Machine Learning

Machine Learning (ML) is a subset of AI that focuses on developing algorithms and statistical models that enable computer systems to improve their performance on a specific task through experience. ML systems can learn from and make predictions or decisions based on data, without being explicitly programmed for every scenario.

There are two main types of machine learning models:

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Helvetica, Arial, sans-serif', 'fontSize': '14px', 'primaryColor': '#007AFF', 'primaryTextColor': '#FFFFFF', 'primaryBorderColor': '#007AFF', 'lineColor': '#000000', 'secondaryColor': '#F5F5F7', 'tertiaryColor': '#FFFFFF'}}}%%

flowchart TD
    A[Machine Learning] --> B[Supervised Learning]
    A --> C[Unsupervised Learning]
    B --> E[Classification]
    B --> F[Regression]
    C --> G[Clustering]
    C --> H[Generative Models]
    
    classDef default fill:#007AFF,stroke:#007AFF,stroke-width:2px,color:#FFF,rx:5,ry:5;
    classDef main fill:#FF6B6B,stroke:#FF6B6B,stroke-width:2px,color:#FFF,rx:5,ry:5;
    class A main;
    linkStyle default stroke:#000000,stroke-width:2px;

Types of Machine Learning

  1. Supervised Learning (predictive ML models): The model learns from labeled data to make predictions about numbers (regression) or categories (classification). Both the input and the desired output are provided. Example: Predicting customer purchase behavior based on historical data.
  2. Unsupervised Learning: The model identifies patterns or structures in unlabeled data, for example through clustering or generative models. Example: Customer segmentation based on purchasing behavior.

Supervised Learning

Supervised learning employs predictive machine learning models, which are valuable for making informed predictions based on past examples.

In the following sections, we will outline how predictive machine learning operates for regression and classification using this process:

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Helvetica, Arial, sans-serif', 'fontSize': '14px', 'primaryColor': '#007AFF', 'primaryTextColor': '#FFFFFF', 'primaryBorderColor': '#007AFF', 'lineColor': '#000000', 'secondaryColor': '#F5F5F7', 'tertiaryColor': '#FFFFFF'}}}%%

flowchart LR
    D[(Features)] --> PM[<b>Predictive ML model</b><br>Learns relationship<br>between features and label]
    L[(Labels)] --> PM
    PM --> O[<b>Output</b>: Label]


    classDef default fill:#007AFF,stroke:#007AFF,stroke-width:2px,color:#FFF,rx:5,ry:5;
    classDef cylinder fill:#FFD700,stroke:#FFD700,stroke-width:2px,color:#000,rx:5,ry:5;
    classDef output fill:#FF6B6B,stroke:#FF6B6B,stroke-width:2px,color:#FFF,rx:5,ry:5;
    class D,L cylinder;
    class O output;
    linkStyle default stroke:#000000,stroke-width:2px;

Predictive ML Model Process

In machine learning, a feature is an individual measurable property or characteristic used as input for the model to make predictions. A label is the target variable or the output that the model aims to predict based on the features.

Regression

Imagine you have a friend who’s incredibly good at guessing the price of houses. This friend has looked at thousands of houses, noting their size, location, age, and actual selling prices. Now, whenever they see a new house, they can make a pretty accurate guess about its price.

Predictive machine learning models operate in a similar manner but on a much larger scale and with mathematical accuracy. The variables used to predict the house price are known as features, while the variable we aim to predict (the price) is called the label. This would be an example of a regression model, since we want to predict a numerical label.

  1. Gathering Data:
    • The model starts by collecting information about many houses. This includes:
      • Features: Like size (in square feet or meters), location (neighborhoods or zip codes) and age (year built)
      • Labels: Actual selling prices (this is what the model will try to predict)
  2. Learning Phase:
    • The model analyzes all this data, looking for patterns and relationships.
    • It might discover things like:
      • Larger houses tend to be more expensive
      • Houses in certain neighborhoods command higher prices
      • Newer houses often sell for more than older ones
    • However, it’s not just making simple rules. The model is finding complex relationships between all these factors.
  3. Output:
    • Once the model has learned from all this data, it’s ready to make predictions and provide an output (the label).
    • If you show it a new house it’s never seen before, providing details like:
      • Size: 2,000 square feet
      • Location: Downtown area
      • Age: Built in 2010
    • The model will use what it learned to estimate a selling price (the label) for this house.
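
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production model: it uses a single feature (size), made-up prices, and ordinary least squares as the learning method. The names `fit_line` and `predict_price` are hypothetical helpers introduced only for this example.

```python
# Step 1, gathering data: toy numbers invented for illustration only.
sizes = [1000.0, 1500.0, 2500.0, 3000.0]               # feature: size in square feet
prices = [150_000.0, 200_000.0, 300_000.0, 350_000.0]  # label: selling price in dollars


def fit_line(xs, ys):
    """Step 2, learning phase: ordinary least squares for one feature.

    Returns the slope and intercept of the best-fitting straight line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(size, slope, intercept):
    """Step 3, output: estimate the label for a house the model has not seen."""
    return slope * size + intercept


slope, intercept = fit_line(sizes, prices)
estimate = predict_price(2000.0, slope, intercept)
print(f"Estimated price for 2,000 sq ft: ${estimate:,.0f}")
```

A real model would use many more features (location, age, and so on) and far more data, but the pattern stays the same: learn the relationship from labeled examples, then apply it to a new input.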

While regression models predict continuous numerical values (like house prices), classification models categorize data into discrete classes or categories.

Classification

Let’s consider a similar real estate example, but this time for classification:

Imagine your friend is now excellent at predicting whether a house will sell quickly (within a month) or slowly (more than a month) based on its features. This is a classification task, as we’re categorizing houses into two distinct groups: “Fast Sale” or “Slow Sale”.

Here’s how a classification model might work in this scenario:

  1. Gathering Data:
    • Features: Size, location, age, price, number of bedrooms, etc.
    • Labels: “Fast Sale” or “Slow Sale”
  2. Learning Phase:
    • The model analyzes the data to find patterns that distinguish fast-selling houses from slow-selling ones.
    • It might learn that houses in certain price ranges or with specific features tend to sell faster.
  3. Output:
    • Given a new house’s features, the model predicts which category it belongs to: “Fast Sale” or “Slow Sale”.
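
As a rough sketch of these steps, the following Python example uses a nearest-centroid classifier, one of the simplest classification methods. The data is made up, only two features are used (price and age), and `fit_centroids` and `classify` are hypothetical helper names chosen for this illustration.

```python
import math

# Gathering data: toy labeled examples (invented numbers).
# Features are (price in $1000s, age in years); the label is the sale speed.
training_data = [
    ((200.0, 5.0), "Fast Sale"),
    ((220.0, 8.0), "Fast Sale"),
    ((180.0, 3.0), "Fast Sale"),
    ((400.0, 30.0), "Slow Sale"),
    ((450.0, 40.0), "Slow Sale"),
    ((380.0, 25.0), "Slow Sale"),
]


def fit_centroids(examples):
    """Learning phase: average the feature vectors of each class."""
    sums = {}
    counts = {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [value / counts[label] for value in total]
        for label, total in sums.items()
    }


def classify(features, centroids):
    """Output: the label whose class average is closest to the new house."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


centroids = fit_centroids(training_data)
print(classify((210.0, 10.0), centroids))
print(classify((420.0, 35.0), centroids))
```

Note that the output is a category rather than a number, which is exactly what separates this task from the regression example.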

The key difference between regression and classification:

  • Regression predicts a continuous value (e.g., exact house price: $250,000).
  • Classification assigns a category or class (e.g., “Fast Sale” or “Slow Sale”).

Both types of models are fundamental in supervised learning, where the goal is to learn from labeled data to make predictions on new, unseen data.

In the context of digital marketing, similar predictive models can be used for various purposes:

  • Predicting which customers are most likely to make a purchase
  • Estimating the best time to send marketing emails
  • Forecasting the success of different ad campaigns

By learning from past data, these models help marketers make more informed decisions about future strategies.

Remember, while predictive ML models can be very accurate, they’re not perfect. They make educated guesses based on patterns in past data, but unusual circumstances or new trends can still lead to inaccurate predictions.

Model Process

The diagram below illustrates the typical workflow of a supervised learning model in more detail.

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Helvetica, Arial, sans-serif', 'fontSize': '14px', 'primaryColor': '#007AFF', 'primaryTextColor': '#333', 'primaryBorderColor': '#007AFF', 'lineColor': '#000000', 'secondaryColor': '#F5F5F7', 'tertiaryColor': '#FFFFFF'}}}%%

flowchart TD
    A[Input data: x] --> B[Model]
    B --> C[Predict output ŷ]
    D[Actual output y] --> E{Compare}
    C --> E
    E --> F[Error]
    F --> |Model update| B
    
    classDef default fill:#007AFF,stroke:#007AFF,stroke-width:2px,color:#FFF,rx:5,ry:5;
    classDef compare fill:#FF6B6B,stroke:#FF6B6B,stroke-width:2px,color:#FFF,rx:5,ry:5;
    class E compare;
    linkStyle default stroke:#000000,stroke-width:2px;

Supervised Learning Process

  1. Input Data (x): The process begins with a dataset in which each example has features (the input x) and a label. In digital marketing, the features could be customers’ past purchase behavior, demographics, and browsing history.

  2. Model: The input data is fed into a machine learning model. This model can be a neural network or any other algorithm capable of learning patterns from data.

  3. Predicted Output (ŷ): The model generates predictions based on the input data. For instance, it might predict the likelihood of a customer making a purchase.

  4. Actual Output (y): This is the ground truth or correct label associated with the input data. In our example, it would be whether the customer actually made a purchase or not.

  5. Comparison: The predicted output is compared to the actual output to assess the model’s accuracy.

  6. Error Calculation: The difference between the prediction and the actual outcome is quantified as an error.

  7. Model Update: Based on the calculated error, the model’s parameters are adjusted to minimize future errors.

This process is iterative, with the model continuously adjusting its parameters to minimize the error and better match the expected outputs.
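
The loop of predicting, comparing, measuring the error, and updating can be made concrete with a small sketch. It assumes the simplest possible model, a single parameter `weight` in ŷ = weight · x, and uses gradient descent on the mean squared error as the update rule. The data, the learning rate, and the number of steps are arbitrary choices for illustration.

```python
# Toy data (invented): the true relationship is y = 2 * x.
inputs = [1.0, 2.0, 3.0]    # input data x
targets = [2.0, 4.0, 6.0]   # actual output y

weight = 0.0                # the model's only parameter, before any learning
learning_rate = 0.05
error_history = []

for step in range(200):
    # Model: produce a predicted output for every input.
    predictions = [weight * x for x in inputs]

    # Compare: difference between predicted and actual output.
    differences = [y_hat - y for y_hat, y in zip(predictions, targets)]

    # Error: mean squared error over all examples.
    error = sum(d * d for d in differences) / len(differences)
    error_history.append(error)

    # Model update: move the parameter against the gradient of the error.
    gradient = 2 * sum(d * x for d, x in zip(differences, inputs)) / len(inputs)
    weight -= learning_rate * gradient

print(f"learned weight: {weight:.4f}")
print(f"error at first step: {error_history[0]:.4f}, at last step: {error_history[-1]:.10f}")
```

Larger models such as neural networks follow the same cycle, only with many more parameters to adjust in each update.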

Unsupervised Learning

Unsupervised learning is a type of machine learning where algorithms learn patterns and structures from input data without explicit labeling or predefined outputs. Unlike supervised learning, where the model is trained on labeled data with known outcomes, unsupervised learning algorithms work with unlabeled data, discovering hidden patterns and relationships autonomously.

Generative AI models, which fall under the umbrella of unsupervised learning, have revolutionized content creation and data analysis. These models learn to generate new, original content that mimics the patterns and structures present in their training data.

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Helvetica, Arial, sans-serif', 'fontSize': '14px', 'primaryColor': '#007AFF', 'primaryTextColor': '#FFFFFF', 'primaryBorderColor': '#007AFF', 'lineColor': '#000000', 'secondaryColor': '#F5F5F7', 'tertiaryColor': '#FFFFFF'}}}%%

flowchart LR
    UC[(Unstructured<br>content)] 
    UC --> GM[Gen AI model<br>Learns patterns in<br>unstructured content]
    GM --> O[Output: New content]

    classDef default fill:#007AFF,stroke:#007AFF,stroke-width:2px,color:#FFF,rx:5,ry:5;
    classDef cylinder fill:#FFD700,stroke:#FFD700,stroke-width:2px,color:#000,rx:5,ry:5;
    classDef output fill:#FF6B6B,stroke:#FF6B6B,stroke-width:2px,color:#FFF,rx:5,ry:5;
    class UC cylinder;
    class O output;
    linkStyle default stroke:#000000,stroke-width:2px;

Generative AI Model Process

The Generative AI Process

  1. Input Data: The process begins with unstructured content, which can include text, images, or other forms of data. This content serves as the foundation for the model’s learning.

  2. Pattern Learning: The generative AI model analyzes the input data, identifying patterns, structures, and relationships within the content. This step is crucial as it forms the basis for the model’s ability to generate new content.

  3. Content Generation: Using the learned patterns, the model can produce new, original content that shares similarities with the input data but is not a direct copy.

Model Process

Unlike supervised learning, the process in unsupervised learning does not involve separate features and labels, and the model’s output is not compared against predefined expected outputs.

The diagram below illustrates a simplified version of a generative AI process, which is a form of unsupervised learning.

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Helvetica, Arial, sans-serif', 'fontSize': '14px', 'primaryColor': '#4CAF50', 'primaryTextColor': '#FFFFFF', 'primaryBorderColor': '#4CAF50', 'lineColor': '#000000', 'secondaryColor': '#F5F5F7', 'tertiaryColor': '#FFFFFF'}}}%%

flowchart TD
    A[Input data: x] --> B[Model]
    B --> C[Generated example]
    
    classDef default fill:#4CAF50,stroke:#4CAF50,stroke-width:2px,color:#FFF,rx:5,ry:5;
    linkStyle default stroke:#000000,stroke-width:2px;

Unsupervised Learning Process

  1. It starts with input data (x) that is fed into the model.

  2. Unlike supervised learning, there is no comparison to predefined expected outputs. Instead, the model learns patterns and structures from the input data and uses this knowledge to generate new, similar examples.

  3. The output is a “Generated example” that the model creates based on its understanding of the input data’s characteristics.

This process allows the model to create new content or data points that are similar to, but not exact copies of, the training data.
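
To make this tangible, here is a deliberately tiny generative model: a word-level Markov chain. It is far simpler than the neural networks behind modern generative AI and is only meant to show the idea of learning patterns from unlabeled input and then sampling new examples. The corpus and the helper names `learn_transitions` and `generate` are assumptions made for this sketch.

```python
import random
from collections import defaultdict

# Input data: unlabeled text (a made-up toy corpus).
corpus = "the cat sat on the mat and the cat ate the rat"


def learn_transitions(text):
    """Pattern learning: record which words were seen following each word."""
    words = text.split()
    transitions = defaultdict(list)
    for current, following in zip(words, words[1:]):
        transitions[current].append(following)
    return transitions


def generate(transitions, start, length, rng):
    """Content generation: walk the learned transitions to build a new sequence."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = transitions.get(word)
        if not followers:  # no known continuation, so stop early
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


transitions = learn_transitions(corpus)
sample = generate(transitions, "the", 8, random.Random(0))
print(sample)
```

Every word pair in the generated text was seen in the corpus, yet the sequence as a whole does not have to match the original sentence.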

ML Equation

The image below illustrates the core idea behind machine learning models through a simple equation: \(y = f(x)\).

[Figure: the equation y = f(x), where x is the input data, f() is the model, and y is the model output]

Let’s break down what this means:

  • Input data (x): This represents the information fed into the model. In digital marketing contexts, this could be customer demographics, browsing history, or engagement metrics.

  • Model f(): The model is represented by the function f(). It’s the “brain” of the system that learns patterns from the input data. This could be a neural network, a decision tree, or any other machine learning algorithm.

  • Model output (y): This is the result produced by the model after processing the input data. Depending on the task, this could be a numerical prediction, a classification, or a generated output.

The equation \(y = f(x)\) is applicable to both supervised and unsupervised learning. It represents the general concept of a model taking input and producing output.

The following breakdown shows, step by step, why the equation applies to both paradigms:

  1. Understanding the equation:
    • \(y = f(x)\) represents a function that takes input x and produces output y.
    • \(f()\) represents the model or algorithm.
  2. Supervised Learning:
    • In supervised learning, we have labeled data.
    • x represents the input features.
    • y represents the known target or label.
    • The model \(f()\) learns to map x to y during training.
    • After training, \(f(x)\) predicts y for new, unseen x.
    • Clearly, y = f(x) applies here.
  3. Unsupervised Learning:
    • In unsupervised learning, we don’t have labeled data.
    • x still represents the input features.
    • There’s no predefined y to predict.
    • However, the model \(f()\) still processes the input x.
    • The output of f(x) could be:
      1. Cluster assignments (in clustering algorithms)
      2. Generated data (in generative models)
    • We can still consider these outputs as y, even though they’re not predefined labels.
  4. Key difference:
    • In supervised learning, y is known during training and predicted afterwards.
    • In unsupervised learning, y is generated or discovered by the algorithm.
  5. Generalization:
    • We can generalize \(y = f(x)\) to mean “output = model(input)” for both types of learning.
    • The nature and interpretation of y may differ, but the general form holds.

The key difference lies in the nature and interpretation of y, not in the overall structure of the equation. Therefore, the formulation helps to unify our understanding of different machine learning paradigms under a common framework.
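
As a small illustration of “output = model(input)” in the unsupervised case, the sketch below implements a minimal one-dimensional k-means clustering with two clusters. No labels are provided. The function `two_means` plays the role of f(), and the returned cluster assignments play the role of y. The spending figures are invented, and the function is a simplified stand-in rather than a full clustering implementation.

```python
def two_means(values, iterations=10):
    """Tiny 1-D k-means with k = 2: returns one cluster index (0 or 1) per value."""
    centers = [min(values), max(values)]  # simple starting guess
    assignments = []
    for _ in range(iterations):
        # Assign every value to its nearest cluster center.
        assignments = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Move each center to the mean of the values assigned to it.
        for k in (0, 1):
            members = [v for v, a in zip(values, assignments) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assignments


# x: monthly spend per customer (made-up numbers); there are no labels.
x = [20.0, 25.0, 30.0, 200.0, 220.0, 250.0]

# y = f(x): here y is discovered by the algorithm instead of being given.
y = two_means(x)
print(y)  # → [0, 0, 0, 1, 1, 1]
```

The call has the same shape as a supervised prediction. Only the meaning of y differs: it is a grouping found in the data, not a known target.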

Deep Learning

Deep Learning (DL) is a specialized subset of machine learning that uses artificial neural networks with multiple layers (hence “deep”) to model and process complex patterns in data. Deep learning has been particularly successful in areas such as image and speech recognition, natural language processing, and generative tasks.

Neural networks are the building blocks of deep learning systems. They consist of layers of interconnected nodes that process information in a way loosely inspired by neurons in the human brain.

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Helvetica, Arial, sans-serif', 'fontSize': '14px', 'primaryColor': '#007AFF', 'primaryTextColor': '#FFFFFF', 'primaryBorderColor': '#007AFF', 'lineColor': '#000000', 'secondaryColor': '#F5F5F7', 'tertiaryColor': '#FFFFFF'}}}%%

flowchart LR
    A[Input Layer] --> B[Hidden Layer 1]
    B --> C[Hidden Layer 2]
    C --> D[Output Layer]
    
    classDef default fill:#007AFF,stroke:#007AFF,stroke-width:2px,color:#FFF,rx:5,ry:5;
    linkStyle default stroke:#000000,stroke-width:2px;

Basic Structure of a Neural Network

  1. Input Layer: Receives initial data (e.g., customer demographics, past purchase history)
  2. Hidden Layers: Process and transform the data through complex computations
  3. Output Layer: Produces the final result (e.g., product recommendations, customer segmentation)
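
The structure above can be sketched as a forward pass through a very small network with two inputs, two hidden layers, and one output. The weights below are arbitrary made-up values. In a real network they would be learned from data. ReLU and sigmoid are common activation functions chosen here as an assumption, and `dense`, `relu`, `sigmoid`, and `forward` are illustrative helper names.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: each node computes a weighted sum plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Activation for hidden layers: negative values become zero."""
    return [max(0.0, v) for v in values]


def sigmoid(value):
    """Squashes a number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-value))


# Made-up weights and biases: 2 inputs -> 2 nodes -> 2 nodes -> 1 output.
W1, b1 = [[0.5, -0.2], [0.3, 0.8]], [0.0, 0.1]
W2, b2 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W3, b3 = [[0.7, -0.3]], [0.05]


def forward(features):
    hidden1 = relu(dense(features, W1, b1))  # hidden layer 1
    hidden2 = relu(dense(hidden1, W2, b2))   # hidden layer 2
    output = dense(hidden2, W3, b3)          # output layer
    return sigmoid(output[0])                # a score between 0 and 1


score = forward([1.0, 2.0])  # input layer: two numeric features
print(f"score: {score:.3f}")
```

Training such a network means adjusting the weights and biases, using the same predict, compare, and update cycle described for supervised learning.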

Generative AI

Generative AI represents the cutting edge of deep learning techniques. It focuses on creating new, original content rather than just analyzing or classifying existing data.

Generative models learn the underlying patterns and structures of their training data and can then produce new, similar content.

Next, let’s take a look at how we can use generative models to produce new output:

%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'Helvetica, Arial, sans-serif', 'fontSize': '14px', 'primaryColor': '#007AFF', 'primaryTextColor': '#333', 'primaryBorderColor': '#007AFF', 'lineColor': '#000000', 'secondaryColor': '#F5F5F7', 'tertiaryColor': '#FFFFFF'}}}%%

flowchart LR
    subgraph SKILLS["Input: Prompt"]
        S["Text • Code<br>Image • Video<br>Speech"]
    end
    subgraph TASKS["Model"]
        T["Gemini<br>GPT-4<br>Claude"]
    end
    subgraph WORK["Output: New Content"]
        W["Text • Code<br>Image • Video<br>Speech"]
    end
    SKILLS --> TASKS
    TASKS --> WORK

    classDef default fill:#F5F5F7,stroke:#007AFF,stroke-width:0px,rx:10,ry:10;
    classDef title font-weight:bold,font-size:16px,fill:#F5F5F7,stroke:#000000,stroke-width:1px,rx:10,ry:10;
    class SKILLS,TASKS,WORK title;
    linkStyle default stroke:#000000,stroke-width:1px;

Simplified Overview of How Generative AI Works

The diagram presents a streamlined view of generative AI’s functionality. At its core, the process involves three main components:

  • Input (Prompt): This is the initial data provided to the AI, which can take various forms such as text, code, images, video, or speech.

Types of prompts:

  1. 📝 Text-based prompts: Written instructions or questions
  2. 🏞️ Image prompts: Visual inputs for image-based tasks
  3. 👩‍💻 Code snippets: For programming-related generations
  4. 💬 Audio input: Provide instructions or questions with speech
  5. Multimodal prompts: Combining different types of inputs

  • AI Model: This is the heart of the system, represented by advanced language models like Gemini, GPT-4, or Claude. These models process the input and generate the output.

  • Output (Generated Content): This is the new content created by the AI model in response to the input. The output can be in the same formats as the input: text, code, images, video, or speech.

Common types of generative AI outputs:

  • 📝 Text: Articles, stories, scripts, marketing copy
  • 🏞️ Images: Artwork, designs, photorealistic images
  • 👩‍💻 Code: Programming scripts, software modules
  • 💬 Audio: Music, voice recordings, sound effects
  • 🎥 Video: Animations, short clips, visual effects

The AI model acts as a bridge between the input and output, transforming the initial prompt into novel content. This process showcases the AI’s ability to understand and generate diverse types of data, highlighting the versatility and power of generative AI systems.

Generative Models

Generative AI models are the core engines that process inputs and generate outputs. These models are built on complex neural network architectures and are trained on vast datasets.

Popular generative AI models:

Model  | Company   | Specialization  | Key Features
GPT-4  | OpenAI    | Text and image  | Multimodal (text + image input)
Claude | Anthropic | Text generation | High intelligence, complex task handling
Gemini | Google    | Multimodal AI   | Text, image, and video understanding

These models continuously evolve, with new versions and capabilities being released regularly.

Generative AI encompasses a variety of model types, each designed for specific tasks or types of content generation.

Let’s explore some of the most common types.

Text-to-Text Models

Text-to-text models take natural language input and produce text output. These models are versatile and can be used for a wide range of tasks, including:

  • Language translation
  • Text summarization
  • Question answering
  • Content generation

Text-to-Image Models

Text-to-image models generate images based on textual descriptions. These models have gained significant attention due to their ability to create highly detailed and creative visuals from simple text prompts.

Text-to-Video Models

Text-to-video models aim to generate video content based on textual descriptions. While still in earlier stages compared to text-to-image models, they show promise for creating short video clips or animations from text input.

Text-to-3D Models

These models generate three-dimensional objects or scenes based on text descriptions. They have potential applications in gaming, virtual reality, and product design.

Text-to-Task Models

Text-to-task models are designed to perform specific actions or tasks based on natural language instructions. These can include:

  • Answering questions
  • Performing searches
  • Making predictions
  • Executing commands in software interfaces

The quality and relevance of generative AI outputs heavily depend on the clarity and specificity of the input prompts, as well as the capabilities and training of the underlying model.

Conclusion

As we’ve explored in this article, generative AI represents a significant leap forward in the field of artificial intelligence. From its roots in broad AI concepts to the specialized realm of deep learning, generative AI has emerged as a powerful tool for creating original content across various mediums: text, images, code, and even video.

Key takeaways include:

  1. The evolution from AI to machine learning, deep learning, and finally to generative AI, each building upon the capabilities of its predecessors.

  2. The distinction between supervised and unsupervised learning, with generative AI falling under the latter category.

  3. The fundamental equation y = f(x) that underpins both supervised and unsupervised learning models.

  4. The structure and function of neural networks as the building blocks of deep learning systems.

  5. The simplified overview of how generative AI works, from input prompts through advanced models to generate new content.

  6. Various types of generative models, including text-to-text, text-to-image, and emerging technologies like text-to-video and text-to-3D models.

As generative AI continues to advance, we can expect to see even more sophisticated applications and integrations into various industries. However, it’s crucial to remember that while these models are incredibly powerful, they also come with challenges. Issues such as bias in training data, the potential for generating misleading information, and ethical concerns about content creation and ownership will need to be addressed as the technology matures.