Microsoft Foundry — Visual Guide

FOUNDRY ARCHITECTURE

The Stack at a Glance

Foundry is a layered platform: models at the bottom, agents and tools in the middle, knowledge and governance wrapping everything. A single Azure resource provider unifies it all.

FOUNDRY RESOURCE · Microsoft.CognitiveServices

⚖️

Control Plane

Governance, RBAC, networking, policies, observability — wrapped around everything below.

Entra ID Purview Azure Monitor Content Safety

▲

🤖

Agent Service · Workflows

Managed runtime for prompt, workflow, and hosted agents. Handles scaling, identity, and observability.

Agent Framework LangGraph Multi-agent A2A · MCP

▲

🧠

Foundry IQ · Knowledge Layer

Agentic retrieval over enterprise data. Permission-aware, multi-source, reflective. Built on Azure AI Search.

SharePoint OneLake Blob Storage Web

▲

🔧

Foundry Tools · Toolbox

Built-in tools (web search, code interp, file search) plus custom tools and remote MCP servers, all behind a unified MCP endpoint.

MCP OpenAPI Azure Functions Toolbox versions

▲

🧬

Foundry Models · 11,000+

Direct-from-Azure (OpenAI, Anthropic Claude, MAI) and Partners & Community (Meta, Mistral, DeepSeek, Cohere, xAI, HuggingFace).

Serverless API Managed compute Model Router Fine-tuning

▲

💻

Foundry Local

On-device inference for offline, sovereign, or low-latency workloads. Same model catalog, runs on your hardware.

Edge Offline Data sovereignty

01 / FOUNDRY MODELS

The Model Catalog

A unified catalog of more than 11,000 models — flagship frontier models from Microsoft and partners, plus open and specialized models. Browse, benchmark, fine-tune, and deploy through one consistent inference API.

11,000+

TOTAL MODELS

From foundational to industry-specific

15+

PROVIDERS

Microsoft, Anthropic, OpenAI, Meta, & more

60M+

PHI DOWNLOADS

Microsoft's small language models

DEPLOYMENT MODES

Serverless or managed compute

OpenAI

GPT-5 · o3 · o-series

Anthropic

Claude Opus · Sonnet · Haiku

Microsoft

Phi · MAI · industry models

Pay-per-token API

Call models through a managed endpoint without provisioning infrastructure. Single inference API across providers — switch models with a config change, not a rewrite.

No infra to manage

Pay only for what you use

Unified API across providers

Built-in content safety

DEPLOYMENT · MANAGED COMPUTE

Dedicated Capacity

Deploy to dedicated GPU clusters when you need predictable performance, custom fine-tunes, or model formats that don't fit serverless. Provisioned throughput available.

Predictable latency & throughput

Custom fine-tuned models

Bring-your-own-model imports

Network isolation options

⚡ MODEL ROUTER · GA · OPTIMIZE COST & PERFORMANCE AT RUNTIME

User Query "summarize this 50-page contract"

→

Cheap fast model simple Q · 0.2¢

Mid-tier model summary · 1.4¢

Frontier model ✓ complex reasoning · 8.2¢

Reasoning model multi-step · 24¢

02 / AGENT SERVICE

Build, Deploy, Scale Agents

A fully managed runtime for AI agents. Choose your build path — no-code, declarative workflow, or fully custom code — and let Foundry handle hosting, scaling, identity, observability, and enterprise security.

TYPE 01

📝

Prompt Agents

GENERALLY AVAILABLE

Defined entirely through configuration — instructions, model selection, and tools. Build no-code in the Foundry portal in minutes, or via SDK / REST API.

Best for: Rapid prototyping, internal tools, agents that don't need custom orchestration logic.

TYPE 02

🔀

Workflow Agents

PUBLIC PREVIEW

Orchestrate sequences of actions or coordinate multiple agents using declarative definitions. Visual designer or code-first API. Power Fx expressions for control flow.

Best for: Multi-step business processes, multi-agent coordination, deterministic flows.

TYPE 03

🐳

Hosted Agents

PUBLIC PREVIEW

Code-based agents built with Agent Framework, LangGraph, or your own framework, deployed as containers. You write logic; Foundry manages runtime and scaling.

Best for: Complex workflows, custom tool integrations, full control over agent behavior.

EVERY AGENT COMBINES THREE CORE COMPONENTS

🧬 Model

From the Foundry catalog. Provides reasoning and language capabilities.

GPT, Claude, Llama, etc.
Switch via config change
Reasoning models supported

📋 Instructions

Define goals, constraints, and behavior — prompt-based, workflow-defined, or code.

System prompt + persona
Prompt Optimizer (preview)
Task Adherence guardrails

🔧 Tools

Provide access to data and actions — search, files, code, custom APIs, MCP servers.

Built-in: web, code, files
Foundry IQ knowledge bases
Remote MCP & OpenAPI

📊

M365 Copilot & Teams

Publish agents to Microsoft 365 Copilot and Teams via OpenResponses and Activity Protocols. Reach users in the apps they already use.

DISTRIBUTION

🔐

Entra Agent Registry

Centrally publish, discover, and govern agents across the organization via Microsoft Entra identity.

REGISTRY

🔌

Invocations Protocol

Flexible endpoint integration with custom apps and services. Use any framework that speaks A2A or AG-UI.

INTEGRATION

03 / FOUNDRY IQ

The Knowledge Layer

RAG, reimagined as a reasoning task. Foundry IQ treats retrieval as a multi-step plan — query decomposition, source selection, parallel search, and reflection — instead of a one-shot vector lookup. Permission-aware by default.

KNOWLEDGE BASES → AGENTIC RETRIEVAL → AGENTS

Knowledge Sources

📂 SharePoint

🪣 Azure Blob

🏞️ OneLake

🌐 Web

→

IQ Engine

Plan

Retrieve

Reflect

Cite

→

Agents

Grounded answers

Inline citations

ACL-aware

Auditable

Query Plan

An LLM analyzes the question and decomposes it into optimal sub-queries.

Source Select

Routes each sub-query to the right knowledge source — multi-source by default.

Parallel Search

Runs hybrid search (vector + keyword + semantic rerank) across selected sources at once.

Permission Filter

Honors document ACLs and Purview sensitivity labels under caller's Entra identity.

Reflect & Iterate

Evaluates results — re-queries if context is insufficient. Reasoning-style retrieval.

Aggregate & Cite

Returns extractive content with citations so agents can trace answers to source documents.

MICROSOFT'S INTELLIGENCE LAYER — THREE COMPLEMENTARY IQ WORKLOADS

📚

Foundry IQ

Organizational knowledge — documents, files, web content. The general-purpose knowledge base for any agent.

SOURCES · SHAREPOINT · BLOB · ONELAKE · WEB

🏞️

Fabric IQ

Semantic layer for Microsoft Fabric — ontologies, semantic models, and graphs over business data.

SOURCES · ONELAKE · POWER BI · FABRIC SEMANTIC MODELS

💼

Work IQ

Contextual layer for Microsoft 365 — collaboration signals from documents, meetings, chats, workflows.

SOURCES · M365 · TEAMS · OUTLOOK · COPILOT

04 / FOUNDRY TOOLS

Beyond Text Generation

Tools turn LLMs into agents that can act. Foundry provides ready-to-use built-in tools and a unified Toolbox that exposes any custom tool — including remote MCP servers — through a single endpoint to any compatible agent runtime.

CATEGORY · BUILT-IN

Ready Out of the Box

Configured in minutes through the Foundry portal. Some are GA, others in preview — most agents need only basic configuration to start.

🌐Web Search

🐍Code Interpreter

📁File Search

🧠Memory

📊Fabric Data

🔍AI Search

🖥️Computer Use

🛡️Content Safety

CATEGORY · CUSTOM

Bring Your Own Capabilities

Wire in any external API, internal service, or remote MCP server. Foundry handles auth, identity, and routing through unified endpoints.

🔌Remote MCP

⚡Azure Functions

📜OpenAPI Spec

🔗Logic Apps

🛠️Custom MCP

📡Webhook Tools

🧰 TOOLBOX — CURATE ONCE, EXPOSE EVERYWHERE VIA MCP

📝 Prompt Agents

🔀 Workflows

🐳 Hosted Agents

🌍 Any MCP client

🧰 Toolbox

Single MCP-compatible
endpoint · versioned

crm_lookup

create_ticket

query_warehouse

send_notification

05 / CONTROL PLANE

Enterprise Governance

Foundry separates management from development. IT teams configure security and policy at the Foundry resource level; development teams build inside project containers. One unified portal, one resource provider, one set of controls.

🔐

Identity & Access

Unified RBAC across models, agents, tools, and knowledge bases. Microsoft Entra-backed identity end-to-end.

Role-based access control

Managed identities for resources

Caller identity propagation

🌐

Networking

Virtual network injection, private endpoints, public network disable for sensitive workloads.

Private endpoints

Subnet injection for agents

Bring-your-own-VNet

🛡️

Safety & Guardrails

Built-in content safety, prompt injection mitigation (XPIA), and Task Adherence guardrails for agentic workflows.

Azure AI Content Safety

Cross-prompt injection defense

Task Adherence (preview)

📊

Observability

Tracing, monitoring, and evaluation — all under Azure Monitor. LangChain & LangGraph traces supported natively.

Azure Monitor metrics

Run tracing & replay

Eval-driven optimization

💾

Data Residency

Bring your own storage, Azure SQL, Cosmos DB, AI Search. Customer-managed encryption keys supported.

Bring-your-own storage

Customer-managed keys (CMK)

Purview sensitivity labels

📜

Compliance

Inherits Azure's compliance posture — 50+ region-specific certifications. Responsible AI guidance built in.

SOC · ISO · HIPAA · FedRAMP

Responsible AI standards

Transparency reports per model

MANAGEMENT vs. DEVELOPMENT — CLEAR SEPARATION OF SCOPE

SCOPE

OWNED BY

RESPONSIBILITIES

Foundry Resource

IT & Platform Eng

Networking, security policies, model deployment governance, RBAC at resource level, compliance, monitoring config

Project Container

Dev & ML Teams

Build agents, define workflows, register tools, run evaluations, manage project assets and connections

Project Assets

Individual Builders

Files, prompts, evaluation datasets, agent configs, knowledge base configs, tool credentials

06 / FOUNDRY LOCAL

On-Device Inference

Run Foundry models on your own hardware — laptops, edge servers, sovereign clouds. Same model catalog, same APIs, no data leaves your environment. For when latency, privacy, or sovereignty rules out the cloud.

FOUNDRY CLOUD

Managed in Azure

Hosted in Microsoft data centers
Pay-per-token serverless or managed compute
Auto-scaling and global distribution
Full agent service runtime
Latency depends on network

FOUNDRY LOCAL

On Your Hardware

Runs on local CPU, GPU, or NPU
Data never leaves the device or network
Works offline — no internet required
Same model catalog & SDK surface
Sub-millisecond local latency

WHEN TO REACH FOR FOUNDRY LOCAL

🏥

Regulated Industries

Healthcare, defense, finance — where data sovereignty is non-negotiable.

📡

Edge & Offline

Field operations, retail kiosks, manufacturing floors with intermittent connectivity.

⚡

Ultra-Low Latency

Real-time interactions — voice agents, live coding assistants, gaming NPCs.

💰

Predictable Cost

Heavy local workloads where per-token cloud pricing breaks the budget.

🔬

Dev & Prototyping

Iterate on agents and prompts without round-tripping the cloud or burning quota.

🌍

Sovereign Cloud

Deploy in sovereign or air-gapped clouds where Azure isn't an option.

All Six Components at a Glance

Microsoft Foundry is the AI app and agent factory — six natively integrated components, one Azure resource, one portal.

COMPONENT

PURPOSE

KEY CAPABILITIES

WHEN IT MATTERS

🧬 Foundry Models

Discover, deploy, fine-tune from a unified catalog

11K+ models · serverless or managed · model router · benchmarking · fine-tuning

Any AI workload — start of every Foundry project

🤖 Agent Service

Build, deploy, scale single & multi-agent systems

Prompt · workflow · hosted agents · M365 publish · MCP · A2A

Production agents that need scaling, identity, observability

🧠 Foundry IQ

Permission-aware knowledge layer for grounded agents

Agentic retrieval · multi-source · ACL sync · Purview labels · citations

Agents that need to ground responses in enterprise data

🔧 Foundry Tools

Extend agents with built-in & custom capabilities

Web · code · file · MCP · OpenAPI · Toolbox versioning

Agents that need to take actions or call external systems

⚖️ Control Plane

Enterprise governance across all of the above

RBAC · networking · safety · observability · CMK · compliance

Always — production AI without governance is not production

💻 Foundry Local

On-device inference for offline / sovereign workloads

Edge runtime · same model catalog · same SDK · CPU/GPU/NPU

Regulated industries, edge, ultra-low-latency, sovereign clouds

Build, Optimize& Govern

Six Integrated Components