Overview

Loop over datasource files is an agent behavior that runs one agent execution per file in a knowledge store. Two modes exist — choose based on whether your agent needs to identify and route files, or read and process them.
|                    | Metadata loop                             | Content loop                              |
|--------------------|-------------------------------------------|-------------------------------------------|
| Full name          | Loop through knowledge metadata           | Loop over datasource file content         |
| Agent receives     | File IDs only, no text                    | Full file text + metadata                 |
| Tools required     | Yes, must add at least one                | No, content arrives automatically         |
| Variables injected | fileId, BinderID, Path                    | None needed                               |
| Best for           | Routing, orchestration, permission checks | Summarization, extraction, classification |
| Cost               | Lower                                     | Higher                                    |

Glossary

  • Knowledge store / datasource: Container holding processed files available to your agent.
  • Connector: Integration that links your system to the knowledge store (e.g., Drive, SharePoint, S3-like source).
  • Processed file: A file that has been ingested and is available for iteration.
  • Binder: A document container with a folder/path structure, accessible via Binder tools.
  • Vector store: Storage holding chunked embeddings and retrievable text for semantic search and content reconstruction.

Which mode should I use?

Use the metadata loop if you answer “yes” to any of these:
  • Do I only need file identifiers to trigger the next step?
  • Am I retrieving only specific chunks or excerpts — not the whole file?
  • Do I need to check permissions, ownership, or external metadata first?
  • Am I routing, auditing, indexing, or creating downstream jobs per file?
Use the content loop if you answer “yes” to any of these:
  • Does the agent need to read the entire document?
  • Is the primary goal summarization, extraction, or classification from text?
  • Do I want the simplest possible pipeline with no retrieval configuration?


Mode 1 — Metadata loop (Loop through knowledge metadata)

Iterates every processed file and runs the agent once per file, injecting only identifiers. The agent never sees file content. Use this when your pipeline needs to route, filter, or call external systems based on a file’s identity — not its text.

What the agent receives

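| Variable               | Description                 | Example value                          |
|------------------------|-----------------------------|----------------------------------------|
| {{Variables.fileId}}   | UUID of the current file    | "a1b2c3d4-e5f6-7890-abcd-ef1234567890" |
| {{Variables.BinderID}} | Current binder identifier   | "binder-12345"                         |
| {{Variables.Path}}     | Current file path in binder | "/documents/contracts/contract.pdf"    |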
The agent also receives the user’s original prompt. No file text is passed.
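
As a rough illustration of how the placeholders above could be filled in for one iteration, here is a minimal sketch. The `render` helper and the prompt wording are hypothetical; the platform performs this substitution itself, and its actual implementation is not documented here.

```python
import re

# Matches placeholders of the form {{Variables.<name>}}.
PLACEHOLDER = re.compile(r"\{\{Variables\.(\w+)\}\}")


def render(template: str, variables: dict) -> str:
    """Hypothetical helper: fill {{Variables.<name>}} placeholders for one iteration."""

    def substitute(match: re.Match) -> str:
        # Leave unknown placeholders untouched rather than guessing a value.
        return variables.get(match.group(1), match.group(0))

    return PLACEHOLDER.sub(substitute, template)


# Example values copied from the variable table in this section.
variables = {
    "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "BinderID": "binder-12345",
    "Path": "/documents/contracts/contract.pdf",
}

prompt = render(
    "Look up the owner of file {{Variables.fileId}} stored at {{Variables.Path}}.",
    variables,
)
print(prompt)
```

Leaving unknown placeholders in place makes a misspelled variable name visible in the rendered prompt instead of silently producing an empty value.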

Quick start

┌─────────────────────────────────────────────────────────────┐
│                    USER SETUP WORKFLOW                      │
└─────────────────────────────────────────────────────────────┘

Step 1: Enable Behavior

   │ ✓ Click "Loop through knowledge metadata" in Agent Studio
   │ ✓ Select datasource (auto-selected: UserPersonalFiles)
   │ ✓ See file count (e.g., "25 files")


Step 2: Input Variables AUTOMATICALLY Added ✨

   │ ✓ Variables.fileId      (system adds this)
   │ ✓ Variables.BinderID    (system adds this)
   │ ✓ Variables.Path        (system adds this)

   │ → User does NOTHING — this is automatic!


Step 3: Add Compatible Tools (Manual — required)

   │ ✓ Add "File Content Retrieval" to model, OR
   │ ✓ Add "Binder Content Retrieval" to model, OR
   │ ✓ Add "List artifacts in Binder" to model, OR
   │ ✓ Add "List folders in Binder" to model

   │ → Agent builder must manually add at least one tool


Step 4: AI Uses Auto-Populated Variables

   │ During Execution (Automatic):

   │ For File 1:
   │   System sets: Variables.fileId = "guid-1"
   │   AI calls: File Content Retrieval(fileId = Variables.fileId)

   │ For File 2:
   │   System sets: Variables.fileId = "guid-2"
   │   AI calls: File Content Retrieval(fileId = Variables.fileId)

   │ ... continues for all files ...


✓ Complete! Agent processes all files in datasource.

How to set up in Agent Studio

Step 1: Enable the behavior

  1. Navigate to Agent Studio
  2. Go to the Behavior section
  3. Click to enable “Loop through knowledge metadata”
  4. Click Add Files in the Playground or Catalog chat and select the behavior
  5. A configuration dialog will appear showing:
    • User’s personal datasource (auto-selected)
    • Datasource name (e.g., UserPersonalFiles)
    • Total file count (e.g., 25 files)
  6. Click “Use datasource” to confirm

Step 2: Variables are automatically added ✨

No action needed. The system automatically injects these variables into each iteration:
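| Variable               | Access pattern                    | Example value                          |
|------------------------|-----------------------------------|----------------------------------------|
| {{Variables.fileId}}   | Use in tool parameters or prompts | "a1b2c3d4-e5f6-7890-abcd-ef1234567890" |
| {{Variables.BinderID}} | Use for Binder tools              | "binder-12345"                         |
| {{Variables.Path}}     | Use for Binder tools              | "/documents/contracts/contract.pdf"    |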

Step 3: Add a compatible tool (required)

Option A — Add to Model step (recommended)
  1. Select your Model step (AI Operation step)
  2. In the Tools section, click “Add Tool”
  3. Add one of the supported tools (must be added to the project first):
    • File Content Retrieval (recommended — simplest)
    • Binder Content Retrieval
    • List artifacts in Binder
    • List folders in Binder
Option B — Add as Tool step on canvas
  1. Add a Tool Action step to your canvas
  2. Select one of the supported tools
  3. Set the tool parameters to “AI determine” so that the system passes the auto-populated variables

Step 4: AI uses the variables automatically

For each file, the system sets the variables and the AI calls the tool:
Variables.fileId = "current-file-guid"
Variables.BinderID = "current-binder-id"
Variables.Path = "current-file-path"

→ File Content Retrieval(fileId = Variables.fileId)
→ Returns content of current file
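
The per-file flow above can be simulated locally to make the execution model concrete. This is a sketch under stated assumptions: `run_metadata_loop`, `example_agent`, and `fake_file_content_retrieval` are hypothetical stand-ins, not platform APIs, and the file records are made up.

```python
from typing import Callable, Dict, List

FileRecord = Dict[str, str]


def run_metadata_loop(
    files: List[FileRecord],
    run_agent: Callable[[Dict[str, str]], str],
) -> List[str]:
    """Run the agent once per file, passing identifiers only (never file text)."""
    outputs = []
    for record in files:
        # These three names mirror the variables the loop injects.
        variables = {
            "fileId": record["fileId"],
            "BinderID": record["BinderID"],
            "Path": record["Path"],
        }
        outputs.append(run_agent(variables))
    return outputs


def fake_file_content_retrieval(file_id: str) -> str:
    """Stand-in for the File Content Retrieval tool; returns canned text."""
    return f"<content of {file_id}>"


def example_agent(variables: Dict[str, str]) -> str:
    # The agent chooses to call the tool with the current fileId.
    content = fake_file_content_retrieval(variables["fileId"])
    return f"{variables['Path']}: {content}"


files = [
    {"fileId": "guid-1", "BinderID": "binder-12345", "Path": "/documents/a.pdf"},
    {"fileId": "guid-2", "BinderID": "binder-12345", "Path": "/documents/b.pdf"},
]
outputs = run_metadata_loop(files, example_agent)
print(outputs)
```

The point of the sketch is that content only enters an iteration when the agent explicitly calls a tool; the loop itself hands over identifiers and nothing else.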

Setup checklist

  • ✅ Enable “Loop through knowledge metadata” behavior
  • ✅ Variables auto-added (fileId, BinderID, Path)
  • ✅ Add at least one compatible tool to model or canvas
  • ✅ AI uses populated variables automatically per file

Typical uses and examples

  • Targeted per-file vector search: Loop injects fileId and a search step filters retrieval to that document. Agent summarizes only the retrieved excerpts, not the whole file.
  • Enrichment via external system of record: Use fileId and BinderID to fetch authoritative metadata (owner, classification, legal hold status). Agent outputs a decision (retain, restrict, review) without loading content.
  • Routing and orchestration: Retrieve file details by fileId and route to translation, OCR, or contract extraction pipelines based on type or tags.
  • Permission-aware processing: Query access controls per file using fileId. Produce a report of mismatches or risky exposures without touching raw text.
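
To illustrate the routing use case, the sketch below picks a downstream pipeline from the file extension in Path. The `ROUTES` mapping, the pipeline names, and the "review" fallback are assumptions made for illustration; a real deployment would route on whatever type or tag metadata its own system exposes.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to downstream pipeline name.
ROUTES = {
    ".pdf": "contract-extraction",
    ".png": "ocr",
    ".txt": "translation",
}


def route(path: str, default: str = "review") -> str:
    """Choose a pipeline for one file based only on its path, not its content."""
    suffix = PurePosixPath(path).suffix.lower()
    return ROUTES.get(suffix, default)


print(route("/documents/contracts/contract.pdf"))
```

Files with an unknown or missing extension fall through to the default, so nothing is silently dropped from the run.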

Strengths and tradeoffs

Strengths
  • Lower token usage and cost — no content loaded by default
  • Fine-grained control over when and how content is retrieved
  • Safer default for compliance-sensitive environments
Tradeoffs
  • Requires extra retrieval steps when content is eventually needed
  • Cannot summarize or extract without additional configuration


Mode 2 — Content loop (Loop over datasource file content)

Iterates each processed file and runs the agent once per file, passing the full reconstructed text (chunks joined from the vector store) along with metadata. No tools or variables are needed — the agent receives everything it needs to read and act on each document directly.

What the agent receives

  • The user’s original prompt
  • File metadata: name, fileId, BinderID, Path
  • Full reconstructed text of the current file
No variables need to be referenced. No tools need to be called to access the content.

Quick start

┌─────────────────────────────────────────────────────────────┐
│                    USER SETUP WORKFLOW                      │
└─────────────────────────────────────────────────────────────┘

Step 1: Enable Behavior

   │ ✓ Click "Loop over datasource file content" in Agent Studio
   │ ✓ Select datasource (auto-selected: UserPersonalFiles)
   │ ✓ See file count (e.g., "25 files")


Step 2: No Variables Needed ✨

   │ ✓ No input variables required
   │ ✓ Full file text is delivered directly to the agent
   │ ✓ File metadata is included automatically

   │ → User does NOTHING — content is injected automatically!


Step 3: No Tools Required

   │ ✓ Content is already included in the agent's input
   │ ✓ No retrieval tools need to be added
   │ ✓ Agent can read and act on each file immediately

   │ → No manual tool configuration needed


Step 4: AI Reads Full File Content Directly

   │ During Execution (Automatic):

   │ For File 1:
   │   System injects: full text of file 1 + metadata
   │   AI reads content directly and processes it

   │ For File 2:
   │   System injects: full text of file 2 + metadata
   │   AI reads content directly and processes it

   │ ... continues for all files ...


✓ Complete! Agent processes all files in datasource.

How to set up in Agent Studio

Step 1: Enable the behavior

  1. Navigate to Agent Studio
  2. Go to the Behavior section
  3. Click to enable “Loop over datasource file content”
  4. Click Add Files in the Playground or Catalog chat and select the behavior
  5. A configuration dialog will appear showing:
    • User’s personal datasource (auto-selected)
    • Datasource name (e.g., UserPersonalFiles)
    • Total file count (e.g., 25 files)
  6. Click “Use datasource” to confirm

Step 2: Nothing else required ✨

The system automatically fetches and reconstructs each file’s full text from the vector store and delivers it to the agent as part of the input payload. No variables to configure, no tools to add.
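
Conceptually, reconstruction means collecting a file's chunks and joining them in order. The sketch below shows that idea only; the `Chunk` shape, the `index` ordering field, and the newline separator are assumptions, since the actual vector store schema is not described here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Chunk:
    """Hypothetical chunk record as it might be stored alongside an embedding."""
    file_id: str
    index: int  # assumed position of the chunk within its file
    text: str


def reconstruct(chunks: Iterable[Chunk], file_id: str, separator: str = "\n") -> str:
    """Join one file's chunks in positional order to rebuild its text."""
    own = sorted(
        (chunk for chunk in chunks if chunk.file_id == file_id),
        key=lambda chunk: chunk.index,
    )
    return separator.join(chunk.text for chunk in own)


store = [
    Chunk("guid-1", 1, "Second paragraph."),
    Chunk("guid-2", 0, "Other file."),
    Chunk("guid-1", 0, "First paragraph."),
]
print(reconstruct(store, "guid-1"))
```

Because every chunk of the file is included, input size grows with document length, which is where the higher cost and latency of this mode come from.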

Setup checklist

  • ✅ Enable “Loop over datasource file content” behavior
  • ✅ That’s it — the agent receives full text automatically per file

Typical uses and examples

  • Per-file executive summary: Each run receives the entire document. Output a 5-bullet brief and a 1-sentence takeaway.
  • Structured extraction: Extract scope, key rules, exceptions, enforcement owner, and effective date from each policy document.
  • Document classification and tagging: Use full content signals to label each file by department, document type, and sensitivity level.
  • Quality checks / compliance review: Detect missing clauses (termination, liability cap, confidentiality) and flag risk with rationale.
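
For the structured extraction case, it can help to check each run's output against a fixed field list before storing it. The following is a minimal sketch: the snake_case field names and the dictionary output format are assumptions, chosen to mirror the fields named in the extraction example.

```python
# Field names assumed for illustration; they mirror the extraction example.
REQUIRED_FIELDS = (
    "scope",
    "key_rules",
    "exceptions",
    "enforcement_owner",
    "effective_date",
)


def missing_fields(record: dict) -> list:
    """Return required fields that are absent or blank in one file's extraction."""
    # An empty list (for example, no exceptions) counts as present.
    return [name for name in REQUIRED_FIELDS if record.get(name) in (None, "")]


extraction = {
    "scope": "All employees",
    "key_rules": ["Lock screens when away"],
    "exceptions": [],
    "enforcement_owner": "",
}
print(missing_fields(extraction))
```

Running the same check on every file keeps results comparable across runs and makes incomplete extractions easy to spot.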

Strengths and tradeoffs

Strengths
  • Simplest pipeline for “read then act” workflows — no retrieval configuration needed
  • Predictable: one agent run equals one file’s complete context
Tradeoffs
  • Higher token usage and cost (full text per run)
  • More latency per file (content reconstruction + larger input)
  • Increases exposure of raw text to the agent runtime — use only when full-text processing is required


Supported Tools (Metadata loop only)

These tools are required only for the metadata loop. The content loop does not need them.

1. File Content Retrieval

  • Tool type: FileContentRetrieval
  • Purpose: Retrieve full file content and metadata by file ID
  • API endpoint: /v1/DataVectorSearch/file/<fileId>/content
  • Parameters: fileId (string, required) — GUID of the file to retrieve
  • Returns: Full file content, metadata, source details

2. Binder Content Retrieval

  • Tool type: BinderContentRetrieval
  • Purpose: Retrieve artifact content from a document Binder
  • API endpoint: /index-search/v1/Binder/<BinderID>/content?path=<Path>
  • Parameters: BinderID (string, required), Path (string, required)
  • Returns: Full content of the specified artifact

3. List artifacts in Binder

  • Tool type: ListBinderArtifacts
  • Purpose: Get all artifacts (files) at a specific path in the Binder
  • API endpoint: /index-search/v1/Binder/<BinderID>/files?path=<Path>
  • Parameters: BinderID (string, required), Path (string, required)
  • Returns: Array of file metadata (names, paths, types)

4. List folders in Binder

  • Tool type: ListBinderFolders
  • Purpose: Get all folders at a specific path in the Binder
  • API endpoint: /index-search/v1/Binder/<BinderID>/folders?path=<Path>
  • Parameters: BinderID (string, required), Path (string, required)
  • Returns: Array of folder names and paths
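
The endpoint patterns listed above can be assembled from the loop variables as in the sketch below. Several things here are assumptions: the `BASE_URL` host is a placeholder, the helper names are hypothetical, and whether the service expects the path query value percent-encoded (as `urlencode` does) is not confirmed by this page. Authentication and HTTP method are out of scope.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://example.invalid"  # placeholder host, not a real service

BINDER_RESOURCES = {"content", "files", "folders"}


def file_content_url(file_id: str) -> str:
    """URL for the File Content Retrieval endpoint pattern."""
    return f"{BASE_URL}/v1/DataVectorSearch/file/{quote(file_id, safe='')}/content"


def binder_url(binder_id: str, resource: str, path: str) -> str:
    """URL for one of the three Binder endpoint patterns."""
    if resource not in BINDER_RESOURCES:
        raise ValueError(f"unknown Binder resource: {resource!r}")
    query = urlencode({"path": path})  # percent-encodes slashes in the path
    return f"{BASE_URL}/index-search/v1/Binder/{quote(binder_id, safe='')}/{resource}?{query}"


print(file_content_url("a1b2c3d4-e5f6-7890-abcd-ef1234567890"))
print(binder_url("binder-12345", "files", "/documents/contracts"))
```

Encoding the identifiers before interpolation avoids malformed URLs if a binder ID or path ever contains reserved characters.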

Practical Guidance and Pitfalls

  • Cost and performance: Large files at scale make the content loop expensive. A common pattern: run the metadata loop first to filter and route, then trigger the content loop only on the files that need deep processing.
  • Consistency of outputs: In the content loop, enforce a consistent output template (summary, key points, actions) to keep results comparable across files.
  • Error handling: Expect per-file failures (permissions, missing content, timeouts). Make outputs idempotent to avoid duplicates on reruns.
  • Data governance: The content loop sends raw file text to the agent runtime. Use it only when full-text processing is genuinely required. The metadata loop is the safer default for compliance-sensitive environments.
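
The error-handling advice can be sketched as a loop that records results by fileId, so a rerun skips files that already succeeded and retries only the failures. Everything in this example is hypothetical (`process_files`, the in-memory `completed` dictionary, the simulated timeout); a real pipeline would persist results in its own store.

```python
def process_files(file_ids, process, completed=None):
    """Process each file once; return (results keyed by fileId, failures keyed by fileId)."""
    completed = {} if completed is None else completed
    failures = {}
    for file_id in file_ids:
        if file_id in completed:
            continue  # already handled on an earlier run, so no duplicate output
        try:
            completed[file_id] = process(file_id)
        except Exception as exc:  # broad on purpose: one bad file must not stop the run
            failures[file_id] = f"{type(exc).__name__}: {exc}"
    return completed, failures


calls = []
flaky = {"guid-2"}  # simulate a file that times out on the first run


def process(file_id):
    calls.append(file_id)
    if file_id in flaky:
        raise TimeoutError("retrieval timed out")
    return f"summary of {file_id}"


completed, failures = process_files(["guid-1", "guid-2"], process)
first_run_failures = dict(failures)

flaky.clear()  # the transient problem goes away
completed, failures = process_files(["guid-1", "guid-2"], process, completed)
print(first_run_failures, failures, calls)
```

Keying results by fileId is what makes the rerun idempotent: the first file is processed once, and only the failed file is attempted again.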