
How to Build an AI Agent for Your Business in 2026

The architecture, stack choices, and design decisions for production AI agents — from a team that ships them

Maitreya Kulkarni, Founder, Nexolve Technologies
11 min read

In 2026, "AI agent" is everywhere — and most articles describe what they are without showing how to build one. This guide is the practical one: the actual architecture, the actual stack, and the actual design decisions we make when building production AI agents for our clients.

If you're not sure whether your problem needs an agent or just an LLM-powered feature, start with our AI automation use cases for Indian SMBs. For the conceptual foundation on what makes a system "agentic", see Agentic AI Systems.

What Makes a System an Agent

A system is an "agent" when it can:

  1. Take a high-level goal rather than a step-by-step instruction
  2. Decide which actions to take (which tools to call, in which order)
  3. Iterate based on results from those actions
  4. Stop when the goal is achieved or when escalation is needed

A chatbot that just answers questions is not an agent. A workflow that calls APIs in a fixed sequence is not an agent. A system that takes "reconcile this customer's billing issue" and figures out which APIs to call, in what order, with what inputs, while updating its plan as it learns more — that's an agent.

The Reference Architecture

Every production agent we build has the same five components:

1. The Orchestrator (Control Loop)

The orchestrator runs the main loop:

Receive goal → Plan → Execute one step → Observe result → Decide: continue, replan, or stop → Repeat

In 2026, the dominant orchestration patterns are LangGraph (for graph-shaped workflows), Anthropic's tool-use loop (for simpler agentic loops), and custom state machines (for production-critical workflows where reliability matters more than flexibility).

For most production B2B agents, we recommend Anthropic's tool-use loop or a custom state machine. LangGraph is excellent for prototyping but adds operational complexity that many SMB-scale deployments don't need.
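A minimal sketch of the control loop described above, written as a plain Python state machine. The `Decision` shape, the `decide` callable (standing in for a real LLM call), and the `scripted_decide` stub are assumptions made for illustration; they are not part of any SDK.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Decision:
    """Hypothetical shape of what the reasoning step returns."""
    kind: str                      # "tool", "final", or "escalate"
    tool: str = ""
    args: dict = field(default_factory=dict)
    text: str = ""

def run_agent(goal: str,
              decide: Callable[[str, list], Decision],
              tools: dict,
              max_steps: int = 10) -> dict:
    """Receive goal -> decide -> execute one step -> observe -> repeat."""
    history: list = []             # observations accumulated so far
    for step in range(max_steps):
        decision = decide(goal, history)
        if decision.kind == "final":
            return {"status": "done", "answer": decision.text, "steps": step}
        if decision.kind == "escalate" or decision.tool not in tools:
            reason = decision.text or "unknown tool"
            return {"status": "escalated", "reason": reason, "steps": step}
        result = tools[decision.tool](**decision.args)
        history.append({"tool": decision.tool, "args": decision.args, "result": result})
    # The loop did not converge within the step budget: hand off.
    return {"status": "escalated", "reason": "step limit reached", "steps": max_steps}

def scripted_decide(goal: str, history: list) -> Decision:
    """Stand-in for the model: look up the order once, then answer."""
    if not history:
        return Decision(kind="tool", tool="get_order_status", args={"order_id": "A100"})
    return Decision(kind="final", text=f"Order A100 is {history[-1]['result']}")

outcome = run_agent("Where is order A100?", scripted_decide,
                    {"get_order_status": lambda order_id: "shipped"})
```

In a real deployment `decide` would wrap the model call and parse its tool-use response; keeping it behind a plain callable makes the loop testable without network access.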

2. The LLM (Reasoning Engine)

The LLM does the actual reasoning: which tool to call, what arguments, how to interpret results. Choice of LLM is consequential.

For agents that take actions on real systems (sending emails, updating CRM records, processing payments), use a top-tier model. The Claude 4.5/4.6/4.7 models have become the default for agentic workloads in 2026, with high tool-calling reliability and the longest stable context windows. GPT-5 is competitive for some workloads. We've broken down the choice in ChatGPT vs Claude vs Custom LLM.

For internal-only agents handling lower-stakes tasks, smaller open models (DeepSeek V4, Qwen 3, Llama 4) can work — but expect more babysitting and higher tool-call failure rates.

3. Tools (Actions the Agent Can Take)

Tools are the bridge between the LLM and the real world. Each tool is:

  • A function with a clear name and description
  • Typed inputs and outputs (JSON schema)
  • An implementation that does the actual work (API call, database query, file operation)

Examples for a customer-support agent:

  • get_customer_orders(customer_id)
  • search_knowledge_base(query)
  • create_refund(order_id, amount, reason)
  • escalate_to_human(reason, conversation_summary)
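To make the three parts of a tool concrete, here is a rough sketch of one entry in a tool registry, with a small argument check run before the implementation is called. The registry layout, the `validate_args` helper, and the returned refund payload are illustrative assumptions, and the validator covers only a small subset of JSON Schema.

```python
# Hypothetical registry: tool name -> description, JSON-schema-style inputs, implementation.
TOOLS = {
    "create_refund": {
        "description": "Create a refund for an order.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "amount": {"type": "number"},
                "reason": {"type": "string"},
            },
            "required": ["order_id", "amount", "reason"],
        },
        # Placeholder implementation; a real one would call the payments API.
        "handler": lambda order_id, amount, reason: {"refund_id": f"rf_{order_id}", "amount": amount},
    },
}

_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}

def validate_args(schema: dict, args: dict) -> list:
    """Check required keys, unknown keys, and primitive types only."""
    errors = []
    props = schema["properties"]
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing: {key}")
    for key, value in args.items():
        if key not in props:
            errors.append(f"unexpected: {key}")
            continue
        expected = props[key]["type"]
        ok = isinstance(value, _PY_TYPES[expected])
        if isinstance(value, bool) and expected != "boolean":
            ok = False  # bool is a subclass of int in Python; reject it for numbers
        if not ok:
            errors.append(f"wrong type: {key}")
    return errors

def call_tool(name: str, args: dict) -> dict:
    """Validate model-supplied arguments before touching the real system."""
    tool = TOOLS.get(name)
    if tool is None:
        return {"ok": False, "errors": [f"unknown tool: {name}"]}
    errors = validate_args(tool["input_schema"], args)
    if errors:
        return {"ok": False, "errors": errors}
    return {"ok": True, "result": tool["handler"](**args)}
```

Returning validation errors as data, rather than raising, lets the orchestrator feed them back to the model so it can correct a malformed call.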

The Model Context Protocol (MCP) has become the standard for tool exposure — it lets you write a tool once and use it across different LLMs and agent frameworks. For deeper context, see our MCP guide.

The single biggest design decision is which tools to give the agent. Too few, and the agent can't accomplish goals. Too many, and the agent gets confused, picks wrong tools, and slows down. Aim for 5–15 well-named tools per agent for production deployments.

4. Memory (Context Across Turns)

Agents need memory in three forms:

  • Short-term: The current conversation (handled by the LLM's context window)
  • Working memory: State accumulated during a single goal execution (e.g., "I called X and got Y, then called Z")
  • Long-term: Knowledge that persists across sessions (customer profiles, learned preferences, historical outcomes)

For most production agents, short-term memory in the LLM context + working memory in a structured state object is sufficient. Long-term memory, when needed, is typically a vector database (Pinecone, Weaviate, or pgvector for smaller scale) with retrieval at agent startup.
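As an illustration of "working memory in a structured state object", the sketch below keeps the goal, the tool calls made so far, and a compact text summary that could be placed back into the model's context. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class WorkingMemory:
    """State accumulated while pursuing a single goal."""
    goal: str
    steps: list = field(default_factory=list)   # one dict per tool call
    facts: dict = field(default_factory=dict)   # values worth keeping, e.g. customer_id

    def record(self, tool: str, args: dict, result: object) -> None:
        self.steps.append({"tool": tool, "args": args, "result": result})

    def summary(self, last_n: int = 5) -> str:
        """Render the goal and the most recent steps as plain text."""
        lines = [f"Goal: {self.goal}"]
        for step in self.steps[-last_n:]:
            lines.append(f"Called {step['tool']}({step['args']}) -> {step['result']}")
        return "\n".join(lines)

memory = WorkingMemory(goal="Find order A1")
memory.record("get_order_status", {"order_id": "A1"}, "shipped")
```

Keeping this state outside the prompt means it can be persisted to Postgres between steps and trimmed with `last_n` when the context grows.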

5. Guardrails (Safety + Reliability)

Production agents need guardrails:

  • Confidence scoring: if the LLM isn't confident, escalate to a human
  • Action validation: actions over a threshold (refunds > ₹5,000, deletions, irreversible operations) require human approval
  • Iteration limits: the agent must not loop forever; cap iterations and escalate if it is not converging
  • Audit logging: every tool call, every input, and every decision is logged for review
  • Cost caps: a hard limit on LLM tokens per goal to prevent runaway costs

Skipping guardrails is the most common reason production agents fail. The agent works fine in testing, then on day three of production it spends ₹40,000 in API costs trying to figure out an edge case it should have escalated.
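Three of these guardrails (the iteration cap, the token cap, and action validation) reduce to a few lines of bookkeeping. The sketch below is one possible shape; the default limits, the `delete_record` tool name, and the `Budget` class are assumptions, while the ₹5,000 refund threshold comes from the list above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Budget:
    """Per-goal limits; the default numbers are placeholders."""
    max_iterations: int = 30
    max_tokens: int = 200_000
    iterations: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Call once after every LLM request."""
        self.iterations += 1
        self.tokens += tokens_used

    def exceeded(self) -> Optional[str]:
        """Return the reason to stop and escalate, or None to continue."""
        if self.iterations >= self.max_iterations:
            return "iteration cap reached"
        if self.tokens >= self.max_tokens:
            return "token cap reached"
        return None

IRREVERSIBLE_TOOLS = {"delete_record"}   # hypothetical tool name
APPROVAL_THRESHOLD_INR = 5_000

def needs_human_approval(tool: str, args: dict) -> bool:
    """Irreversible actions and large refunds wait for a person."""
    if tool in IRREVERSIBLE_TOOLS:
        return True
    return tool == "create_refund" and args.get("amount", 0) > APPROVAL_THRESHOLD_INR
```

The orchestrator would check `exceeded()` before each model call and `needs_human_approval()` before each tool call, escalating instead of proceeding when either one trips.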

A Concrete Example: Customer Support Agent

Let's design one end-to-end. The goal: handle Tier 1 support for an Indian D2C brand on WhatsApp.

Tools the agent gets:

  • get_order_status(order_id): pulls from Shopify
  • get_customer_orders(phone_number): for "where's my stuff" queries
  • search_faq(query): vector search on the FAQ knowledge base
  • request_refund(order_id, reason): initiates a refund (requires human approval if > ₹2,000)
  • update_shipping_address(order_id, new_address): only if the order has not yet shipped
  • escalate_to_human(summary, urgency): hands off to a person

Memory: Conversation history within the session + customer's recent order history pulled at session start.

Guardrails:

  • Refunds > ₹2,000 trigger human approval flow
  • Address change flow validates against shipping carrier API
  • Agent escalates if customer expresses frustration (sentiment scoring)
  • Hard cap of 30 LLM calls per support session
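The session rules for this example can be sketched as small pure functions that the orchestrator calls before acting. The refund limit and the 30-call cap come from the list above; the sentiment scale (-1 to 1), the frustration cut-off, and the order status names are assumptions chosen for illustration.

```python
REFUND_APPROVAL_LIMIT_INR = 2000              # from the guardrail above
MAX_LLM_CALLS_PER_SESSION = 30                # from the guardrail above
FRUSTRATION_THRESHOLD = -0.5                  # hypothetical cut-off on a -1..1 score
PRE_SHIPMENT_STATUSES = {"placed", "packed"}  # hypothetical order statuses

def refund_route(refund_amount_inr: float) -> str:
    """Refunds above the limit wait for a human; the rest proceed."""
    if refund_amount_inr > REFUND_APPROVAL_LIMIT_INR:
        return "needs_human_approval"
    return "auto_initiate"

def can_update_address(order_status: str) -> bool:
    """Address changes are allowed only before the order ships."""
    return order_status in PRE_SHIPMENT_STATUSES

def should_escalate(sentiment: float, llm_calls_used: int) -> bool:
    """Escalate on a frustrated customer or an exhausted call budget."""
    return (sentiment <= FRUSTRATION_THRESHOLD
            or llm_calls_used >= MAX_LLM_CALLS_PER_SESSION)
```

Keeping these checks in code rather than in the prompt means the model cannot talk its way past them.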

Stack:

  • Anthropic Claude 4.7 as the LLM
  • Custom Node.js orchestrator (200-line state machine)
  • WhatsApp Business API for I/O
  • Postgres for session/audit logs
  • Pinecone for FAQ knowledge base

Build cost: ₹3–5 lakhs. Time-to-production: 3–5 weeks. Ongoing: ₹20,000–50,000/month including LLM costs at typical SMB volume.

This is the kind of agent we ship for clients via our AI-Powered Automation service.

The Stack We Recommend in 2026

For most production agents:

  • LLM: Anthropic Claude 4.6/4.7 via the Anthropic SDK
  • Orchestration: Custom state machine in TypeScript or Python (avoid framework lock-in for production)
  • Tools: Native function-calling, exposed via MCP for reusability
  • Memory: Postgres for structured + Pinecone/pgvector for semantic
  • Hosting: AWS or GCP, containerised
  • Observability: LangSmith or custom logging to PostHog/Datadog

For prototyping and rapid iteration:

  • LangGraph + Streamlit + LiteLLM for fast experimentation
  • Then port the validated agent to a production stack

Common Mistakes That Kill Production Agents

Mistake 1: Building before defining the goal. "An AI agent for our business" is not a goal. "An AI agent that handles Tier 1 customer support on WhatsApp, escalating cases above X complexity" is a goal.

Mistake 2: Giving the agent too many tools. 30 tools = confused agent. Cut to 5–15.

Mistake 3: No human-in-the-loop. Every consequential action should have a human-approval flow. The agent's job is triage, not autonomy.

Mistake 4: No cost caps. Loops can run away. Hard cap LLM calls per goal.

Mistake 5: Building on the cheapest LLM. Cheap LLMs fail tool calling more often, hallucinate tool arguments, and produce worse outcomes. The cost saving is illusory once you account for failure-recovery work.

Mistake 6: Skipping observability. You'll need to debug what the agent decided and why. Log every input, every tool call, every output. Audit logs are non-negotiable for production.
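One low-effort way to get this logging is an append-only stream of JSON lines, one record per input, tool call, or output. The sketch below writes to any text stream; the field names and the `log_event` helper are hypothetical, and in production the stream would more likely be a Postgres table or a log shipper than an in-memory buffer.

```python
import io
import json
from datetime import datetime, timezone
from typing import TextIO

def log_event(stream: TextIO, session_id: str, kind: str, payload: dict) -> dict:
    """Append one audit record as a single JSON line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "kind": kind,            # e.g. "input", "tool_call", "output"
        "payload": payload,
    }
    stream.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record

# Usage: an in-memory buffer stands in for the real log destination.
buffer = io.StringIO()
log_event(buffer, "s1", "input", {"text": "Where is my order?"})
log_event(buffer, "s1", "tool_call", {"tool": "search_faq", "args": {"query": "order status"}})
```

Because each line is self-contained JSON, a session can be replayed later by filtering on `session_id` and reading the records in order.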

Mistake 7: No fallback path. When the agent fails, what happens? The fallback should always be: graceful escalation to a human with the full context summary.
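A fallback path can be as simple as a wrapper around the agent run that catches any failure and hands off with a summary. In this sketch `run_agent` and `escalate_to_human` are injected callables with assumed signatures, and the summary format is illustrative only.

```python
from typing import Callable

def handle_with_fallback(run_agent: Callable[[str], str],
                         escalate_to_human: Callable[[str, str], str],
                         goal: str,
                         transcript: list) -> dict:
    """Try the agent; on any failure, hand off with a context summary."""
    try:
        return {"handled_by": "agent", "answer": run_agent(goal)}
    except Exception as exc:  # broad on purpose: every failure should reach a human
        summary = (f"Goal: {goal}\n"
                   f"Failure: {type(exc).__name__}: {exc}\n"
                   f"Last messages: {' | '.join(transcript[-3:])}")
        ticket = escalate_to_human(summary, "normal")
        return {"handled_by": "human", "ticket": ticket, "summary": summary}
```

The important property is that the customer never sees a dead end: either the agent answers, or a person receives enough context to pick up without asking the customer to repeat themselves.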

When to Build an Agent vs a Simpler Workflow

Agents add operational complexity. They're worth it when:

  • The task has variable steps (different inputs require different sequences)
  • The decision tree is too complex to enumerate
  • Your volume is already high enough (SMB scale or above) to justify the LLM costs

They're not worth it when:

  • The workflow has fixed steps (a regular automation script is simpler)
  • The volume is low (a human can do it in less time than the build)
  • The stakes are too high for any LLM error rate (legal, financial, medical with serious consequences — keep humans primary, AI assistive)

Where Nexolve Fits

We build production AI agents for Indian businesses across customer support, ops automation, sales enablement, and internal tools. Our AI-Powered Automation service covers scoping, design, build, and observability setup. For real deployment patterns, see our AI Automation System case study.

For more theoretical foundations, see Agentic AI Systems and LLM Architecture Deep Dive. For business-level use case selection, see AI Automation for Indian SMBs.
