
Remote OpenClaw Blog

How to Build an AI Agent from Scratch: Complete Guide

8 min read

Building an AI agent from scratch requires four core components: a perception layer for input, a reasoning engine (LLM), tools for taking actions, and memory for maintaining state. As of April 2026, you can build a functional agent in under an hour using SDKs from Anthropic or OpenAI, or use a framework like OpenClaw to skip the boilerplate entirely.

This guide walks through every step: choosing a framework, connecting an LLM, adding tools, implementing memory, and wiring up orchestration logic. Whether you use a managed framework or build from raw API calls, the architecture follows the same pattern.

AI Agent Architecture: The Four Components

Every AI agent follows a four-component architecture: perception, reasoning, action, and memory. This pattern holds whether you are building with raw API calls or using a high-level framework.

The perception layer handles all inputs: user messages, webhook payloads, scheduled triggers, or sensor data. The reasoning engine is the LLM that interprets inputs and decides what to do next. The action layer executes tool calls, API requests, file operations, or any side effect. Memory stores context across interactions so the agent can learn and maintain consistency.

| Component | Purpose | Options |
| --- | --- | --- |
| Perception (Input) | Receive and parse user requests, events, or data | Chat interface, webhooks, email, Telegram, Slack, scheduled triggers |
| Reasoning (LLM) | Interpret context, plan steps, decide which tools to call | Claude (Anthropic), GPT-5 (OpenAI), Gemini (Google), Llama 4 (Meta), local models via Ollama |
| Action (Tools) | Execute tasks in the real world | API calls, file I/O, database queries, web scraping, code execution |
| Memory (State) | Persist context across conversations and sessions | MEMORY.md files, vector databases, conversation history, SQLite |

This architecture is sometimes called the "agent loop" or "ReAct pattern" (Reasoning + Acting). The agent observes its environment, reasons about what to do, takes an action, and stores the result. Then it repeats until the task is complete.


Choosing a Framework

Framework choice determines how much boilerplate you write versus how much control you retain. As of April 2026, the main options range from full-featured agent platforms to raw SDK calls.

| Framework | Type | Best For | Language |
| --- | --- | --- | --- |
| OpenClaw | Full agent platform | Self-hosted personal/business agents with multi-channel support | Node.js |
| LangChain | Agent framework | Custom chains, RAG pipelines, complex tool orchestration | Python, TypeScript |
| CrewAI | Multi-agent framework | Teams of specialized agents collaborating on tasks | Python |
| Raw OpenAI/Anthropic SDK | API client | Maximum control, minimal dependencies, simple use cases | Python, TypeScript, others |

If you want a working agent with Telegram, WhatsApp, and Slack support in under 30 minutes, OpenClaw is the fastest path. If you need fine-grained control over every prompt and tool call, start with the raw SDK. LangChain sits in the middle, offering pre-built components you can customize. CrewAI is purpose-built for workflows where multiple specialized agents collaborate.

For most builders, the recommendation is to start with a framework and drop down to raw SDK calls only for components that need custom behavior.


Connecting an LLM

Connecting an LLM to your agent requires an API key, an SDK or HTTP client, and a system prompt that defines the agent's behavior. The system prompt is the most important piece because it determines how the agent interprets inputs and uses tools.

With the Anthropic SDK, a minimal agent connection looks like this:

import anthropic

# Reads ANTHROPIC_API_KEY from the environment by default
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    # The system prompt defines the agent's role and capabilities
    system="You are a helpful assistant that can search the web and manage files.",
    messages=[{"role": "user", "content": "What's the weather in San Diego?"}],
    # Each tool is declared with a name, a description, and a JSON Schema
    # for its inputs; the model uses all three to decide when to call it
    tools=[{
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }]
)
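When the model decides to use the tool, the response's stop_reason is "tool_use" and response.content includes a tool-use block carrying the tool name and inputs. Here is a minimal handler sketch using plain dicts shaped like those blocks (the real SDK returns objects with .type, .id, .name, and .input attributes, and get_weather is a stand-in you would implement yourself):

```python
def get_weather(city: str) -> str:
    # Stand-in implementation; a real agent would call a weather API here.
    return f"Sunny, 72°F in {city}"

TOOL_HANDLERS = {"get_weather": get_weather}

def handle_tool_calls(content_blocks):
    """Execute any tool_use blocks and build tool_result payloads that
    can be sent back to the model in a follow-up messages.create call."""
    results = []
    for block in content_blocks:
        if block["type"] == "tool_use":
            handler = TOOL_HANDLERS[block["name"]]
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": handler(**block["input"]),
            })
    return results

# Simulated content blocks, shaped like the SDK's response.content:
fake_content = [{"type": "tool_use", "id": "toolu_01", "name": "get_weather",
                 "input": {"city": "San Diego"}}]
print(handle_tool_calls(fake_content))
```

The tool_result payloads go back to the model as a user message, which lets it compose a final natural-language answer from the tool output.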

The key decisions at this stage are model selection (cost vs. capability), context window size (how much history the agent can process), and whether to use streaming for real-time responses. In OpenClaw, the LLM connection is configured through a YAML file rather than code, which makes it easy to swap models without changing application logic.


Adding Tools and Actions

Tools are what transform an LLM from a text generator into an agent that can take real-world actions. A tool is a function the agent can call, defined by a name, description, and input schema that the LLM uses to decide when and how to invoke it.

Common tool categories include:

  • Data retrieval: web search, database queries, API calls, file reading
  • Data manipulation: file writing, spreadsheet updates, database inserts
  • Communication: sending emails, Slack messages, Telegram messages
  • Code execution: running scripts, shell commands, calculations
  • External services: calendar management, CRM updates, payment processing


In OpenClaw, tools are defined as skills using Markdown files with a specific format. Each skill file describes what the tool does, what inputs it accepts, and the code or API call to execute. This approach makes tools portable and shareable through the marketplace.

The critical design decision is tool granularity. Tools that are too broad ("do_everything") give the LLM poor guidance on when to use them. Tools that are too narrow ("add_single_row_to_sheet_column_B") create decision fatigue. Aim for tools that map to clear, single-purpose actions.
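To make the granularity tradeoff concrete, compare two hypothetical tool schemas (both names and fields are invented for this illustration):

```python
# Too broad: the description gives the model no signal for when this
# tool applies or what inputs it needs.
too_broad = {
    "name": "do_everything",
    "description": "Handle any task the user asks for",
    "input_schema": {
        "type": "object",
        "properties": {"request": {"type": "string"}},
    },
}

# Well-scoped: one clear action with typed inputs the model can fill
# reliably, without being so narrow that dozens of variants are needed.
append_row = {
    "name": "append_sheet_row",
    "description": "Append one row of values to a named spreadsheet",
    "input_schema": {
        "type": "object",
        "properties": {
            "sheet": {"type": "string", "description": "Spreadsheet name"},
            "values": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["sheet", "values"],
    },
}
```

A good litmus test: if you cannot write a one-sentence description that tells the model exactly when to call the tool, the tool is probably too broad.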


Implementing Memory and State

Memory is what separates a useful agent from a stateless chatbot. Without memory, the agent forgets everything between conversations and cannot build on past interactions.

There are three main types of agent memory:

  • Short-term memory: The current conversation context window. Limited by the LLM's token limit (typically 128K-200K tokens as of April 2026).
  • Long-term memory: Persistent storage that survives across sessions. Implemented through files (like OpenClaw's MEMORY.md system), vector databases, or structured databases.
  • Episodic memory: Records of specific past interactions the agent can reference. Useful for learning from previous successes and failures.

OpenClaw implements memory through plain Markdown files stored in a memory directory. This approach is human-readable, easy to edit manually, and does not require a separate database. For agents that need semantic search over large knowledge bases, vector databases like Pinecone or ChromaDB provide retrieval-augmented generation (RAG) capabilities.

The simplest starting point is conversation history plus a single memory file. Add vector search only when you have enough stored knowledge that keyword matching becomes insufficient.
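That starting point can be sketched in a few lines of framework-agnostic Python. This is a minimal illustration of file-based long-term memory plus an in-process message list, not OpenClaw's actual implementation; the file name and helper names are assumptions:

```python
from pathlib import Path

MEMORY_FILE = Path("MEMORY.md")  # illustrative path, echoing OpenClaw's convention

def load_memory() -> str:
    """Return long-term notes, e.g. to prepend to the system prompt."""
    return MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""

def remember(fact: str) -> None:
    """Append a durable fact; this survives across sessions."""
    with MEMORY_FILE.open("a") as f:
        f.write(f"- {fact}\n")

# Short-term memory: the running message list for the current session.
history: list[dict] = []

def record_turn(role: str, content: str) -> None:
    history.append({"role": role, "content": content})

remember("User prefers metric units")
record_turn("user", "What's 5 miles in km?")
```

Because the memory file is plain Markdown, it stays human-readable and can be inspected or corrected by hand, which is a useful property while debugging an agent's behavior.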


Orchestration: The Agent Loop

Orchestration is the control logic that ties perception, reasoning, action, and memory together into a continuous loop. The agent loop follows a consistent pattern regardless of framework.

The basic agent loop works as follows:

  1. Receive input from user or trigger
  2. Load relevant memory and context
  3. Send to LLM with system prompt, tools, and context
  4. Parse LLM response for tool calls or final answer
  5. Execute tool calls and collect results
  6. Feed results back to LLM if more reasoning is needed
  7. Store relevant information in memory
  8. Return final response to user

Error handling is critical in this loop. Tools fail, APIs time out, and LLMs occasionally produce malformed tool calls. A production agent needs retry logic, fallback behaviors, and guardrails to prevent runaway loops. OpenClaw handles this through its built-in orchestration engine, which includes configurable retry limits, timeout settings, and approval gates for sensitive actions.

For agents built from raw SDKs, implement a maximum iteration limit (typically 10-25 loops) and a timeout to prevent infinite loops when the agent cannot resolve a task.
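The loop with an iteration cap can be sketched in framework-agnostic Python. The stub LLM and the tuple protocol it returns are invented for this example; a real implementation would parse actual SDK responses:

```python
def run_agent(task, call_llm, tools, max_iters=10):
    """Minimal agent loop with a hard iteration cap so a confused model
    cannot spin forever. call_llm returns either ("final", text) or
    ("tool", name, args)."""
    context = [("user", task)]
    for _ in range(max_iters):
        step = call_llm(context)
        if step[0] == "final":
            return step[1]
        _, name, args = step
        try:
            result = tools[name](**args)
        except Exception as exc:  # tools fail; feed the error back to the model
            result = f"error: {exc}"
        context.append(("tool_result", result))
    return "Stopped: iteration limit reached"

# Stub LLM for demonstration: calls the tool once, then answers.
def stub_llm(context):
    if context[-1][0] == "tool_result":
        return ("final", f"It is {context[-1][1]}.")
    return ("tool", "get_weather", {"city": "San Diego"})

print(run_agent("Weather in San Diego?", stub_llm,
                {"get_weather": lambda city: f"72°F in {city}"}))
# → It is 72°F in San Diego.
```

Note that tool errors are returned to the model as observations rather than raised: this gives the LLM a chance to retry or work around the failure, while the iteration cap still bounds the total cost.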


Limitations and Tradeoffs

Building an AI agent from scratch involves significant tradeoffs that are important to understand before starting.

Cost: Every agent loop iteration costs money in API calls. A complex task that requires 10 tool calls and 5 reasoning steps can cost several dollars with frontier models. Monitor token usage carefully and set budget limits.
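A rough back-of-the-envelope estimator makes the loop's cost dynamics visible. The per-million-token rates below are placeholders, not real prices; check your provider's current pricing:

```python
# (input, output) rates in $ per million tokens — illustrative placeholders only.
RATES = {"frontier": (3.00, 15.00), "small": (0.25, 1.25)}

def loop_cost(model: str, iterations: int, in_tokens: int, out_tokens: int) -> float:
    """Estimate an agent run's cost. Each loop iteration re-sends the
    accumulated context, so input tokens dominate as tasks get longer."""
    rate_in, rate_out = RATES[model]
    return iterations * (in_tokens * rate_in + out_tokens * rate_out) / 1_000_000

# 15 iterations, ~20K input tokens of context per call, ~2K output tokens:
print(f"${loop_cost('frontier', 15, 20_000, 2_000):.2f}")
```

The key takeaway is the multiplier: because context is re-sent on every iteration, a tighter iteration cap and aggressive context pruning reduce cost roughly linearly.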

Reliability: LLMs are non-deterministic. The same input can produce different tool call sequences, which makes testing difficult. Agents occasionally hallucinate tool names, produce invalid JSON for tool inputs, or get stuck in loops.

Security: Agents that execute code, access APIs, or modify files create attack surfaces. Prompt injection, tool abuse, and unintended data access are real risks. Read the AI agent security guide before deploying any agent to production.

When not to build from scratch: If your use case is covered by an existing framework like OpenClaw, building from scratch adds development time without proportional benefit. Build from scratch only when you need custom orchestration logic that no framework supports, or when you need to minimize dependencies for a specific deployment environment.



Frequently Asked Questions

What programming language is best for building AI agents?

Python is the most common language for building AI agents due to its extensive ecosystem of AI libraries, SDK support from OpenAI and Anthropic, and frameworks like LangChain and CrewAI. TypeScript is a strong second choice, especially for web-integrated agents, and both OpenAI and Anthropic offer official TypeScript SDKs.

How long does it take to build an AI agent from scratch?

A minimal AI agent with a single tool can be built in under an hour using an SDK from OpenAI or Anthropic. A production-ready agent with multiple tools, memory, error handling, and orchestration typically takes 2 to 4 weeks. Using a framework like OpenClaw or LangChain significantly reduces development time compared to building everything from raw API calls.

Do I need to train my own LLM to build an AI agent?

No. Most AI agents use existing foundation models like Claude, GPT-5, or open-source models through APIs. The agent layer handles orchestration, tool use, and memory on top of these models. Fine-tuning is only necessary for highly specialized domains where general models underperform.

What is the difference between an AI agent and a chatbot?

A chatbot responds to messages in a conversation. An AI agent can autonomously plan multi-step tasks, use external tools like APIs and databases, maintain persistent memory across sessions, and take actions in the real world. Agents operate in a loop of perception, reasoning, and action rather than simple request-response.

Can I build an AI agent without coding?

Yes, platforms like OpenClaw, Flowise, and Dify offer no-code or low-code agent building. OpenClaw lets you configure agents through YAML files and Markdown-based skill definitions without writing application code. However, custom tool integrations and complex orchestration logic typically require some programming.