Case study: Anthropic SDK agents and Claude Code agents

From post 1 to post 19, this series went bottom-up: build an agent from scratch, then look up at frameworks. This post changes direction. Instead of discussing patterns on paper, we look at two concrete implementations running in production: the Anthropic SDK (Python) and the Claude Code subagent system (the CLI tool you are already using if you are reading this through my blog).

Both use Claude as the LLM. Both let you spawn a child agent. Both handle tool use. But they solve the problem differently, and that difference says a lot about the design trade-offs agents face in production.

Why these two implementations

The Anthropic SDK is the lowest layer you can build a Claude agent on. There is no abstraction in between: you call client.messages.create(), get back a Message object, parse the tool calls, run the tools, push the results into the history, and repeat. Every decision about the loop, error handling, and memory is yours.

Claude Code is a real agent running on my machine as I write this post. It has a Task tool for spawning subagents, worktree isolation, a background mode, and a “team” concept that lets agents communicate. Anthropic does not publish Claude Code's source code, but its behavior is enough to read the design decisions back out of it.

Two implementations, two perspectives: one is a blank slate, the other is production-grade with battle scars.

The Anthropic SDK seen from the control loop

Post 5 already built a 30-line agent on the Anthropic SDK. Here I push it one step further: an agent with structured output via Pydantic, with streaming, and with the ability to spawn subagents through recursion.

Structured output with Pydantic

Instead of letting the LLM return plain text and parsing it by hand, you can use the instructor library (a wrapper around the Anthropic SDK) to enforce a Pydantic model:

import anthropic
import instructor
from pydantic import BaseModel
from typing import Literal, Optional

class TaskResult(BaseModel):
    # Literal makes the schema reject any status outside these three values
    status: Literal["done", "blocked", "need_clarification"]
    summary: str
    next_steps: Optional[list[str]] = None
    blocked_by: Optional[str] = None

raw_client = anthropic.Anthropic()
client = instructor.from_anthropic(raw_client)

def run_task(task: str, context: str = "") -> TaskResult:
    result = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"Task: {task}\nContext: {context}"
            }
        ],
        response_model=TaskResult,
    )
    return result

instructor uses tool use behind the scenes to enforce the schema. The LLM still calls a tool (a synthetic one whose input schema is generated from your Pydantic model), but all you see is the Pydantic object. No more json.loads() followed by a KeyError at 2 a.m.
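
To make the mechanism concrete, here is a minimal standard-library sketch of the same idea without instructor. The tool name return_task_result, the TASK_RESULT_TOOL definition, and parse_task_result are hypothetical names chosen for illustration, not instructor's internals. The dict passed to parse_task_result stands in for the input of a tool_use block.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_STATUS = {"done", "blocked", "need_clarification"}

# Hypothetical tool definition: the schema the model is asked to fill in.
TASK_RESULT_TOOL = {
    "name": "return_task_result",
    "description": "Return the final task result in structured form.",
    "input_schema": {
        "type": "object",
        "properties": {
            "status": {"type": "string", "enum": sorted(ALLOWED_STATUS)},
            "summary": {"type": "string"},
            "next_steps": {"type": "array", "items": {"type": "string"}},
            "blocked_by": {"type": "string"},
        },
        "required": ["status", "summary"],
    },
}


@dataclass
class ParsedTaskResult:
    status: str
    summary: str
    next_steps: Optional[list[str]] = None
    blocked_by: Optional[str] = None


def parse_task_result(tool_input: dict) -> ParsedTaskResult:
    """Validate the input of a tool_use block and turn it into a typed object."""
    status = tool_input.get("status")
    if status not in ALLOWED_STATUS:
        raise ValueError(f"invalid status: {status!r}")

    summary = tool_input.get("summary")
    if not isinstance(summary, str):
        raise ValueError("summary must be a string")

    next_steps = tool_input.get("next_steps")
    if next_steps is not None:
        if not isinstance(next_steps, list) or not all(isinstance(s, str) for s in next_steps):
            raise ValueError("next_steps must be a list of strings")

    blocked_by = tool_input.get("blocked_by")
    if blocked_by is not None and not isinstance(blocked_by, str):
        raise ValueError("blocked_by must be a string")

    return ParsedTaskResult(status, summary, next_steps, blocked_by)
```

With the real API you would pass TASK_RESULT_TOOL in tools and force the call with tool_choice={"type": "tool", "name": "return_task_result"}, then feed the resulting block's input to parse_task_result. A library like instructor saves you from writing this validation by hand.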

Streaming structured output

For long tasks, streaming lets the user see progress instead of staring at a spinner:

import anthropic

client = anthropic.Anthropic()

def agent_with_streaming(user_input: str, tools: list[dict]):
    messages = [{"role": "user", "content": user_input}]

    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        tools=tools,
        messages=messages,
    ) as stream:
        for event in stream:
            # Print text deltas as they arrive; tool input deltas are not printed
            if event.type == "content_block_delta" and event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)

        # The stream helper has accumulated the whole message by now,
        # including complete tool_use blocks
        final_message = stream.get_final_message()

    return final_message

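Streaming inside an agent loop adds one complication: the JSON input of a tool call arrives in partial fragments, and you have to buffer them before you can run the tool. The helper returned by client.messages.stream() already accumulates these fragments, and stream.get_final_message() returns complete tool_use blocks. What you need to watch for is running a tool too early, while its JSON is still incomplete.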
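
If you ever consume the raw events yourself, the buffering looks roughly like the sketch below. It assumes events are plain dicts shaped like the streaming events of the Messages API (content_block_start, content_block_delta with an input_json_delta, content_block_stop). The function name collect_tool_calls and the SAMPLE_EVENTS data are made up for illustration.

```python
import json


def collect_tool_calls(events):
    """Buffer input_json_delta fragments per block; parse only at content_block_stop."""
    pending = {}    # block index -> {"id", "name", "chunks"}
    completed = []  # tool calls whose JSON input is complete

    for event in events:
        etype = event["type"]
        if etype == "content_block_start":
            block = event["content_block"]
            if block["type"] == "tool_use":
                pending[event["index"]] = {
                    "id": block["id"],
                    "name": block["name"],
                    "chunks": [],
                }
        elif etype == "content_block_delta":
            delta = event["delta"]
            if delta["type"] == "input_json_delta" and event["index"] in pending:
                pending[event["index"]]["chunks"].append(delta["partial_json"])
        elif etype == "content_block_stop" and event["index"] in pending:
            call = pending.pop(event["index"])
            raw = "".join(call["chunks"])
            # Only now is the JSON guaranteed to be complete
            completed.append({
                "id": call["id"],
                "name": call["name"],
                "input": json.loads(raw) if raw else {},
            })
    return completed


# Made-up event sequence: one text block, then one tool call split in two fragments
SAMPLE_EVENTS = [
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Checking."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_01",
                       "name": "spawn_subagent", "input": {}}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"task": "wri'}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": 'te tests"}'}},
    {"type": "content_block_stop", "index": 1},
]
```

Calling json.loads on the first fragment alone would raise, which is exactly the "too early" failure described above. Waiting for content_block_stop avoids it.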

Subagents through recursion

The plain Anthropic SDK has no built-in “spawn subagent” concept. But you can implement one by exposing a spawn_subagent tool:

import anthropic
import json

client = anthropic.Anthropic()
TOOLS = [
    {
        "name": "spawn_subagent",
        "description": "Spawn a subagent to handle a subtask. Returns the subagent result.",
        "input_schema": {
            "type": "object",
            "properties": {
                "task": {
                    "type": "string",
                    "description": "The task for the subagent to complete"
                },
                "context": {
                    "type": "string",
                    "description": "Context the subagent needs"
                }
            },
            "required": ["task"]
        }
    }
]

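def run_agent(task: str, max_depth: int = 3, current_depth: int = 0, context: str = "") -> str:
    if current_depth >= max_depth:
        return f"[Max recursion depth reached at depth {current_depth}]"

    # Include the optional context so the subagent actually receives it
    prompt = f"Task: {task}\nContext: {context}" if context else task
    messages = [{"role": "user", "content": prompt}]
    for _ in range(20):  # hard cap on loop iterations
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason == "tool_use":
            tool_results = []
            for block in resp.content:
                if block.type != "tool_use":
                    continue
                if block.name == "spawn_subagent":
                    # Recursion: the subagent runs its own run_agent loop with a fresh history
                    sub_result = run_agent(
                        task=block.input["task"],
                        max_depth=max_depth,
                        current_depth=current_depth + 1,
                        context=block.input.get("context", ""),
                    )
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": sub_result,
                    })
                else:
                    # Every tool_use block needs a matching tool_result
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": f"Unknown tool: {block.name}",
                        "is_error": True,
                    })
            messages.append({"role": "user", "content": tool_results})
            continue

        # end_turn, max_tokens, or any other stop reason: return the text produced so far
        return "".join(block.text for block in resp.content if block.type == "text")

    return "Max iterations exceeded"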

This pattern is synchronous: the parent resumes only after the subagent finishes. For parallelism, use threads or asyncio, run each subagent as its own coroutine, and aggregate the results afterwards.
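
A minimal sketch of the asyncio variant, under stated assumptions: run_subagent here is a stand-in that sleeps instead of calling the API, and both run_subagent and run_parallel are hypothetical names. In a real agent the stand-in would be an async version of the loop above, for example built on anthropic.AsyncAnthropic.

```python
import asyncio


async def run_subagent(task: str) -> str:
    """Stand-in for a real subagent loop; a real one would await API calls."""
    await asyncio.sleep(0.01)
    if "fail" in task:
        raise RuntimeError(f"subagent failed on: {task}")
    return f"done: {task}"


async def run_parallel(tasks: list[str], max_concurrency: int = 3) -> list[str]:
    """Run one coroutine per subtask and collect results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(task: str) -> str:
        # Cap concurrency so a large fan-out does not flood the API
        async with semaphore:
            return await run_subagent(task)

    results = await asyncio.gather(
        *(guarded(task) for task in tasks),
        return_exceptions=True,  # one failing subagent must not cancel the others
    )
    return [f"error: {r}" if isinstance(r, Exception) else r for r in results]
```

Usage is asyncio.run(run_parallel(["write tests", "update docs"])). The parent then turns each entry into a tool_result, so a failed subagent shows up in the history as an error string instead of crashing the whole run.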

Claude Code subagent system: Task tool and worktree isolation

Claude Code exposes subagent spawning through the Task tool. Unlike recursion on the Anthropic SDK, Task in Claude Code has some distinctive behavior:

Structure of a Task tool call

When Claude Code spawns a subagent, it produces a YAML-like instruction block (the example illustrates behavior, it is not the real syntax):

# Claude Code Task tool invocation (illustrates behavior)
tool: Task
input:
  description: "Write post 20 of AI Agents series"
  prompt: |
    Write src/content/blog/anthropic-sdk-claude-code-agents.md
    seriesOrder: 20
    Target branch: main
    Auto-commit and open PR on finish.
  isolation: worktree        # optional: run in isolated git worktree
  run_in_background: true    # optional: parent does not block

When isolation: worktree is set:

Claude Code runs git worktree add .claude/worktrees/<slug> HEAD to create a separate checkout
The subagent works inside that folder and does not touch the main working tree
The subagent's branch is worktree/<slug>, based on the parent's HEAD at spawn time
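
If you want the same isolation in your own orchestrator, the steps above can be sketched as follows. This is an approximation of the observed behavior, not Claude Code's actual implementation. The -b flag is my assumption for getting a branch named worktree/<slug>, and worktree_add_command and create_worktree are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def worktree_add_command(slug: str, base: str = "HEAD") -> list[str]:
    """Build the git command that creates an isolated checkout for one subagent."""
    path = Path(".claude") / "worktrees" / slug
    # Assumption: -b creates the branch worktree/<slug> starting at `base`
    return ["git", "worktree", "add", "-b", f"worktree/{slug}", path.as_posix(), base]


def create_worktree(repo_dir: str, slug: str) -> Path:
    """Create the worktree inside repo_dir and return its directory."""
    subprocess.run(worktree_add_command(slug), cwd=repo_dir, check=True)
    return Path(repo_dir) / ".claude" / "worktrees" / slug
```

The subagent then gets the returned directory as its working directory. Everything it writes stays on its own branch until the parent decides to merge.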

When run_in_background: true is set:

The parent session does not block
The parent gets a notification when the subagent is done
The parent does not see the subagent's file changes in real time

The third point is the most important and the most often overlooked.

Background agents do not see the main session's file changes

This is a small pitfall that causes real confusion. The scenario:

t=0  Main session: spawns background subagent A to write file X.md
t=0  Subagent A starts, runs in its own worktree, does not see the main working tree
t=5  Main session: edits file Y.md itself (same repo)
t=10 Subagent A finishes, reports done, pushes branch worktree/A
t=11 Main session: git pull --ff-only origin worktree/A
     -> File Y.md is intact (A did not touch Y)
     -> File X.md, created by A, is merged in

That is correct and safe thanks to worktree isolation. But suppose the main session also edits X.md after t=0:

t=0  Main: spawns A to write X.md
t=3  Main: also edits X.md itself (unaware that A is working on it)
t=10 A finishes, X.md from A is merged
     -> Conflict or last-writer-wins, depending on how you merge

There is no lock and no warning. Isolation protects only at the filesystem level; it does not protect against semantic conflicts when two parties edit the same resource.

Rule of thumb: if the main session spawns a background agent to do X, do not also do X yourself. One of the two does it, not both.
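
Since nothing enforces this rule for you, one option is to enforce it in the orchestrator. The sketch below is a hypothetical in-memory registry, not something Claude Code provides: every spawn claims its file scope first, and a second claim on the same path fails loudly.

```python
from pathlib import PurePosixPath
from typing import Optional


class ScopeRegistry:
    """Tracks which paths each agent owns until it reports back."""

    def __init__(self) -> None:
        self._claims: dict[str, str] = {}  # normalized path -> owner name

    @staticmethod
    def _normalize(path: str) -> str:
        # "./X.md" and "X.md" must count as the same resource
        return str(PurePosixPath(path))

    def claim(self, owner: str, paths: list[str]) -> None:
        """Claim all paths for owner, or raise without claiming any of them."""
        normalized = [self._normalize(p) for p in paths]
        conflicts = {
            p: self._claims[p]
            for p in normalized
            if p in self._claims and self._claims[p] != owner
        }
        if conflicts:
            raise ValueError(f"paths already claimed: {conflicts}")
        for p in normalized:
            self._claims[p] = owner

    def owner_of(self, path: str) -> Optional[str]:
        return self._claims.get(self._normalize(path))

    def release(self, owner: str) -> None:
        """Drop every claim held by owner, e.g. after its branch is merged."""
        self._claims = {p: o for p, o in self._claims.items() if o != owner}
```

The main session claims X.md for agent A before spawning it and releases the claim after the merge. If the main session then tries to claim X.md for itself at t=3, it gets a ValueError instead of a silent conflict at t=10.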

Lesson 1: isolation by default, explicit hand-off

The first lesson from Claude Code's design: isolation should be the default, not opt-in.

The plain Anthropic SDK has no isolation. Everything runs in the same process and the same memory. A subagent (if you have one) can share any state with its parent. This is not a design flaw in the SDK; the SDK is built to be flexible. But once you start building a multi-agent system on it, isolation is something you add yourself.

Claude Code makes the opposite bet: isolation by default through worktrees, with parent and subagent sharing as little as possible. If you want to share state, you have to do it explicitly.

Practical consequences when you design your own agent framework:

Question	Recommendation
Does a child agent need to read another agent's output?	If yes, pass it explicitly through a tool result; do not share memory
Can two agents write to the same file?	No. Slice the file scope before spawning
Does a child agent need to know what other agents are doing?	If yes, use the Team pattern with SendMessage, not worktrees
Does a background agent need to report its result?	Yes. Design the callback/notification from the start
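
The last row is easy to postpone and painful to retrofit. A minimal sketch of "notification from the start", assuming a thread pool stands in for background agents; spawn_background and the shape of the notification dict are hypothetical.

```python
import queue
from concurrent.futures import Future, ThreadPoolExecutor


def spawn_background(executor: ThreadPoolExecutor, inbox: queue.Queue,
                     name: str, fn, *args) -> Future:
    """Run fn in the background and post a notification to inbox when it ends."""
    future = executor.submit(fn, *args)

    def notify(done: Future) -> None:
        error = done.exception()
        inbox.put({
            "agent": name,
            "ok": error is None,
            # On failure the parent gets the error text instead of a result
            "result": done.result() if error is None else str(error),
        })

    future.add_done_callback(notify)
    return future
```

The parent keeps working and drains inbox whenever it is ready, with inbox.get(timeout=...) or get_nowait(). Failures arrive through the same channel as successes, so a crashed background agent cannot go unnoticed.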

Lesson 2: do not rely on global state

The second lesson: agents should not have global mutable state.

In the Anthropic SDK implementation above, messages is a list carried through each loop iteration. A subagent has its own messages. There is no shared global agent_state = {}.

Why does this matter? Because the LLM is stateless. Every time you call client.messages.create(), you send the entire history. If you try to simulate state with a global variable outside the history, you create a second source of truth that the LLM does not know about. The two sources diverge, and the agent behaves unpredictably.

A wrong example:

# DO NOT do this
GLOBAL_AGENT_STATE = {
    "files_modified": [],
    "tasks_completed": 0,
}

def run_agent(task):
    # Subagents use GLOBAL_AGENT_STATE too,
    # but the LLM knows nothing about it.
    # With subagents running in parallel: race condition
    GLOBAL_AGENT_STATE["tasks_completed"] += 1

The right way: if you need to track state across agents, write it to a file on disk (which both can read) or pass it through tool results. Either way, the state should reach the LLM through its history.
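
A minimal sketch of the tool-result route, assuming each subagent returns a small report and the parent derives its counters from the history alone. SubagentReport, to_tool_result, and summarize are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SubagentReport:
    task: str
    files_modified: list[str] = field(default_factory=list)
    completed: bool = False


def to_tool_result(tool_use_id: str, report: SubagentReport) -> dict:
    """Serialize the report into a tool_result block, so it lands in the history."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(asdict(report)),
    }


def summarize(history: list[dict]) -> dict:
    """Derive aggregate state from the history only; no global variable involved."""
    files: list[str] = []
    completed = 0
    for message in history:
        if message["role"] != "user" or not isinstance(message["content"], list):
            continue
        for block in message["content"]:
            if block.get("type") == "tool_result":
                data = json.loads(block["content"])
                files.extend(data["files_modified"])
                completed += int(data["completed"])
    return {"files_modified": sorted(set(files)), "tasks_completed": completed}
```

Because the counters are recomputed from the same messages list the LLM receives, there is only one source of truth. Parallel subagents cannot race on it either: each one returns its own report, and the parent appends them one at a time.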

A real pitfall: two agents both pick seriesOrder 25

This incident happened while I was writing this blog series, on 18 May 2026.

I needed to write the two missing posts in the K8s A-Z series: the letters W and X. I spawned two agents in parallel, each in its own worktree, with no file touched by both. I thought that was safe.

Both agents got the instruction: “Write the missing post, find the next seriesOrder yourself”. Both checked the repo at spawn time and saw that the highest post had seriesOrder: 24. Both computed “24 + 1 = 25”. Both committed their post with seriesOrder: 25.

After merging both into main: two different posts with the same seriesOrder: 25. SeriesNav.astro sorts by this field and rendered a series nav with a duplicated entry and a missing post. It took an extra commit to move post X up to 26.

Root cause: worktree isolation protects the filesystem, not logical resources. Both agents built on the same HEAD, read the same state, and reached the same decision.

Fix: do not let agents compute seriesOrder themselves. The parent session pre-allocates before spawning:

Spawn Agent W with explicit: seriesOrder: 25
Spawn Agent X with explicit: seriesOrder: 26

General principle: any value that must be unique across agents has to be pre-allocated by the orchestrator before spawning, never computed by the agents themselves.

This applies to:

seriesOrder, post IDs, database IDs
Port numbers (agent A: 4321, agent B: 4322)
Kafka topic names, test database names
File slugs when several agents create files in the same directory
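
Pre-allocation takes only a few lines in the orchestrator. A minimal sketch, with hypothetical names (allocate, build_prompt) and a prompt wording I made up. The numbers match the incident above: the current maximum is 24, and ports start at 4321.

```python
def allocate(agent_names: list[str], current_max_order: int,
             base_port: int = 4321) -> dict[str, dict]:
    """Assign every agent its own seriesOrder and port before anything is spawned."""
    if len(set(agent_names)) != len(agent_names):
        raise ValueError("agent names must be unique")
    return {
        name: {
            "series_order": current_max_order + 1 + offset,
            "port": base_port + offset,
        }
        for offset, name in enumerate(agent_names)
    }


def build_prompt(name: str, allocation: dict) -> str:
    """Put the allocated values into the instruction instead of letting the agent guess."""
    return (
        f"Write the missing post for letter {name}. "
        f"Use seriesOrder: {allocation['series_order']} exactly; do not compute it. "
        f"If you need a dev server, use port {allocation['port']}."
    )
```

The key property is that allocation happens in one place, sequentially, before any agent exists. The agents never read shared state to derive a unique value, so they cannot reach the same answer.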

A quick comparison of the two approaches

Criterion	Plain Anthropic SDK	Claude Code Task tool
Default isolation	No (same process)	Yes (separate worktree)
Parallel subagents	Manual (threading/asyncio)	Built-in (`run_in_background`)
Communication	Passed as function arguments	Passed through tool results
Global state	You manage it yourself	None (disk only)
Background visibility	No (blocking by default)	Yes (notification when done)
Structured output	Via `instructor` or manual parsing	Via files on disk
Error handling	You implement it	Claude Code retries on its own
Max depth control	You set `max_depth`	Enforced by Claude Code
Streaming	`client.messages.stream()`	N/A (agent writes to disk)
Cost transparency	Token usage per call	Aggregated, hard to measure per subagent

Neither implementation is “better”. The Anthropic SDK gives you full control and flexibility. Claude Code gives you isolation and coordination infrastructure. The trade-off depends on the use case.

If you are building an agent for a product: the Anthropic SDK, because you need to customize everything.

If you are using Claude Code to automate a dev workflow: the Task tool with worktree isolation, because the infrastructure is already there.

Where to read next in the series

Post 5, Build an agent from scratch, has a 100-line Anthropic SDK agent with one simple tool. That post is the foundation; this one is the production view.

Post 16, Multi-agent patterns: supervisor, handoff, debate, lays out the theory of coordination between agents. This post is a real-world implementation of two of those patterns.

Post 28 of the Claude Code from zero series, FleetView: one screen instead of 6 tmux panes, goes into more detail on how Claude Code manages many agent sessions at once. If you are curious about the operational side of the Claude Code subagent system, read that post.

Closing on boundaries

Two implementations, one shared lesson: agent design is about boundaries. The boundary between child agent and parent agent. The boundary between shared state and isolated state. The boundary between what the orchestrator should decide and what the child agent should decide for itself.

The Anthropic SDK puts the boundaries in your hands and enforces nothing. Claude Code draws a hard boundary at the filesystem level and builds coordination tools on top of it. Both approaches are reasonable; what matters is knowing where your boundaries are before you spawn the first agent.

Once you know how to build and coordinate, the next question is: how do you know the agent is running correctly? Eval for agents: trace, replay, golden set, regression opens that part with something very practical: traces, replay, and regression tests.