Chain-of-Thought vs. structured reasoning

Post 8 of this series covered Tree of Thoughts: the agent generates several reasoning branches, evaluates each one, and picks the best path. Post 6 covered ReAct: thought, action, and observation interleave in every loop iteration.

Both build on the same idea: an agent should think before it acts. The question this post answers is: how should it “think”?

There are two extremes: a raw prompt ("think step by step") and a rigid schema (reasoning_steps: [...]). Each has its own trade-offs. In between sits the new generation of reasoning models, where “thinking” is baked in during training rather than triggered by the prompt.

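Classic CoT

Chain-of-thought prompting (CoT) was introduced in a 2022 Google Brain paper (Wei et al.). The idea is to put examples with a worked reasoning chain into the prompt. The model then writes out intermediate steps of its own before giving the final answer. Later that year, Kojima et al. showed that simply appending "Let's think step by step" also works without any examples (zero-shot CoT). That is the variant used in the code below.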
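The exemplar (few-shot) form from the original paper can be sketched as a plain prompt builder, next to the zero-shot form. This is a minimal illustration, not code from either paper. `Exemplar`, `build_few_shot_cot` and `build_zero_shot_cot` are hypothetical names, and the tennis-ball exemplar is just one worked example in that style.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    reasoning: str  # the worked reasoning chain shown to the model
    answer: str


def build_few_shot_cot(exemplars: list[Exemplar], question: str) -> str:
    """Few-shot CoT: worked examples with reasoning, then the new question."""
    parts = [
        f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # Leave the final "A:" open so the model continues in the same format.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def build_zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: no exemplars, only the trigger phrase."""
    return f"Q: {question}\nA: Let's think step by step."


tennis = Exemplar(
    question="Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
             "How many tennis balls does he have now?",
    reasoning="Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
    answer="11",
)
prompt = build_few_shot_cot([tennis], "Cities are 360 km apart. When do the trains meet?")
print(prompt)
```

Few-shot costs more prompt tokens but also fixes the format of the steps. Zero-shot is a one-line change.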

import anthropic

client = anthropic.Anthropic()

def ask_with_cot(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"{question}\n\nThink step by step before answering."
        }]
    )
    return response.content[0].text

result = ask_with_cot(
    "A train leaves city A at 9:00 AM going 80 km/h. "
    "Another train leaves city B at 10:00 AM going 100 km/h. "
    "Cities are 360 km apart. When do they meet?"
)
print(result)

Output (abridged):

Step 1: Distance train 1 covers before 10:00 AM.
  Train 1 departs at 9:00 AM at 80 km/h, so in 1 hour it covers 80 km.

Step 2: Remaining distance when train 2 departs.
  360 - 80 = 280 km.

Step 3: Closing speed.
  80 + 100 = 180 km/h.

Step 4: Time until they meet, counted from 10:00 AM.
  280 / 180 = 1.555... hours ≈ 1 hour 33 minutes.

They meet at 11:33 AM.
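The chain above is plain arithmetic, so it is easy to check outside the model. Here is a minimal sketch that assumes the trains travel toward each other on the same line. The question implies this but never says it. The calendar date is arbitrary, and `meeting_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def meeting_time(distance_km: float, start_a: datetime, speed_a: float,
                 start_b: datetime, speed_b: float) -> datetime:
    """Meeting time of two trains heading toward each other (start_a <= start_b)."""
    head_start_h = (start_b - start_a).total_seconds() / 3600  # 1 h
    gap_km = distance_km - speed_a * head_start_h              # 360 - 80 = 280 km
    closing_speed = speed_a + speed_b                          # 80 + 100 = 180 km/h
    return start_b + timedelta(hours=gap_km / closing_speed)   # 280 / 180 ≈ 1.556 h


day = datetime(2024, 1, 1)  # arbitrary date, only the clock time matters
meet = meeting_time(360, day.replace(hour=9), 80, day.replace(hour=10), 100)
print(meet.strftime("%H:%M"))  # 11:33
```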

CoT works because the model has to commit to an intermediate chain of reasoning. Writing each step down keeps the model from “jumping” straight to a wrong answer. Accuracy improves noticeably on math, logic, and code-reasoning tasks.

The problem is that this output is free-form text. If you need to parse the result, feed it into a pipeline, or audit each step, you are parsing unstructured text. Regex is unreliable here, and the parsing stays brittle.
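Here is what “brittle” means in practice. A hypothetical parser written against one run's format silently returns nothing when the next run formats its steps differently. `parse_steps` and the three sample outputs are illustrative, not real model transcripts.

```python
import re

# Written against one observed format: "Step N: ..."
STEP_RE = re.compile(r"^Step\s+(\d+):\s*(.+)$", re.MULTILINE)


def parse_steps(text: str) -> list[tuple[int, str]]:
    return [(int(num), body.strip()) for num, body in STEP_RE.findall(text)]


run_1 = "Step 1: Train 1 covers 80 km.\nStep 2: 360 - 80 = 280 km.\nThey meet at 11:33 AM."
run_2 = "1) Train 1 covers 80 km.\n2) 360 - 80 = 280 km.\nThey meet at 11:33 AM."
run_3 = "**Step 1:** Train 1 covers 80 km.\n**Step 2:** 360 - 80 = 280 km."

print(parse_steps(run_1))  # two steps
print(parse_steps(run_2))  # []  (same reasoning, different numbering)
print(parse_steps(run_3))  # []  (same reasoning, Markdown bold)
```

No exception is raised. The pipeline just receives an empty list, which is why this kind of failure is easy to miss.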

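Structured reasoning with a JSON schema

One step further is to force the model to return reasoning_steps as an array in JSON, through a tool whose input_schema defines it. Each step is an object with explicit step_number, description, and conclusion fields. Confidence is reported once, for the whole answer.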

import json
import anthropic

client = anthropic.Anthropic()

REASONING_TOOL = {
    "name": "structured_answer",
    "description": "Answer a question with explicit reasoning steps",
    "input_schema": {
        "type": "object",
        "properties": {
            "reasoning_steps": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "step_number": {"type": "integer"},
                        "description": {"type": "string"},
                        "conclusion": {"type": "string"}
                    },
                    "required": ["step_number", "description", "conclusion"]
                }
            },
            "final_answer": {"type": "string"},
            "confidence": {
                "type": "string",
                "enum": ["high", "medium", "low"]
            }
        },
        "required": ["reasoning_steps", "final_answer", "confidence"]
    }
}

def ask_with_structured_reasoning(question: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        tools=[REASONING_TOOL],
        tool_choice={"type": "tool", "name": "structured_answer"},
        messages=[{
            "role": "user",
            "content": question
        }]
    )
    for block in response.content:
        if block.type == "tool_use" and block.name == "structured_answer":
            return block.input
    return {}

result = ask_with_structured_reasoning(
    "A train leaves city A at 9:00 AM going 80 km/h. "
    "Another train leaves city B at 10:00 AM going 100 km/h. "
    "Cities are 360 km apart. When do they meet?"
)

print(json.dumps(result, indent=2, ensure_ascii=False))

Output:

{
  "reasoning_steps": [
    {
      "step_number": 1,
      "description": "Calculate distance covered by train 1 before train 2 departs",
      "conclusion": "Train 1 covers 80 km in 1 hour (9:00-10:00 AM)"
    },
    {
      "step_number": 2,
      "description": "Remaining gap when train 2 departs",
      "conclusion": "360 - 80 = 280 km remaining"
    },
    {
      "step_number": 3,
      "description": "Combined closing speed",
      "conclusion": "80 + 100 = 180 km/h"
    },
    {
      "step_number": 4,
      "description": "Time to meet after 10:00 AM",
      "conclusion": "280 / 180 ≈ 1.556 hours = 1h 33min"
    }
  ],
  "final_answer": "The trains meet at 11:33 AM",
  "confidence": "high"
}
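The schema tells the model what to produce, but the pipeline should still check block.input before trusting it. Also note that ask_with_structured_reasoning returns {} when no tool_use block comes back. Below is a minimal hand-written check for this particular schema. The name `validate_structured_answer` is hypothetical, and a JSON Schema library could do the same job.

```python
from typing import Any

REQUIRED_STEP_KEYS = {"step_number", "description", "conclusion"}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def validate_structured_answer(data: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload looks usable."""
    errors = []
    steps = data.get("reasoning_steps")
    if not isinstance(steps, list) or not steps:
        errors.append("reasoning_steps must be a non-empty list")
        steps = []
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            errors.append(f"step {i + 1} is not an object")
            continue
        missing = REQUIRED_STEP_KEYS - set(step)
        if missing:
            errors.append(f"step {i + 1} is missing {sorted(missing)}")
        elif step["step_number"] != i + 1:
            # Numbering gaps usually mean a step was dropped or merged.
            errors.append(f"step {i + 1} has step_number {step['step_number']}")
    final_answer = data.get("final_answer")
    if not isinstance(final_answer, str) or not final_answer.strip():
        errors.append("final_answer must be a non-empty string")
    if data.get("confidence") not in ALLOWED_CONFIDENCE:
        errors.append("confidence must be high, medium or low")
    return errors
```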

Now you can:

Count the steps (len(result["reasoning_steps"]))
Filter out the steps that contain errors
Render each step in a step-by-step UI
Log exactly which step the agent got wrong, for debugging
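Take “filter out the steps that contain errors” as an example. Each conclusion is its own string, so a hypothetical checker can re-evaluate simple arithmetic such as "360 - 80 = 280". It then reports the step_number of any step whose stated result is wrong. It only recognises the plain `a op b = c` pattern. Anything else is skipped, not verified.

```python
import re

# Matches simple "a op b = c" arithmetic inside a conclusion string.
ARITH_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*([-+*/])\s*(\d+(?:\.\d+)?)\s*=\s*(\d+(?:\.\d+)?)"
)
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def arithmetic_ok(conclusion: str, tol: float = 0.01) -> bool:
    for a, op, b, claimed in ARITH_RE.findall(conclusion):
        if op == "/" and float(b) == 0:
            return False
        if abs(OPS[op](float(a), float(b)) - float(claimed)) > tol:
            return False
    return True


def find_bad_steps(result: dict) -> list[int]:
    """step_number of every step whose arithmetic does not check out."""
    return [
        step["step_number"]
        for step in result.get("reasoning_steps", [])
        if not arithmetic_ok(step.get("conclusion", ""))
    ]


sample = {"reasoning_steps": [
    {"step_number": 2, "description": "Remaining gap", "conclusion": "360 - 80 = 290 km remaining"},
    {"step_number": 3, "description": "Closing speed", "conclusion": "80 + 100 = 180 km/h"},
]}
print(find_bad_steps(sample))  # [2]
```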

Structured reasoning is not just cosmetic. It is a contract between the LLM and the pipeline.

The real trade-offs

Criterion	Free-form CoT	Structured reasoning
Ease of implementation	Very easy (one extra line in the prompt)	Requires designing a schema
Machine-parseable	Unreliable	Yes, native JSON
Human readability	Natural, easy to read	Verbose, rigid
Per-step audit	Hard, requires parsing text	Easy, one object per step
Debugging agent failures	Vague	Pinpoints the failing step
Token overhead	Lower	Higher (schema overhead)
Conflicts with tool use	Rare	Can happen

The most important row in the table is “Conflicts with tool use”. This pitfall rarely gets discussed.

Pitfall: structured reasoning collides with the tool schema

Suppose you are building an agent with tools. You add structured reasoning by forcing the model to always call the structured_answer tool before it does anything else.

# THIS CAUSES PROBLEMS
# READ_FILE_TOOL and WRITE_FILE_TOOL stand for your own tool definitions
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    tools=[REASONING_TOOL, READ_FILE_TOOL, WRITE_FILE_TOOL],
    tool_choice={"type": "tool", "name": "structured_answer"},  # force reasoning first
    messages=messages
)

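Here is what goes wrong. The model knows it needs to call read_file to get the data, but you are forcing it to call structured_answer first. It has no information to reason about, so it will do one of the following:

Hallucinate data into the reasoning steps (because it has not read the file yet)
Produce meaningless reasoning with confidence: "low" just to get past this step
Ignore the tool_choice constraint and return plain text (this depends on the provider: on the Claude API, a forced tool_choice always produces a tool_use block, so there you get one of the first two outcomes instead)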
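The ordering problem is visible in the message history itself. Here is a minimal audit sketch. It assumes the history is stored as plain dicts in the Messages API shape, for example after calling .model_dump() on SDK blocks. It also assumes the data-gathering tool is named read_file, as in the example above. `reasoned_before_reading` is a hypothetical helper.

```python
def reasoned_before_reading(messages: list[dict],
                            reasoning_tool: str = "structured_answer",
                            data_tools: frozenset = frozenset({"read_file"})) -> bool:
    """True if the reasoning tool was called before any data-gathering tool."""
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") != "assistant" or not isinstance(content, list):
            continue  # user turns and plain-text assistant turns hold no tool calls
        for block in content:
            if block.get("type") != "tool_use":
                continue
            if block["name"] in data_tools:
                return False  # data was requested first: ordering is fine
            if block["name"] == reasoning_tool:
                return True   # reasoning happened with no data in context
    return False


bad_trace = [
    {"role": "user", "content": "Read sales.csv, analyze it, write a report."},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t1", "name": "structured_answer", "input": {}},
    ]},
]
print(reasoned_before_reading(bad_trace))  # True
```

Running a check like this over logged traces flags the “clean-looking but ungrounded” runs before anyone reads the reasoning itself.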

Last year I debugged a real agent whose task was “read CSV, analyze, write report”. I had added structured reasoning for auditing and forced the model to reason first. As a result, the agent reasoned about the file's contents before it had read the file. The reasoning steps were all placeholders, and the report was wrong. It took two hours to trace, because the structured output looked so “clean”.

The fix is to separate reasoning from action. Reasoning only happens once there is enough context.

def agent_with_safe_reasoning(task: str, max_iter: int = 10) -> dict:
    # READ_FILE_TOOL, WRITE_FILE_TOOL and execute_tools() are placeholders
    # for your own tool definitions and tool runner.
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iter):
        # Step 1: call tools to gather data (reasoning is NOT forced here)
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            tools=[READ_FILE_TOOL, WRITE_FILE_TOOL],
            messages=messages
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Step 2: only now, with enough context, reason and return the result
            reasoning_response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=2048,
                # The history contains tool_use blocks for the data tools, so keep
                # them declared; tool_choice still forces structured_answer.
                tools=[READ_FILE_TOOL, WRITE_FILE_TOOL, REASONING_TOOL],
                tool_choice={"type": "tool", "name": "structured_answer"},
                messages=messages + [{
                    "role": "user",
                    "content": "Now summarize your analysis with explicit reasoning steps."
                }]
            )
            # Extract and return the structured result
            for block in reasoning_response.content:
                if block.type == "tool_use" and block.name == "structured_answer":
                    return block.input
            return {}

        if response.stop_reason == "tool_use":
            tool_results = execute_tools(response.content)
            messages.append({"role": "user", "content": tool_results})
        else:
            # e.g. "max_tokens": a truncated turn may hold a tool_use with no
            # matching tool_result, so stop instead of sending it back.
            break

    return {}

The pattern is tool calls first, reasoning after. The reasoning step runs only once the agent has enough information.
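The loop above leaves execute_tools undefined. Here is one possible shape. It assumes a hypothetical TOOL_HANDLERS registry that maps tool names to plain Python functions. It returns one tool_result block per tool_use block, matched by tool_use_id, and reports failures with is_error instead of crashing the loop.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable

# Hypothetical registry: tool name -> function that takes the tool input dict.
TOOL_HANDLERS: dict[str, Callable[[dict], Any]] = {}


def execute_tools(content_blocks) -> list[dict]:
    """Run every tool_use block and build the matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if getattr(block, "type", None) != "tool_use":
            continue  # skip text (and any other non-tool) blocks
        try:
            handler = TOOL_HANDLERS.get(block.name)
            if handler is None:
                raise ValueError(f"unknown tool: {block.name}")
            output = handler(block.input)
            content = output if isinstance(output, str) else json.dumps(output)
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": content})
        except Exception as exc:
            # Report the failure to the model instead of crashing the agent loop.
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": f"Error: {exc}", "is_error": True})
    return results


# Offline demo with stand-ins for SDK content blocks.
TOOL_HANDLERS["read_file"] = lambda tool_input: f"contents of {tool_input['path']}"
demo_blocks = [
    SimpleNamespace(type="text", text="Reading the file first."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="read_file",
                    input={"path": "sales.csv"}),
    SimpleNamespace(type="tool_use", id="toolu_2", name="delete_everything", input={}),
]
demo_results = execute_tools(demo_blocks)
```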

How reasoning models change the game

Since late 2024, both approaches above have mattered less for some tasks. The reason is that reasoning models have CoT baked in during training, so you don't need to trigger it through the prompt.

For more on how o1 and R1 were trained, see post 29 of the LLM-from-zero series: reasoning models, o1, R1, chain-of-thought training.

With Claude Sonnet 4.6, you can enable extended thinking:

import anthropic

client = anthropic.Anthropic()

def ask_with_extended_thinking(question: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=8000,
        thinking={
            "type": "enabled",
            "budget_tokens": 4000  # upper bound on thinking tokens; must be < max_tokens
        },
        messages=[{
            "role": "user",
            "content": question
        }]
    )

    thinking_content = ""
    answer_content = ""

    for block in response.content:
        if block.type == "thinking":
            # The model's reasoning; on Claude 4 models this is a summary
            thinking_content += block.thinking
        elif block.type == "text":
            answer_content += block.text

    return {
        "thinking": thinking_content,
        "answer": answer_content,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens
    }

result = ask_with_extended_thinking(
    "A train leaves city A at 9:00 AM going 80 km/h. "
    "Another train leaves city B at 10:00 AM going 100 km/h. "
    "Cities are 360 km apart. When do they meet?"
)

print("Thinking:", result["thinking"][:200], "...")
print("Answer:", result["answer"])
print(f"Tokens: {result['input_tokens']} in / {result['output_tokens']} out")

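budget_tokens=4000 means “Claude may use at most 4000 tokens for thinking”. It does not mean Claude will use all of them. On a simple problem it uses fewer, and on a hard one it can get close to the limit. In a normal request, budget_tokens must be at least 1024 and smaller than max_tokens.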
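One way to act on this is to pick the budget per request instead of hard-coding it. This is a sketch with hypothetical complexity tiers, and the numbers are illustrative, not recommendations. It only enforces the two constraints stated above: at least 1024 thinking tokens, and max_tokens larger than budget_tokens. The answer headroom is also an assumption.

```python
MIN_BUDGET = 1024          # minimum accepted budget_tokens
ANSWER_HEADROOM = 2000     # assumed room for the visible answer
BUDGETS = {"simple": 1024, "medium": 4000, "hard": 16000}  # illustrative tiers


def thinking_params(complexity: str) -> dict:
    """Return max_tokens + thinking kwargs sized to the task's complexity."""
    budget = max(MIN_BUDGET, BUDGETS.get(complexity, BUDGETS["medium"]))
    return {
        "max_tokens": budget + ANSWER_HEADROOM,  # must stay above budget_tokens
        "thinking": {"type": "enabled", "budget_tokens": budget},
    }


print(thinking_params("hard"))
# Usage: client.messages.create(model=..., messages=..., **thinking_params("hard"))
```

Unknown complexity labels fall back to the medium tier rather than failing.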

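The key difference is that with thinking enabled, you can read block.thinking to see the model's reasoning. This is not text written for the user. It is an internal scratchpad, usually readable but messy. On Claude 4 models, the API returns a summary of the full thinking rather than the raw trace, while output_tokens still counts the full thinking tokens you are billed for.

With DeepSeek R1 (open source):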

# DeepSeek R1 qua OpenAI-compatible API
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # R1 model
    messages=[{
        "role": "user",
        "content": "When do the two trains meet? [same problem]"
    }]
)

# R1 returns its reasoning in a separate reasoning_content field
reasoning = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content
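The separate reasoning_content field is a feature of DeepSeek's hosted API. If you serve the open R1 weights yourself, the reasoning typically comes back inline in the text, wrapped in <think>...</think> tags. Depending on the chat template, the opening tag may already be part of the prompt, so only </think> appears in the output. Below is a hedged fallback splitter covering both cases. `split_reasoning` is a hypothetical helper, so check what your serving runtime actually returns.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Split R1-style raw output into (reasoning, answer)."""
    match = THINK_RE.search(raw)
    if match is not None:
        reasoning = match.group(1).strip()
        answer = (raw[:match.start()] + raw[match.end():]).strip()
        return reasoning, answer
    if "</think>" in raw:
        # Opening tag was in the prompt template; only the closing tag is present.
        reasoning, _, answer = raw.partition("</think>")
        return reasoning.strip(), answer.strip()
    return "", raw.strip()  # no reasoning markers at all


reasoning, answer = split_reasoning("<think>80 + 100 = 180 km/h</think>\nThey meet at 11:33 AM.")
print(answer)  # They meet at 11:33 AM.
```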

When to use what

Three approaches, three use cases:

CoT prompt ("think step by step"):

Simple tasks that don't need auditing
Quick prototypes, when you don't yet know what you need
Tasks with a single final answer, where the pipeline doesn't parse individual steps
Tight token budgets (the structured overhead costs extra tokens)

Structured reasoning (JSON schema):

The pipeline needs to parse per-step results
The agent needs an audit trail: “which step did the agent get wrong?”
The UI shows the reasoning to the user step by step
Multi-agent setups: one agent verifies another agent's reasoning step by step
Important: apply it only after enough context has been gathered, not before tool calls

Reasoning model (o1, Claude thinking, R1):

Tasks that genuinely need multi-step reasoning: math, complex code, logical analysis
Cases where a CoT prompt is not enough and you don't want to manage the reasoning structure yourself
Production agents that need high reliability on reasoning tasks
Note: reasoning models use more tokens and have higher latency, so they are unnecessary for simple tasks
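The three lists above can be condensed into a small routing function, for example to pick a strategy per task type in an agent framework. This is a hypothetical sketch of the guidance in this post with made-up flag names, not a library API.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    multi_step: bool         # several dependent steps or logic traps
    needs_parsing: bool      # a pipeline, UI, or verifier consumes individual steps
    context_ready: bool      # the tool calls that gather data are already done
    latency_sensitive: bool  # tight latency or cost budget


def choose_approaches(task: TaskProfile) -> list[str]:
    chosen = []
    if task.multi_step:
        # Reasoning models cost more tokens and latency; fall back to a CoT prompt.
        chosen.append("cot_prompt" if task.latency_sensitive else "reasoning_model")
    if task.needs_parsing:
        # Structured reasoning only once the context is there, never before tool calls.
        chosen.append("structured_reasoning" if task.context_ready else "collect_context_first")
    return chosen or ["plain_prompt"]


print(choose_approaches(TaskProfile(True, True, False, False)))
# ['reasoning_model', 'collect_context_first']
```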

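A question I hear often: “If I'm already using a reasoning model, do I still need structured reasoning?”

Yes, you still need it. A reasoning model improves the quality of the internal reasoning. Structured output defines how the result interfaces with the rest of the pipeline. These are different goals, and you can pursue both: run Claude with thinking enabled and still send the final result through the structured_answer tool so the pipeline can parse it. One caveat applies. With extended thinking enabled, the Claude API accepts only tool_choice "auto" or "none", not a forced tool ({"type": "tool", ...} or {"type": "any"}). You therefore ask for the tool in the prompt, or make a separate call without thinking, as in the two-step pattern above.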
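Here is a sketch of that combination. It assumes the REASONING_TOOL dict defined earlier, and a minimal stand-in is included so the snippet runs on its own. `build_thinking_request` and `extract_tool_input` are hypothetical helpers. The request keeps tool_choice on "auto" and asks for the tool in the prompt, so the caller must handle a response that contains no tool_use block.

```python
from types import SimpleNamespace

# Minimal stand-in for the REASONING_TOOL defined earlier in this post.
REASONING_TOOL = {"name": "structured_answer",
                  "description": "Answer a question with explicit reasoning steps",
                  "input_schema": {"type": "object", "properties": {}}}


def build_thinking_request(question: str, tool: dict, budget: int = 4000) -> dict:
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": budget + 4000,  # must exceed budget_tokens
        "thinking": {"type": "enabled", "budget_tokens": budget},
        "tools": [tool],
        # A forced tool_choice is not accepted together with extended thinking.
        "tool_choice": {"type": "auto"},
        "messages": [{
            "role": "user",
            "content": f"{question}\n\nGive your final answer by calling the {tool['name']} tool.",
        }],
    }


def extract_tool_input(content_blocks, tool_name: str):
    """Return the tool input, or None if the model answered in plain text."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            return block.input
    return None


# With the SDK: response = client.messages.create(**build_thinking_request(q, REASONING_TOOL)),
# then extract_tool_input(response.content, "structured_answer").
fake_content = [
    SimpleNamespace(type="thinking", thinking="80 + 100 = 180 km/h ..."),
    SimpleNamespace(type="tool_use", name="structured_answer",
                    input={"final_answer": "11:33 AM"}),
]
print(extract_tool_input(fake_content, "structured_answer"))  # {'final_answer': '11:33 AM'}
```

If extract_tool_input returns None, one option is the two-step pattern above: a second call without thinking that forces structured_answer.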

When to skip CoT entirely

Not every task needs CoT. CoT is not free: it costs tokens and adds latency.

Skip CoT for:

Simple classification: “is this email spam or not?”
Straightforward extraction: “pull the phone number out of this text”
Translation: no intermediate reasoning needed
Template filling: “fill the name into this template”
Any task where the model is already consistently right 95%+ of the time without CoT

CoT genuinely helps when:

The task has several interdependent steps
There is a logical trap (a trick question where the model tends to jump ahead)
You need to justify the answer to the user
The task needs multi-hop reasoning (A leads to B leads to C)

Rule of thumb: if you (a human) need more than 10 seconds to think it through, the LLM may need CoT. If you can answer immediately, it doesn't.
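Applied to prompt construction, the lists above become a simple gate: append the CoT instruction only when the task type calls for it. The task categories and the 95% threshold come from this post. `build_prompt` and the `measured_accuracy` parameter are hypothetical. `measured_accuracy` stands for accuracy you measured without CoT on your own eval set.

```python
from typing import Optional

# Task types from the "skip CoT" list above.
SKIP_COT = {"classification", "extraction", "translation", "template_fill"}
COT_SUFFIX = "\n\nThink step by step before answering."


def build_prompt(question: str, task_type: str,
                 measured_accuracy: Optional[float] = None) -> str:
    if task_type in SKIP_COT:
        return question  # CoT adds tokens and latency without a real benefit here
    if measured_accuracy is not None and measured_accuracy >= 0.95:
        return question  # already consistently right without CoT
    return question + COT_SUFFIX


print(build_prompt("Is this email spam?", "classification"))
print(build_prompt("When do the two trains meet?", "multi_step_math"))
```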

Quick selection table

Approach	Use when	Avoid when
CoT prompt	Prototypes, simple tasks, no parsing needed	The pipeline depends on parsing the output
Structured reasoning	Auditing, pipeline parsing, step-by-step UI	Before tool calls, when context is still missing
Extended thinking (Claude)	Complex tasks where reasoning quality matters	Simple tasks, latency- or cost-sensitive workloads
R1/o1	Math, code, complex logic	Ordinary chatbots, tasks that need no reasoning

Pitfall	Symptom	Fix
Forcing structured reasoning before tool calls	Hallucinated reasoning, meaningless steps	Collect data first, reason afterwards
Unnecessarily long CoT	More tokens, more latency, no accuracy gain	Use it only when the task really needs it
Parsing free-form CoT with regex	Brittle, breaks when the format changes	Use structured output if you need to parse
`budget_tokens` too large for a simple task	Tokens burned for nothing	Scale the budget to task complexity

Wrapping up: reasoning must serve action

CoT, structured reasoning, and reasoning models are not three ways of doing the same thing. They solve three different problems: adding intermediate steps, producing a parseable interface, and training models to reason better. Real agents usually need a combination.

Don't force the model to “think more” if all you need is output that parses and matches the schema. In an agent, reasoning must serve the next action, not read like an explanatory essay.

Post 11, Tool design: schema, validation, idempotency, moves from reasoning to action. Tools are where the agent touches the real world, so a vague schema usually causes more failures than a vague prompt.