
Quick Start

Installation & Environment

CastClaw is installed via npm and requires Bun, Python, and uv at runtime. Prepare the runtime first, then launch the CLI for a quick smoke test.

Prerequisites: Bun ≥ 1.3.11, Python ≥ 3.10, uv, and at least one LLM API key.

| Dependency | Version | Purpose |
|---|---|---|
| Bun | ≥ 1.3.11 | Runtime and package manager |
| Python | ≥ 3.10 | ML backend for forecasting models |
| uv | Latest | Python dependency management |
| GPU (optional) | CUDA 12.8 | Deep learning acceleration |
| Ascend NPU (optional) | Atlas 800 A2/A3 recommended (Ascend HDK 25.5.1) | Huawei Ascend acceleration for deep learning. Recommended if you want to try domestic compute infrastructure. |
# Install the CastClaw CLI
npm install -g castclaw

# Verify the base runtime
castclaw --version
python --version
uv --version

# Sync Python backend dependencies
cd python && uv sync

You can configure your LLM through environment variables or castclaw.json. Either approach works:

export ANTHROPIC_API_KEY=your_key_here
export OPENAI_API_KEY=your_key_here
export DEEPSEEK_API_KEY=your_key_here
Recommendation

For your first run, connect only one provider you already know well. Avoid debugging several API stacks at the same time.

First Forecast Task (5-Minute Walkthrough)

We recommend starting with load.csv (fields: TIMESTAMP / LOAD, hourly frequency) to walk through the full workflow quickly. The goal is to understand the system rhythm, not to squeeze out the absolute best score on day one.
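If you do not have a dataset on hand, a synthetic file in the same shape is enough for the walkthrough. The sketch below (the generated values are made up; only the TIMESTAMP / LOAD fields and hourly frequency come from the description above) writes two weeks of hourly data to load.csv:

```python
import csv
import math
from datetime import datetime, timedelta

# Write a synthetic hourly load series in the load.csv shape the
# walkthrough expects: TIMESTAMP / LOAD columns, hourly frequency.
start = datetime(2024, 1, 1)
with open("load.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["TIMESTAMP", "LOAD"])
    for h in range(24 * 14):  # two weeks of hourly points
        ts = start + timedelta(hours=h)
        # Simple daily cycle: base load plus a 24-hour sinusoid.
        load = 100 + 20 * math.sin(2 * math.pi * (h % 24) / 24)
        writer.writerow([ts.isoformat(sep=" "), round(load, 2)])
```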

1. Prepare a clean working directory

Put load.csv in a dedicated experiment folder so it does not share .forecast/ state with other tasks.

2. Launch the CLI

Run castclaw inside that directory. On the first run, confirm the model and budget settings.

3. Enter the task in Planner

Specify the target column, time granularity, forecast horizon, and metrics. The more concrete the description, the better the Skill draft.

4. Review the Skill draft

Check whether the model families, search space, and risk notes are reasonable before continuing.

5. Watch iterations and the final report

Forecaster runs the experiment loop, and Critic produces final-report.md. Watch for possible human-in-the-loop pauses.

mkdir my-run && cd my-run
# After placing load.csv in the current folder
castclaw
Sample Planner prompt

Please forecast the next 24 hours of power load using load.csv; the time column is TIMESTAMP, the target column is LOAD, the evaluation metrics are MAE and MAPE, and the experiment budget should stay within 20 runs.

CLI Basics

The CLI is the command-line workspace where the three agents collaborate. The key is to know which stage you are in and which agent owns the next move.

| Shortcut / Area | Purpose | What to watch |
|---|---|---|
| Ctrl+1 | Switch to Planner | Check whether task definition, analysis output, and Skill drafts are coherent. |
| Ctrl+2 | Switch to Forecaster | Watch for stagnation, improvement pace, and human-in-the-loop triggers. |
| Ctrl+3 | Switch to Critic | Verify that the report covers breakdowns, evidence, and concrete next steps. |
| Task status panel | Inspect current stage and budget | Confirm whether you are in Init, Analysis, Forecasting, or Report. |
Common misunderstanding

The CLI is a stage-driven task workspace, not a one-shot chat box. The important question is always: which stage are we in, and who acts next?

CLI Reference

castclaw Launch Options

| Command | Purpose |
|---|---|
| castclaw | Launch CastClaw in the current directory and take over the forecasting task in this project context. |
| castclaw --model anthropic/claude-sonnet-4-6 | Explicitly select the primary model at launch time, temporarily overriding the default configuration. |
| castclaw --version | Print the current CLI version; useful when debugging environment mismatches. |

Shortcut Cheat Sheet

| Shortcut | Action |
|---|---|
| Ctrl+1 | Switch to Planner |
| Ctrl+2 | Switch to Forecaster |
| Ctrl+3 | Switch to Critic |

Once these three shortcuts become muscle memory, the workflow feels natural: define and review in Planner, observe convergence in Forecaster, and judge evidence in Critic.

forecast_state / forecast_task Tools

forecast_state

  • forecast_state init: initialize the task and create the .forecast/ workspace.
  • Stage transitions are enforced. You should not skip them if you want the run to remain traceable.
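To see why enforced transitions keep a run traceable, a minimal stage machine can be sketched as below. This is illustrative only, not CastClaw's actual implementation; the stage names come from the task status panel.

```python
# Sketch of enforced stage transitions: each stage may only advance
# to its immediate successor, mirroring how forecast_state keeps a
# run traceable. (Illustrative only, not CastClaw's real code.)
STAGES = ["Init", "Analysis", "Forecasting", "Report"]

class ForecastState:
    def __init__(self):
        self.stage = "Init"

    def advance(self, target: str) -> None:
        current = STAGES.index(self.stage)
        if STAGES.index(target) != current + 1:
            raise ValueError(f"cannot skip from {self.stage} to {target}")
        self.stage = target

state = ForecastState()
state.advance("Analysis")   # fine: Init -> Analysis
# state.advance("Report")   # would raise: skips Forecasting
```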

forecast_task

  • Helper tool for task definition around task.json.
  • Useful during Init when you need to confirm target column, time column, forecast horizon, and metrics.
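For intuition, a task.json frozen at Init might carry roughly the following information. The exact key names are assumptions; only the kinds of fields (target column, time column, horizon, metrics) come from the description above.

```json
{
  "target_column": "LOAD",
  "time_column": "TIMESTAMP",
  "frequency": "hourly",
  "horizon": 24,
  "metrics": ["MAE", "MAPE"]
}
```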

Configuration

CAST.md Policy File

This is the project-level policy file automatically injected into every agent context. It should hold stable, long-lived rules that every stage should know.

| Field | Description |
|---|---|
| banned_models | List of banned models or model families that should be excluded immediately. |
| max_experiments | Maximum number of experiments, which caps the Forecaster exploration budget. |
| no_improve_threshold | Number of non-improving rounds before a human-in-the-loop pause is triggered. |
| eval_metric | Preferred evaluation metric such as MAE, MAPE, or RMSE. |
| domain_notes | Domain background injected into every agent context to keep judgments aligned with business reality. |
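A CAST.md using these fields might look roughly like the sketch below. The layout is illustrative; the exact syntax CastClaw expects may differ.

```markdown
# CAST.md — project policy (illustrative layout)

banned_models: [ARIMA]       # exclude these immediately
max_experiments: 20          # caps Forecaster's exploration budget
no_improve_threshold: 5      # rounds before a human-in-the-loop pause
eval_metric: MAE             # preferred evaluation metric

## domain_notes
Load peaks in summer and winter; public holidays behave like weekends.
```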

castclaw.json Parameters

This is the task-level or project-level default configuration entry point. It is a good place to share baseline model and budget settings across a team.

{
  "model": "anthropic/claude-sonnet-4-6",
  "light_model": "anthropic/claude-haiku-4-5",
  "max_experiments": 20,
  "no_improve_threshold": 5
}
Configuration tip

Put your highest-quality model in the primary slot and a lighter model on the auxiliary analysis path to balance quality and speed.

Model Providers (LLM Configuration)

CastClaw follows a Vercel AI SDK style integration pattern, so you can connect multiple providers depending on network conditions, budget, and deployment constraints.

Global providers

Anthropic Claude, OpenAI GPT, and Google Gemini fit environments with stable international API access.

China-friendly providers

DeepSeek, Qwen, and GLM fit domestic network environments or teams that prefer locally accessible endpoints.

Deployment options

Direct APIs, self-hosted inference services, or Ascend-backed compute APIs all fit the same orchestration model.

Core Concepts

Architecture Overview

CastClaw uses a four-layer collaborative architecture: CastRuntime handles execution and state, CastSkill handles strategy selection, the plugin ecosystem provides concrete capabilities, and TimeEmbed aligns representations and experience across tasks.

User Task
  ↓
CastRuntime (execution loop, context management)
  ↓
CastSkill (strategy retrieval and selection)
  ↓
CastSense → CastFeat → CastZoo
  ↓
Reflection / Report
CastClaw system architecture
| Layer | Responsibility | When You Touch It |
|---|---|---|
| CastRuntime | Task context management, phase progression, agent loop control, and file-state synchronization. | When starting a task, switching phases, or resuming a paused run. |
| CastSkill | Retrieves or generates strategies from analysis results and decides model families and search spaces. | When reviewing Skill drafts, curating experience, or reusing prior strategies. |
| Plugin Ecosystem | Uses CastSense, CastFeat, and CastZoo for diagnostics, representation building, and model orchestration. | When interpreting results, designing features, or choosing model paths. |
| TimeEmbed | The capability foundation for cross-task representations, similar-pattern retrieval, and experience alignment. | You usually do not operate it directly, but it affects Skill retrieval quality. |

Multi-Agent Collaboration (Planner / Forecaster / Critic)

The three agents form a strict assembly line. Treat them as different roles, not interchangeable chat windows.

| Agent | Core Responsibility | Key Behaviors |
|---|---|---|
| Planner | Task definition, data diagnostics, and phase orchestration. | Runs qualitative and quantitative analysis in parallel, writes the pre-forecast report, and drafts candidate Skills. |
| Forecaster | Experiment loops and strategy iteration. | Reads history, selects configs, calls CastFeat and CastZoo, records reflections, and triggers Human in the Loop when needed. |
| Critic | Result aggregation and final reporting. | Compares model-family performance, generates visualizations and structured conclusions, and produces final-report.md. |

Agentic Workflow

CastClaw breaks forecasting into five tightly constrained phases. Phase transitions are enforced by tools and file protocols rather than LLM self-discipline, which makes the workflow traceable and auditable.

1. Initialization

Freezes task.json and creates the .forecast/ working directory. Every later experiment stays bound to the same task definition.

2. Pre-forecast Analysis

Runs two tracks in parallel: qualitative domain analysis through WebSearch and quantitative diagnostics through CastSense. The outputs merge into the pre-forecast report that drives Skill generation.

3. Skill Audit

Planner pauses after drafting 2 to 4 candidate Skills, waiting for human review of model routes, risks, and search spaces before experiments begin.

4. Forecasting

Read the best prior result → choose a config → build representations with CastFeat → train and evaluate through CastZoo → record reflections → check budget → repeat. Human in the Loop is triggered when progress stalls.

5. Post-forecast Report

Critic consolidates experiment artifacts, performance breakdowns, and visual explanations into a structured final-report.md.
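The Forecasting loop in phase 4 can be sketched in a few lines. This is a hedged sketch, not CastClaw's real loop: evaluate() stands in for the CastFeat and CastZoo steps, and the budget and stagnation parameters mirror max_experiments and no_improve_threshold from the configuration.

```python
# Sketch of the experiment loop: pick a config, evaluate it, track the
# best score, and pause for human input after no_improve_threshold
# non-improving rounds or when the budget is spent. (Illustrative only.)
def run_loop(configs, evaluate, max_experiments=20, no_improve_threshold=5):
    best_score, best_config = float("inf"), None
    stale = 0
    for run, config in enumerate(configs):
        if run >= max_experiments:
            return "budget_exhausted", best_config
        score = evaluate(config)          # lower is better (e.g. MAE)
        if score < best_score:
            best_score, best_config, stale = score, config, 0
        else:
            stale += 1                    # count non-improving rounds
        if stale >= no_improve_threshold:
            return "human_in_the_loop", best_config
    return "done", best_config
```

A loop that stops improving returns "human_in_the_loop" instead of silently burning the remaining budget, which is exactly the correction window described in the Human in the Loop section.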

Design focus

CastClaw is not differentiated by "running more models." It is differentiated by analyzing first, reviewing second, iterating third, and allowing humans to correct the path at critical points.

Human in the Loop

What Human in the Loop Means

When Forecaster shows no improvement across multiple rounds, produces abnormal results, or follows a strategy that clearly conflicts with domain knowledge, the system pauses at a recoverable checkpoint and waits for human feedback. This is not failure. It is a forced correction window.

Checkpoint 1: Confirm the task setup

Confirm the target column, time column, forecast horizon, evaluation metric, and resource limits so a flawed task definition does not get amplified downstream.

Checkpoint 2: Review the strategy Skill

Confirm that the model families, parameter search spaces, and risk notes are reasonable before large-scale experiments are launched in the wrong direction.

Checkpoint 3: Intervene during stagnation

Inject domain priors, block ineffective model families, or add new feature hypotheses before resuming iteration.

Do not treat Human in the Loop as a rerun button

Effective intervention changes the strategic assumptions: constrain model families, narrow the search space, explain anomalous dates, or add external constraints.

Skill Audit: How to Intervene

When reviewing a Skill, the key question is whether the strategy truly fits the current task, not whether the YAML looks neat. Focus on applicability, search space, and risk notes.

What to review

  • Whether the applicability conditions match the data profile, such as strong seasonality, long sequences, or stable frequency.
  • Whether the parameter search space is too broad and likely to waste budget on low-value regions.
  • Whether the risk warnings cover known failure modes such as overfitting on small samples, distribution drift, or nighttime zeros.

Recommended interventions

  • Remove clearly unsuitable model families directly instead of spending budget to prove they are wrong.
  • Add known holidays, equipment changes, or policy events to the domain notes.
  • Narrow critical search dimensions such as learning rate, window length, or patch length.

When to Confirm Results and Intervene

The most valuable moments for human review are when directional signals change, not after every single experiment round.

  • No improvement across rounds
  • Results conflict with domain knowledge
  • Best model family changes
  • Anomalous dates matter materially
  • Budget is nearly exhausted
Practical rule

If your intervention cannot change the next strategy decision, do not step in yet. Human in the Loop creates value by changing the path, not by repeatedly confirming the current state.

Plugin Toolbox

CastSense: Data Diagnostics

CastSense answers the question, "What state is this series in right now?" It turns trend, seasonality, anomalies, and distribution changes into structured knowledge that Planner can use to generate strategies.

Trend and seasonality detection

Detects long-term trends, daily cycles, weekly cycles, and multiscale periodicities to help decide which model path should come first.

Anomaly and drift localization

Finds change points, outliers, non-stationarity, and distribution drift, providing evidence for risk prompts and Human in the Loop interventions.

Structured outputs

Turns diagnostics into structured knowledge that later Skill retrieval, feature design, and model orchestration can all consume.
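The kind of evidence behind a seasonality diagnosis can be illustrated with plain autocorrelation: on an hourly series with a daily cycle, the lag-24 autocorrelation stands far above off-cycle lags. The snippet below is illustrative only, not CastSense itself, and the series is synthetic.

```python
import math

def autocorr(series, lag):
    """Plain sample autocorrelation at a given lag (no external deps)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

# Synthetic hourly series with a 24-hour cycle: lag 24 should dominate.
series = [100 + 20 * math.sin(2 * math.pi * (h % 24) / 24) for h in range(24 * 30)]
print(autocorr(series, 24) > autocorr(series, 13))  # daily cycle dominates
```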

CastFeat: Feature Construction

CastFeat answers the question, "How should the data be transformed into model-ready representations?" It converts raw time series into forms that better match downstream models.

lag / rolling statistical features → frequency-domain and multiscale representations → patch / token embedding → model-ready representation
How to think about it

CastFeat is not "manual feature engineering one more time." It unifies domain features, statistical features, and foundation-model input formats into a single representation-building pipeline.
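The first stage of that pipeline, lag and rolling-window features, can be sketched as below. This is an illustration of the idea, not CastFeat's own API.

```python
# Build lag and rolling-mean features from a raw series, producing
# one model-ready row per usable time step. (Illustrative only.)
def lag_rolling_features(series, lags=(1, 24), window=24):
    rows = []
    start = max(max(lags), window)  # first index with full history
    for t in range(start, len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        recent = series[t - window:t]
        row["rolling_mean"] = sum(recent) / window
        row["target"] = series[t]
        rows.append(row)
    return rows

features = lag_rolling_features(list(range(100)))
print(features[0])
```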

CastZoo: Model Orchestration

CastZoo answers, "Which models should be used, and how should they be combined?" It is not only a model repository. It also handles strategy-aware scheduling.

Supported model families

Statistical models such as ARIMA, ETS, and Theta; machine learning models; deep learning models such as Informer and PatchTST; and foundation models such as Chronos, TimesFM, and Moirai.

Supported strategies

Single-model runs, multi-model ensembles, coarse-to-fine two-stage scheduling, or using foundation-model outputs as priors.
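The simplest of those strategies, a multi-model ensemble, amounts to a weighted average of per-model forecasts. The sketch below is illustrative, not CastZoo's implementation; the weights would typically come from validation scores.

```python
# Combine several models' forecasts into one weighted-average forecast.
# Equal weights by default; heavier weight means more trusted model.
def weighted_ensemble(forecasts, weights=None):
    if weights is None:
        weights = [1.0] * len(forecasts)
    total = sum(weights)
    horizon = len(forecasts[0])
    return [
        sum(w * f[t] for w, f in zip(weights, forecasts)) / total
        for t in range(horizon)
    ]

# Two toy 3-step forecasts; the second model gets double weight.
print(weighted_ensemble([[100, 110, 120], [106, 116, 126]], weights=[1, 2]))
```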

Skill System

What a Skill Is

A Skill is a reviewed strategy template. It describes which model families, search spaces, feature templates, and known risks fit a given class of data under specific conditions. It is the key asset layer that lets the system keep evolving.

Carry forward proven experience

Preserve routes that have already been validated in experiments instead of starting from scratch every time.

Guide future tasks

Narrow the model space for new tasks so Forecaster starts from a more reasonable baseline.

Keep humans in control

Review before use so the system evolves on top of trusted strategies instead of automatically accumulating noise.

Skill File Structure

Skills are expressed in YAML. The core fields are applicability conditions, model family, search space, feature template, and risks.

name: deep_learning_periodic
applicable_conditions:
  - strongly seasonal data
  - sequence length > 5000
model_family: deep_learning
models: [PatchTST, iTransformer]
search_space:
  learning_rate: [1e-4, 5e-4]
  patch_len: [16, 32, 64]
feature_template: patch_token
risks:
  - high overfitting risk when data volume is insufficient
domain_notes: ""
Review focus

Start with applicable_conditions and risks. These two parts most directly determine whether the Skill should be used for the current task.
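A quick pre-review sanity check on a parsed Skill can be sketched as below. The field names follow the YAML example above; the lint itself is not a CastClaw feature, just an illustration of what "focus on applicable_conditions and risks" means mechanically.

```python
# Check that the fields a Skill review depends on are present and
# non-empty in the parsed Skill dict. (Illustrative, not CastClaw code;
# parsing the YAML itself is left to whatever YAML library you use.)
REQUIRED = ["name", "applicable_conditions", "model_family", "search_space", "risks"]

def lint_skill(skill: dict) -> list:
    return [f"missing or empty field: {f}" for f in REQUIRED if not skill.get(f)]

skill = {
    "name": "deep_learning_periodic",
    "applicable_conditions": ["strongly seasonal data"],
    "model_family": "deep_learning",
    "search_space": {"learning_rate": [1e-4, 5e-4]},
    "risks": ["overfitting on small samples"],
}
print(lint_skill(skill))  # an empty list means nothing blocks review
```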

How to Review and Curate Skills

  1. Planner drafts 2 to 4 candidate Skills from the pre-forecast analysis.
  2. Humans review the model route, applicability conditions, and risk notes, editing them directly when needed.
  3. Approved Skills enter .forecast/skills/ for reuse in the current and similar tasks.
  4. The library evolves over time and gradually becomes a team-level strategy asset.
Review principle

Keep a small number of high-quality Skills rather than accumulating a large set of low-signal strategies. The value of the Skill library is trustworthiness, not size.

/cast-creation Command

Interactively generates the CAST.md project-constraint file. Use it before a task starts to define disallowed models, budget caps, evaluation preferences, and domain notes.

When to use it

Use it when you already know which models should be excluded, how many experiments the budget allows, or which domain notes must be injected for every agent.

What problem it solves

It avoids repeating the same constraints verbally in every run and reduces the chance that agents forget critical limits in later phases.

Examples

These three cases cover load, solar, and financial time series. The point is to build intuition for how different data shapes map to different strategies. Focus on the data profile, the recommended Skill path, and where Human in the Loop matters.

Power Load Forecasting (load.csv)

load.csv is the best starter dataset for a first demo. It contains hourly load values with stable daily and weekly cycles.

Data profile

Hourly frequency, roughly 15,000 samples, strong daily seasonality at 24 hours, strong weekly seasonality at 168 hours, and clear summer and winter peaks.

Recommended strategy

Start with a combined deep_learning path using PatchTST and iTransformer plus a foundation path using Chronos.

Expected outputs

If you see pre-forecast.md, the experiment directories, and final-report.md, the main workflow is running end to end.

Solar Power Forecasting

Solar generation combines strong daily seasonality, fixed nighttime zeros, and strong weather sensitivity. It is a canonical case where domain knowledge needs to intervene.

Data profile and diagnostic focus

This is hourly data from the GEFCom2014 Solar Track. CastSense should pay special attention to nighttime zeros, abrupt weather changes, and seasonal shifts.

Recommended strategy and Human in the Loop

Start with a statistical plus foundation path using Theta together with TimesFM or Moirai. Human intervention matters most when labeling long cloudy periods and weather-abnormal days.

Financial Time-Series Forecasting

Financial series are volatile, non-stationary, and sensitive to external shocks. They are not a good fit for blindly committing to a single deep-learning route and require stronger risk awareness and external-event injection.

Recommended strategy

Use a conservative statistical + foundation ensemble so the full budget is not concentrated on one path.

Where humans should step in

Mark earnings releases, policy announcements, and macro shocks, then pay close attention to CastSense alerts on distribution drift and structural breaks.

FAQ & Troubleshooting

Common Questions

How do I switch LLM providers?

Update the model configuration in castclaw.json or switch the relevant environment variables.

How do I continue after a Human-in-the-Loop pause?

Enter feedback in the Forecaster tab and submit it. The system resumes in the current context without reinitializing from scratch.

Where are Skill files managed?

They live in .forecast/skills/ by default. Stable reviewed Skills should be promoted into a shared team asset library.

Why are results unstable?

First check whether the task definition or budget is too small, then check for unlabeled anomalous dates, and only after that consider model optimization.

Environment Troubleshooting

| Symptom | What to Check |
|---|---|
| Bun version is too old | Upgrade to 1.3.11 or later, reopen the terminal, and verify with bun --version. |
| Python backend errors | Enter the python directory and run uv sync to make sure dependencies are installed correctly. |
| API key is not taking effect | Check whether the environment variable is exported in the current shell, or confirm whether castclaw.json overrides the model settings. |
| A phase will not advance | Check whether .forecast/ is missing required files, especially task.json and the report artifacts for each phase. |