Building an AI Data Pipeline with Handrive + Claude Code
Handrive exposes 43 MCP tools that let AI agents manage file transfers programmatically. This tutorial walks through building an automated data pipeline using Claude Code.
What We Are Building
The goal: an automated pipeline that collects training data from edge devices, transfers it to a central GPU cluster, monitors progress, and notifies the team on completion. No manual steps after initial setup.
The pipeline has four stages:
- Headless server setup on the receiving machine (GPU cluster or NAS)
- Programmatic share creation with access controls
- Transfer monitoring with progress tracking
- Automated notifications on completion or failure
Prerequisites
- Handrive installed on both sending and receiving machines
- Claude Code with the Handrive MCP server configured
- A headless server instance for always-on availability (optional but recommended for production)
MCP Primer
The Model Context Protocol (MCP) is a standard for AI agents to interact with external tools. Handrive's MCP server exposes 43 tools that Claude Code can call directly. Each tool is a structured JSON call with typed parameters and return values. No custom scripting required.
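Every call in this tutorial has the same shape: a tool name plus a typed parameters object. A minimal sketch of how a client might build that payload (the `call_tool` helper and the transport are illustrative, not part of Handrive's API; the actual wire format follows the MCP specification):

```python
import json

def call_tool(tool: str, parameters: dict) -> dict:
    """Build the payload for an MCP tool call.

    The transport (the pipe between Claude Code and the Handrive MCP
    server) is omitted; this only shows the request shape.
    """
    return {"tool": tool, "parameters": parameters}

# Every call is the same structure: a tool name plus typed parameters.
payload = call_tool("auth_status", {})
print(json.dumps(payload))  # {"tool": "auth_status", "parameters": {}}
```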
Step 1: Configure the Headless Server
For a production pipeline, you want the receiving end always available. Handrive's headless mode runs as a background service without a GUI. First, verify the headless server is running by checking auth status through MCP:
MCP Tool Call: Check authentication status
{
  "tool": "auth_status",
  "parameters": {}
}
// Response:
{
  "logged_in": true,
  "user_id": "usr_abc123",
  "user_email": "gpu-cluster-01@yourteam.com",
  "device_id": "dev_xyz789"
}

If the server is not authenticated, use the OTP flow — request a code sent to your email, then verify it:
MCP Tool Calls: Authenticate via OTP
{
  "tool": "request_otp",
  "parameters": {
    "email": "gpu-cluster-01@yourteam.com"
  }
}
// Then verify with the code sent to that email:
{
  "tool": "verify_otp",
  "parameters": {
    "email": "gpu-cluster-01@yourteam.com",
    "code": "123456"
  }
}

Step 2: Create Shares Programmatically
Instead of manually creating shares through the UI, use MCP to create them with consistent naming and access controls. This is where automation pays off: every training run gets a share with the same structure.
MCP Tool Call: Create a share for training data
{
  "tool": "create_share",
  "parameters": {
    "name": "training-run-2026-02-20",
    "local_path": "/data/training/run-20260220",
    "create_dir": true
  }
}
// Then add team members with specific roles:
{
  "tool": "add_member",
  "parameters": {
    "share_id": "sh_abc123",
    "email": "ml-engineer@yourteam.com",
    "role": "editor"
  }
}

You can also list existing shares to check what is already available before creating new ones:
MCP Tool Call: List existing shares
{
  "tool": "list_shares",
  "parameters": {
    "filter": "local"
  }
}
// Response:
[
  {
    "share_id": "sh_abc123",
    "name": "training-run-2026-02-20",
    "local_path": "/data/training/run-20260220"
  },
  {
    "share_id": "sh_xyz789",
    "name": "training-run-2026-02-19",
    "local_path": "/data/training/run-20260219"
  }
]

Step 3: Monitor Transfers in Real Time
Once data starts flowing (either from edge devices pushing to the share or from the headless server pulling), you can monitor transfer progress through MCP. This is useful for building dashboards or triggering downstream actions.
MCP Tool Calls: List active transfers, then check a specific one
{
  "tool": "list_transfers",
  "parameters": {}
}
// Then get details on a specific transfer:
{
  "tool": "get_transfer",
  "parameters": {
    "transfer_id": "tr_abc123"
  }
}
// You can also check folder stats to verify what's landed:
{
  "tool": "get_folder_stats",
  "parameters": {
    "share_id": "sh_abc123"
  }
}

By using list_transfers and get_transfer, Claude Code can poll for completion and trigger downstream actions automatically. For why the UDP-based protocol outperforms TCP on high-latency links, see our analysis of TCP limitations.
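The polling loop described above can be sketched as follows. The stub class and the `status`/`percent` fields are illustrative assumptions about the transfer records, not Handrive's documented response schema:

```python
import time

def wait_for_transfers(mcp, share_id, poll_interval=0.0, max_polls=100):
    """Poll list_transfers until every transfer on the share reaches a
    terminal status, then return the final records. `mcp` stands in for
    the MCP client."""
    for _ in range(max_polls):
        transfers = [t for t in mcp.list_transfers()
                     if t["share_id"] == share_id]
        if transfers and all(t["status"] in ("completed", "failed")
                             for t in transfers):
            return transfers
        time.sleep(poll_interval)
    raise TimeoutError(f"transfers on {share_id} did not finish")

class TransferStub:
    """Illustrative stand-in: each list_transfers call advances progress."""
    def __init__(self):
        self.progress = 0
    def list_transfers(self):
        self.progress = min(self.progress + 50, 100)
        status = "completed" if self.progress == 100 else "active"
        return [{"transfer_id": "tr_abc123", "share_id": "sh_abc123",
                 "status": status, "percent": self.progress}]

done = wait_for_transfers(TransferStub(), "sh_abc123")
print(done[0]["status"])  # completed
```

In production, Claude Code does this polling itself; the sketch just shows the terminal-status check that gates downstream actions.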
Step 4: Automate Post-Transfer Actions
The real power of MCP integration is chaining actions. Once a transfer completes, Claude Code can automatically trigger the next step in your pipeline: validate the data, kick off a training job, or notify the team.
Here are the validation calls Claude Code chains together after a transfer completes: list the received files, search for specific batches, and check overall stats:
MCP Tool Call: List received files for validation
{
  "tool": "list_files",
  "parameters": {
    "share_id": "sh_abc123",
    "limit": 100
  }
}
// Or search for specific files:
{
  "tool": "search_files",
  "parameters": {
    "share_id": "sh_abc123",
    "query": "batch",
    "limit": 50
  }
}
// Check overall stats:
{
  "tool": "get_folder_stats",
  "parameters": {
    "share_id": "sh_abc123"
  }
}

Claude Code can use get_folder_stats to verify file counts and total sizes, search_files to find specific files, and generate a report. If anything is missing, it can identify which files need retransmission without re-sending the entire dataset.
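The missing-file check reduces to a set difference between the expected manifest and the file records returned by list_files. A minimal sketch, assuming a hypothetical manifest of batch files (the names and record shape are illustrative):

```python
def missing_files(expected, received):
    """Return expected files that have not landed, so only those need
    retransmission rather than the whole dataset."""
    received_names = {f["name"] for f in received}
    return sorted(name for name in expected if name not in received_names)

# Hypothetical manifest and list_files-style records:
expected = [f"batch-{i:04d}.tfrecord" for i in range(5)]
received = [{"name": "batch-0000.tfrecord", "size": 1024},
            {"name": "batch-0001.tfrecord", "size": 1024},
            {"name": "batch-0003.tfrecord", "size": 1024}]

gaps = missing_files(expected, received)
print(gaps)  # ['batch-0002.tfrecord', 'batch-0004.tfrecord']
```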
Putting It All Together
In practice, you describe your pipeline intent to Claude Code in natural language, and it translates that into the right sequence of MCP calls. For example:
Natural language prompt to Claude Code
"Create a new share for today's training run. Wait for all three edge nodes to finish uploading. Once complete, verify that all 4,200 files are present and checksums match. Then post a summary to the team Slack channel with transfer speed and total size."
Claude Code breaks this into the MCP calls shown above, executes them sequentially, handles errors (retries on transient failures, alerts on permanent ones), and reports results. No bash scripts, no cron jobs, no custom code to maintain.
Production Considerations
- Error handling: MCP tool calls return structured error responses. Claude Code can distinguish between retryable errors (network timeout, peer temporarily offline) and permanent failures (authentication expired, disk full) and act accordingly.
- Concurrency: Multiple shares can be active simultaneously. A single headless server can handle transfers from dozens of peers, limited by network bandwidth, not software overhead.
- Security: Every transfer uses end-to-end encryption. The MCP server authenticates locally. No credentials are sent to external services. This matters for AI data center operations where training data and model checkpoints are sensitive IP.
- Scaling: For multi-site deployments, run a headless server at each location. Claude Code can manage transfers across all of them through a single conversation, coordinating a mesh of peer-to-peer connections.
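The retryable-versus-permanent distinction in the error-handling bullet above can be sketched as a small classification policy. The error codes here are illustrative, not Handrive's actual error schema:

```python
# Illustrative error codes; Handrive's actual error schema may differ.
RETRYABLE = {"network_timeout", "peer_offline"}
PERMANENT = {"auth_expired", "disk_full"}

def should_retry(error: dict, attempt: int, max_attempts: int = 3) -> bool:
    """Retry transient failures with a bounded attempt count;
    surface permanent failures immediately."""
    code = error.get("code", "")
    if code in PERMANENT:
        return False  # alert a human instead of retrying
    return code in RETRYABLE and attempt < max_attempts

print(should_retry({"code": "network_timeout"}, attempt=1))  # True
print(should_retry({"code": "disk_full"}, attempt=1))        # False
print(should_retry({"code": "network_timeout"}, attempt=3))  # False
```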
Beyond File Transfer
The MCP tools cover more than just moving bytes. You can use them to build inventory systems (list all shares, track what data is where), access control workflows (create time-limited shares for external collaborators), and audit trails (query transfer history, verify delivery). Combined with Claude Code's ability to call other tools (databases, APIs, notification services), Handrive becomes one component in a fully automated data infrastructure.
Related Posts
- Why File Transfer Breaks in the AI Era
- Securing the Earth-to-Orbit AI Pipeline
- File Transfer for AI Training Data
Automate Your Data Pipeline
Handrive's 43 MCP tools turn Claude Code into your data operations engineer. Free, private, and programmable.
Download Handrive