VERL (Volcano Engine Reinforcement Learning) is an open-source RL post-training framework for LLMs, originally developed by ByteDance Seed and maintained by the VERL community. VERL ships with a built-in Weave trace backend: when you enable it, every rollout trajectory, including LLM generations and tool calls, is logged to Weave alongside the training metrics W&B already records. Use Weave with VERL to:
- Inspect each rollout trajectory step-by-step, including prompts, model responses, and tool invocations.
- Filter trajectories by step, sample index, rollout number, and experiment name.
- Compare multiple trajectories side-by-side to debug agent behavior across training steps.
Prerequisites
- A W&B account and API key. For more information, see API keys.
- A VERL installation that supports rollout tracing (see VERL installation). Rollout tracing was added in verl#2345.
- An async rollout configuration. Tracing only applies to asynchronous rollouts; synchronous rollouts are not traced.
RL training produces a lot of trace data: the VERL maintainers note that runs can generate tens of gigabytes per day. The W&B Free Plan includes 1 GB of monthly network traffic, so consider your plan tier and max_samples_per_step_per_worker (described below) before launching a long run.
Enable Weave tracing
Set your W&B API key in the environment so VERL can authenticate. You do not need to call weave.init() yourself. Then enable tracing with the following configuration options:
- actor_rollout_ref.rollout.trace.backend=weave: selects Weave as the trace backend.
- actor_rollout_ref.rollout.mode=async: enables async rollout for vLLM or SGLang. Tracing has no effect on synchronous rollouts.
- trainer.project_name and trainer.experiment_name: set the project and experiment names. Weave logs to the same project as wandb.
- trainer.logger=['console','wandb']: enables the wandb logger alongside Weave so metrics and traces appear in the same project.
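The options above are typically passed as command-line overrides. The following is a minimal Python sketch that assembles them into a launch command and warns if the API key is missing. The project and experiment names are placeholders, and the verl.trainer.main_ppo entry point is an assumption; substitute the trainer module your recipe actually uses.

```python
import os
import shlex


def build_overrides(project: str, experiment: str) -> list[str]:
    """Return the overrides that turn on Weave rollout tracing."""
    return [
        "actor_rollout_ref.rollout.trace.backend=weave",
        "actor_rollout_ref.rollout.mode=async",
        f"trainer.project_name={project}",
        f"trainer.experiment_name={experiment}",
        "trainer.logger=['console','wandb']",
    ]


# Placeholder names, for illustration only.
overrides = build_overrides("my-verl-project", "grpo-tool-agent")

# Assumed entry point; replace it with the one your training recipe uses.
command = ["python3", "-m", "verl.trainer.main_ppo", *overrides]

# W&B reads the API key from the WANDB_API_KEY environment variable.
if not os.environ.get("WANDB_API_KEY"):
    print("WANDB_API_KEY is not set; export it before launching.")

# shlex.join quotes the logger override so the shell passes it through intact.
print(shlex.join(command))
```

Printing the command instead of running it keeps the sketch free of side effects. Paste the output into a shell, or pass the command list to subprocess.run, once the remaining training options are in place.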
Tune trace volume
By default, VERL traces every sample in every rollout, which can produce very large amounts of trace data. Limit the volume with max_samples_per_step_per_worker, and use token2text to control whether decoded text is included:
- max_samples_per_step_per_worker: Each agent loop worker independently selects up to N unique samples to trace per training step. For GRPO with n > 1, all rollouts for selected samples are traced. The total number of traces per step is max_samples_per_step_per_worker * num_workers * n. Set to null (default) to trace all samples.
- token2text: Set to True to add decoded prompt_text and response_text to the ToolAgentLoop.run output. Defaults to False for performance. Enable it when you want to read prompts and completions directly in the Weave UI.
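To get a feel for the formula above, here is a small Python sketch of the arithmetic. The worker count, sample limit, and n are illustrative values, not measurements from a real run.

```python
def traces_per_step(max_samples_per_step_per_worker: int, num_workers: int, n: int) -> int:
    """Traces logged per training step when the sampling limit is set."""
    return max_samples_per_step_per_worker * num_workers * n


# Illustrative values: 8 agent loop workers, GRPO with n = 4 rollouts per sample,
# and a limit of 2 traced samples per worker per step.
limited = traces_per_step(max_samples_per_step_per_worker=2, num_workers=8, n=4)
print(limited)  # 2 * 8 * 4 = 64
```

Assuming the option lives under the same actor_rollout_ref.rollout.trace namespace as backend, the limit used here would be set with an override such as actor_rollout_ref.rollout.trace.max_samples_per_step_per_worker=2. Check the exact key against your VERL version.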
View traces
After training starts, open your W&B project page and select Weave in the sidebar, then Traces. Each trace corresponds to a rollout trajectory. Filter by:
- step: the global training step.
- sample_index: the dataset sample identifier (from extra_info.index).
- rollout_n: the rollout sequence number for GRPO-style sampling.
- experiment_name: the value you set in trainer.experiment_name.
Trace additional functions
VERL exposes two helpers for extending the default trace coverage:
- rollout_trace_op: a decorator that marks a method on a class instance for tracing. By default, only a small number of methods are decorated; add it to methods on your custom agent loop or tool implementations to capture more detail.
- rollout_trace_attr: a context manager that marks the entry of a trajectory and attaches trajectory metadata (sample index, step, rollout number, experiment name). If you introduce a new agent type, wrap its trajectory entrypoint with rollout_trace_attr so the trace is associated with the run.
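The sketch below shows how the two helpers might be combined in a custom tool and agent entrypoint. It rests on several assumptions: that the helpers are importable from verl.utils.rollout_trace, that rollout_trace_op is applied to async methods, and that rollout_trace_attr accepts step, sample_index, and rollout_n as keyword arguments. EchoTool and run_trajectory are hypothetical names used only for illustration. When VERL is not installed, no-op stand-ins are used so the example still runs.

```python
import asyncio
from contextlib import contextmanager

try:
    # Assumed import path; verify it against your VERL version.
    from verl.utils.rollout_trace import rollout_trace_attr, rollout_trace_op
except ImportError:
    # No-op stand-ins so the sketch runs without VERL installed.
    def rollout_trace_op(func):
        return func

    @contextmanager
    def rollout_trace_attr(**attrs):
        yield


class EchoTool:
    """Hypothetical custom tool, used only for illustration."""

    @rollout_trace_op
    async def execute(self, query: str) -> str:
        # With tracing enabled, this call is recorded as part of the trajectory.
        return f"echo: {query}"


async def run_trajectory(step: int, sample_index: int, rollout_n: int) -> str:
    # Mark the trajectory entrypoint and attach its metadata
    # (keyword names are assumptions).
    with rollout_trace_attr(step=step, sample_index=sample_index, rollout_n=rollout_n):
        return await EchoTool().execute("hello")


result = asyncio.run(run_trajectory(step=1, sample_index=0, rollout_n=0))
print(result)
```

Decorating the tool method rather than the whole agent loop keeps the extra trace detail scoped to the calls you want to inspect.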