file: ./content/docs/changelog.mdx
meta: { "title": "Changelog" }

# Changelog

## Week of 2025-07-07

* Loop can now create custom code scorers in playgrounds
* Schema builder UI for structured outputs
* Sort datasets when the `Faster tables` feature flag is enabled
* Change LLM duration to be the sum, not average, of LLM duration across spans
* Add support for Grok 4 and Mistral's Devstral Small Latest

## Data plane (1.1.13)

* Fix support for `COALESCE` with variadic arguments
* Add option to select logs for online scoring with a BTQL filter
* Add ability to test online scoring configuration on existing logs
* Mmap-based indexing optimization enabled by default for Brainstore

## Python SDK version 0.1.7 \[upcoming]

* Added support for loading prompts by ID via the `load_prompt` function. You can now load prompts directly by their unique identifier:

```python
prompt = braintrust.load_prompt(id="prompt_id_123")
```

## JS SDK version 0.0.210 \[upcoming]

* Added support for loading prompts by ID via the `loadPrompt` function. You can now load prompts directly by their unique identifier:

```typescript #skip-compile
const prompt = await loadPrompt({ id: "prompt_id_123" });
```

## Data plane (1.1.12) \[skipped]

## Week of 2025-06-30

* Time range filters on the logs page

## Data plane (1.1.11)

* Add support for LLaMa 4 Scout for Cerebras
* Turn on index validation (which enables self-healing failed compactions) in the CloudFormation by default.

## Week of 2025-06-23

* Add support for multi-factor authentication
* Fix a bug with Vertex AI calls when the request includes the anthropic-beta header
* Add Zapier integration to trigger Zaps when there's a new automation event or a new project.
## Data plane (1.1.7)

* Improve performance of error count queries in Brainstore
* Automatically heal segments that fail to compact
* Add support for new models including o3 pro
* Improve error messages for LLM-originated errors in the proxy

## Autoevals.js v0.0.130

* Remove dependency on `@braintrust/core`

## JS SDK version 0.0.209

* Ensure SpanComponentsV3 encoding works in the browser.

## JS SDK version 0.0.208

* Ensure running remote evals (i.e. `runDevServer`) works without the CLI wrapper.
* Add span + parent ids to `StartSpanArgs`

## Week of 2025-06-16

* Add OpenAI's [o3-pro](https://platform.openai.com/docs/models/o3-pro) model to the playground and AI proxy.
* View parameters are now present in the URL when viewing a default view
* Experiments charting controls have been added into views
* Experiment objects now support tags through the API and on the experiments view
* Add support for Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 2.5 Flash Lite

### Python SDK version 0.1.5

* The SDK's under-the-hood log queue will not block when full and has a default size of 25000 logs. You can configure the max size by setting `BRAINTRUST_LOG_QUEUE_MAX_SIZE` in your environment. The environment variable `BRAINTRUST_QUEUE_DROP_WHEN_FULL` is no longer used.
* Improvements to the logging of parallel tool calls.
* Attachments are now converted to base64 data URLs, making it easier to work with image attachments in prompts.

### JS SDK version 0.0.207

* The SDK's under-the-hood queue for sending logs now has a default size of 5000 logs. You can configure the max size by setting `BRAINTRUST_LOG_QUEUE_MAX_SIZE` in your environment.
* Improvements to the logging of parallel tool calls.
* Attachments are now converted to base64 data URLs, making it easier to work with image attachments in prompts.

## Data plane (1.1.6)

* Patch a bug in 1.1.5 related to the `realtime_state` field in the API response.

## Data plane (1.1.5)

* Default query timeout in Brainstore is now 32 seconds.
* Auto-recompact segments which have been rendered unusable due to an S3-related issue.
* Gemini 2.5 models

## Data plane (1.1.4)

* Optimize "Activity" (audit log) queries, which reduces the query workload on Postgres for large traces (even if you are using Brainstore).
* Automatically convert base64 payloads to attachments in the data plane. This reduces the amount of data that needs to be stored in the data plane and improves page load times. You can disable this by setting `DISABLE_ATTACHMENT_OPTIMIZATION=true` or `DisableAttachmentOptimization=true` in your stack.
* Improve AI proxy errors for status codes 401->409
* Increase real-time query memory limit to 10GB in Brainstore

## Week of 2025-06-09

* Correctly propagate `expected` and `metadata` values to function calls when running `invoke`. This means that if you provide `expected` or `metadata`, `input` refers to the top-level input argument. If you are passing in a value like `{input: "a"}`, then you must now use `{{input.input}}` to refer to the string "a", if you pass in `expected` or `metadata`. This should have no effect on the playground or scorers.
* Chat-like thread layout that simplifies thread display to LLM and score data
* Enable all agent nodes to access dataset variables with the mustache variable `{{dataset}}`. For example, to access `metadata.foo` in the third prompt in an agent, you can use `{{dataset.metadata.foo}}`.
* Improve reliability of online scoring when logging high volumes of data to a project.
* Tags can now be sorted in the project configuration page which will change their display order in other parts of the UI.
* System-only messages are now supported in Anthropic and Bedrock models.
* Logs page UI can now filter nested data fields in `metadata`, `input`, `output`, and `expected`.

### Python SDK version 0.1.4

* Add `project.publish()` to directly `push` prompts to Braintrust (without running `braintrust push`).
* `@traced` now works correctly with async generator functions.
* The OpenAI and Anthropic wrappers set `provider` metadata.

### JS SDK version 0.0.206

* Add support for `project.publish()` to directly `push` prompts to Braintrust (without running `braintrust push`).
* The OpenAI and Anthropic wrappers set `provider` metadata.

## Week of 2025-06-02

* Support reasoning params and reasoning tokens in streaming and non-streaming responses in the [AI proxy](/docs/guides/proxy) and across the product (requires a stack update to 0.0.74).
* New [braintrust-proxy](https://pypi.org/project/braintrust-proxy/) Python library to help developers integrate with their IDEs to support new reasoning input and output types.
* New `@braintrust/proxy/types` module to augment OpenAI libraries with reasoning input and output types.
* New streaming protocol between Brainstore and the API server speeds up queries.
* Time brushing interaction enabled on Monitor page charts.
* Can create user-defined views in the monitoring page.
* Live updating time mode added to the monitoring page.
* The `anthropic` package is now included by default in Python functions.
* Audit log queries must now specify an `id` filter for the set of rows to fetch. These queries will only return the audit log for the specified rows, rather than the whole trace.
* (Beta) continuously export logs, experiments, and datasets to S3.
* Enable passing `metadata` and `expected` as arguments to the first agent prompt node.

### Python SDK version 0.1.3

* Improve retry logic in the control plane connection (used to create new experiments and datasets).

## Week of 2025-05-26

* The "Faster tables" flag is now the default (you may need to update your data plane if you are self-hosted). You should notice experiments, datasets, and the logs page load much faster.
* Add Claude 4 models in Bedrock and Vertex to the AI proxy and playground.
* Braintrust now incorporates cached tokens into the cost calculations for experiments and logs.
The monitor page also now includes separate lines so you can track costs and counts for uncached, cached, and cache creation tokens.
* Native support for thinking parameters in the playground.
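
The cached-token cost accounting mentioned in this week's notes can be illustrated with a small sketch. This is not Braintrust's actual pricing logic: the `TokenUsage` class, the `estimate_cost` function, and every price in `HYPOTHETICAL_PRICES_PER_MILLION` are assumptions made up for illustration. It only shows the general idea of pricing uncached, cached, and cache creation tokens separately and summing the results.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts for one LLM call (hypothetical field names)."""
    uncached_prompt_tokens: int
    cached_prompt_tokens: int
    cache_creation_tokens: int
    completion_tokens: int


# Hypothetical prices in USD per one million tokens. These numbers are
# placeholders for illustration, not real provider or Braintrust prices.
HYPOTHETICAL_PRICES_PER_MILLION = {
    "uncached_prompt": 3.00,
    "cached_prompt": 0.30,
    "cache_creation": 3.75,
    "completion": 15.00,
}


def estimate_cost(usage, prices=HYPOTHETICAL_PRICES_PER_MILLION):
    """Return a per-category cost breakdown plus the total, in USD."""
    counts = {
        "uncached_prompt": usage.uncached_prompt_tokens,
        "cached_prompt": usage.cached_prompt_tokens,
        "cache_creation": usage.cache_creation_tokens,
        "completion": usage.completion_tokens,
    }
    # Price each category separately so it can be charted as its own line.
    breakdown = {
        name: count * prices[name] / 1_000_000
        for name, count in counts.items()
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    usage = TokenUsage(
        uncached_prompt_tokens=1000,
        cached_prompt_tokens=4000,
        cache_creation_tokens=500,
        completion_tokens=200,
    )
    for name, cost in estimate_cost(usage).items():
        print(f"{name}: ${cost:.6f}")
```

Keeping the categories separate, rather than folding cached tokens into a single prompt-token count, is what makes it possible to track each one on its own line.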