Conversation

Contributor

@galzilber galzilber commented Dec 30, 2025

Important

Refactor Traceloop SDK to support evaluators with optional configurations, add code generation tools, and relax evaluator version requirements.

  • New Features:
    • Support for evaluators with optional configuration parameters in evaluator.ts and experiment.ts.
    • New evaluator factory and validation utilities in mbt-evaluators.ts and registry.ts.
    • Sample app sample_security_experiment.ts demonstrating security evaluator workflows.
  • Code Generation:
    • Added generate-evaluator-models.ts and generate-models.sh for generating evaluator models from OpenAPI/Swagger.
    • Dev dependencies added for OpenAPI parsing and TypeScript codegen in package.json.
  • Bug Fixes:
    • Relaxed runtime requirement for evaluators to include a version field in experiment.ts.

This description was created by Ellipsis for aa4393e. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for evaluator configuration, allowing evaluators to accept optional configuration parameters
    • Introduced evaluator registry and factory utilities for easier discovery and creation of evaluators
    • Added new sample demonstrating security evaluators (PII detection, secrets detection, prompt injection detection)
  • Chores

    • Added build tooling for automated code generation from OpenAPI specifications
    • Added new build script generate:evaluator-models for generating evaluator models

✏️ Tip: You can customize this high-level summary in your review settings.

Contributor

coderabbitai bot commented Dec 30, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

The changes introduce OpenAPI/Swagger-based code generation for evaluators alongside updated TypeScript interfaces to support evaluator configuration. A new code generation script discovers evaluator endpoints from an OpenAPI spec and generates TypeScript artifacts, while the SDK is extended to export these generated types and support optional config in evaluator payloads.
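
A minimal sketch of the interface shapes this walkthrough describes, with field types inferred from the summaries below rather than copied from the source (the actual declarations live in experiment.interface.ts):

  // Sketch only: optionality and types are inferred from the PR description.
  interface EvaluatorWithVersion {
    name: string;
    version: string; // version is required on this variant
  }

  interface EvaluatorWithConfig {
    name: string;
    version?: string; // optional on this variant
    config?: Record<string, unknown>; // optional evaluator configuration parameters
    requiredInputFields?: string[]; // input fields the evaluator expects
  }

  // The union now accepts a plain slug, a versioned evaluator, or a configured one.
  type EvaluatorDetails = string | EvaluatorWithVersion | EvaluatorWithConfig;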

Changes

  • Build & Dependency Configuration (package.json, scripts/codegen/tsconfig.json, packages/sample-app/package.json): Root package.json adds the npm script generate:evaluator-models and dev dependencies for OpenAPI parsing/codegen (@apidevtools/swagger-parser, openapi-typescript, ts-node, @types/node). The sample-app package.json adds the script run:security_experiment. A TypeScript config is added for the codegen scripts.
  • Code Generation Tooling (scripts/codegen/generate-evaluator-models.ts, scripts/generate-models.sh): A new TypeScript codegen script parses OpenAPI/Swagger specs to discover evaluator endpoints and generates registry.ts, mbt-evaluators.ts, and index.ts with evaluator schemas, factory methods, and validation helpers. A Bash wrapper script orchestrates codegen and type generation with Swagger 2.0-to-3.0 conversion.
  • SDK Type Definitions & Exports (packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts, packages/traceloop-sdk/src/lib/node-server-sdk.ts): New exported interface EvaluatorWithConfig with name, version, config, and requiredInputFields; the EvaluatorDetails union is updated to include EvaluatorWithConfig. The SDK re-exports generated evaluator types and helper functions (createEvaluator, validateEvaluatorInput, etc.).
  • Runtime Evaluator Implementation (packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts, packages/traceloop-sdk/src/lib/client/experiment/experiment.ts): Evaluator payload construction now extracts and includes the optional evaluator_config field when the evaluator is an object (see the sketch after this list). The version-field validation requirement is removed to allow evaluators without explicit versions.
  • Example & Tool Configuration (packages/sample-app/src/sample_security_experiment.ts, .prettierignore): A new TypeScript example demonstrates the security evaluators (PII detector, secrets detector, prompt injection detector) using the Traceloop SDK. The Prettier configuration is updated to ignore generated files under **/generated/**.
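
A rough sketch of the payload construction noted in the Runtime Evaluator Implementation entry above. The helper name and every payload key except evaluator_config are illustrative, and the imported types are the ones this PR exports:

  import type {
    EvaluatorWithConfig,
    EvaluatorWithVersion,
  } from "@traceloop/node-server-sdk";

  type EvaluatorInput = string | EvaluatorWithVersion | EvaluatorWithConfig;

  // Illustrative helper (not the SDK's actual function): carries an optional,
  // non-empty config object into the evaluator payload.
  function buildEvaluatorPayload(evaluator: EvaluatorInput): Record<string, unknown> {
    const payload: Record<string, unknown> = {
      evaluator: typeof evaluator === "string" ? evaluator : evaluator.name,
    };

    if (typeof evaluator === "object" && evaluator.version) {
      payload.evaluator_version = evaluator.version;
    }

    if (
      typeof evaluator === "object" &&
      "config" in evaluator &&
      evaluator.config &&
      Object.keys(evaluator.config).length > 0
    ) {
      payload.evaluator_config = evaluator.config;
    }

    return payload;
  }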

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Through OpenAPI specs the rabbit hops with glee,
Generating evaluators for all to see!
Config fields bloom where schemas grow,
Type-safe models steal the show.
Security tested, designs refined—
A swift leap forward, well aligned! 🚀

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 65.00%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title 'refactor(traceloop-sdk): add made by Traceloop evals' directly addresses the main objective of adding configurable evaluators and Traceloop-made evaluator implementations to the SDK.
✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fa1018f and aa4393e.

📒 Files selected for processing (2)
  • packages/sample-app/package.json
  • packages/sample-app/src/sample_security_experiment.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/sample-app/package.json
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings

Applied to files:

  • packages/sample-app/src/sample_security_experiment.ts
📚 Learning: 2025-08-13T08:00:42.668Z
Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 643
File: packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts:27-27
Timestamp: 2025-08-13T08:00:42.668Z
Learning: In packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts, when the user said "there is no date" in response to a comment about removing "date" from column type unions, they may have been referring to local changes not yet reflected in the PR branch, as the code still contained "date" in the union types at lines 27 and 43.

Applied to files:

  • packages/sample-app/src/sample_security_experiment.ts
📚 Learning: 2025-12-30T22:05:07.422Z
Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 853
File: packages/sample-app/src/sample_security_evaluators.ts:37-46
Timestamp: 2025-12-30T22:05:07.422Z
Learning: In packages/sample-app/src/**/*.ts, allow demonstration code to omit defensive error handling and bounds checking for API responses. This pattern applies to sample/demo code only and should not be interpreted as a best practice for production code. Reviewers should still require basic guards for critical paths (security, integrity) and clearly document that the code is a demonstration with comments indicating assumptions.

Applied to files:

  • packages/sample-app/src/sample_security_experiment.ts
📚 Learning: 2025-08-12T13:58:25.583Z
Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 643
File: packages/traceloop-sdk/src/lib/client/dataset/column.ts:41-47
Timestamp: 2025-08-12T13:58:25.583Z
Learning: In packages/traceloop-sdk/src/lib/client/dataset/column.ts, when the user mentioned "its working there is transform func", they were confirming that the response transformer is functioning correctly and converting snake_case API responses to camelCase, which means the Column class should access camelCase properties (createdAt, updatedAt) from the transformed ColumnResponse data.

Applied to files:

  • packages/sample-app/src/sample_security_experiment.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk

Applied to files:

  • packages/sample-app/src/sample_security_experiment.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them

Applied to files:

  • packages/sample-app/src/sample_security_experiment.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/instrumentation-*/**/*.{ts,tsx} : Instrumentations must create spans with appropriate AI/LLM semantic attributes for calls they wrap

Applied to files:

  • packages/sample-app/src/sample_security_experiment.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/instrumentation-*/**/*.{ts,tsx} : Instrumentations must capture and record errors appropriately

Applied to files:

  • packages/sample-app/src/sample_security_experiment.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and test
🔇 Additional comments (3)
packages/sample-app/src/sample_security_experiment.ts (3)

1-11: File rename completed as requested.

The file has been renamed from sample_security_evaluators.ts to sample_security_experiment.ts, addressing the feedback from nina-kollman. The header documentation clearly explains the purpose and scope of the security evaluators.


93-97: Well-structured demonstration of configurable evaluators.

The evaluator configuration effectively demonstrates the new feature, showing both evaluators with configuration parameters (piiDetector, promptInjection) and without (secretsDetector). This provides clear examples of how to use the generated factory methods with type-safe config objects.
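
As a rough illustration of the pattern described here, the setup might look like the following. The static factory-method names mirror the identifiers mentioned in this comment but are assumptions about the generated API, and the config keys are purely hypothetical:

  import { EvaluatorMadeByTraceloop } from "@traceloop/node-server-sdk";

  // Hypothetical config values; the generated config interfaces define the real fields.
  const piiDetector = EvaluatorMadeByTraceloop.piiDetector({ probeThreshold: 0.8 });
  const promptInjection = EvaluatorMadeByTraceloop.promptInjection({ strictMode: true });
  const secretsDetector = EvaluatorMadeByTraceloop.secretsDetector(); // no config needed

  // The resulting evaluator objects are then passed to the experiment run.
  const evaluators = [piiDetector, promptInjection, secretsDetector];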


114-158: Effective result handling with type safety.

The type guard at line 114 properly discriminates between local and GitHub Actions execution results, and the subsequent handling for each case is comprehensive and well-structured. The explicit type annotations and formatted logging enhance clarity for users learning from this example.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed everything up to 474c541 in 2 minutes and 20 seconds. Click for details.
  • Reviewed 2663 lines of code in 14 files
  • Skipped 1 file when reviewing.
  • Skipped posting 4 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/src/sample_security_evaluators.ts:28
  • Draft comment:
    Tip: Validate that required environment variables (e.g. TRACELOOP_API_KEY) are set instead of relying on the non-null assertion. This will yield clearer errors when config is missing.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 35% vs. threshold = 50% This comment is suggesting a code quality improvement - to validate environment variables explicitly rather than using the non-null assertion operator. However, looking at the rules: 1) This is a sample app (the path shows "sample-app"), which suggests it's example/demo code where such validation might be less critical. 2) The comment uses "Validate that..." which is similar to "Ensure that..." - the rules specifically say not to make comments that start with "Verify that..." or "Ensure that...". 3) The comment is somewhat speculative - it's not pointing out a definite bug, but rather suggesting a best practice. 4) This is a code quality suggestion, which the rules say are good "if they are actionable and clear" - this one is actionable and clear. However, the rules also emphasize that we should only keep comments with STRONG EVIDENCE that they are correct, and by default assume comments are incorrect or not useful. This is a legitimate code quality suggestion that is actionable and clear. The non-null assertion could indeed lead to less clear error messages if the environment variable is missing. This is a sample app meant to demonstrate best practices, so suggesting proper error handling could be valuable. The comment doesn't start with "Verify that" or "Ensure that" - it says "Validate that" which is actually suggesting a code change (add validation), not asking the author to verify something. While the suggestion is technically valid, this is a sample application where the focus is on demonstrating security evaluators, not on production-grade error handling. The comment is borderline - it's a quality suggestion but not critical. The phrase "Validate that" could be interpreted as instructional rather than asking for verification. However, given the emphasis on only keeping comments with STRONG EVIDENCE and the fact that this is sample code where such validation is less critical, this leans toward being not essential. This is a borderline case. It's a reasonable code quality suggestion that is actionable, but it's not critical for sample/demo code. Given the instruction to err on the side of deleting comments unless there's STRONG EVIDENCE they're needed, and that this is a minor improvement rather than a clear bug, I'll give it a moderate-low score.
2. packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts:74
  • Draft comment:
    Note: The evaluator config extraction uses typeof evaluator === "object" && "config" in evaluator. Ensure all evaluator variations are properly handled. Explicit type narrowing might improve clarity.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
3. scripts/codegen/generate-evaluator-models.ts:169
  • Draft comment:
    Suggestion: Consider logging a warning for evaluator endpoints missing a request or response schema to aid debugging during code generation.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
4. scripts/generate-models.sh:31
  • Draft comment:
    Reminder: Ensure that the tsconfig.json referenced (via --project) exists and is correctly configured for the code generation script.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is asking the PR author to ensure that a configuration file exists and is correctly set up. It doesn't provide a specific suggestion or point out a specific issue with the code. It falls under the category of asking the author to double-check something, which is against the rules.

Workflow ID: wflow_D8zOAEOx7JKjeEdh

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Skipped PR review on b664503 because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts (1)

65-67: Error message mentions evaluator but validation doesn't check it.

The error message states "experimentId, evaluator, and taskResult are required" but the condition only validates experimentId and taskResult. The evaluator validation happens later at line 80.

🔎 Proposed fix
     if (!experimentId || !taskResult) {
-      throw new Error("experimentId, evaluator, and taskResult are required");
+      throw new Error("experimentId and taskResult are required");
     }
🧹 Nitpick comments (6)
scripts/codegen/generate-evaluator-models.ts (5)

1-1: Consider using standard shebang for ts-node.

The shebang #!/usr/bin/env npx ts-node is non-standard. When the script is invoked directly, npx adds overhead and may behave differently across environments.

🔎 Suggested alternative
-#!/usr/bin/env npx ts-node
+#!/usr/bin/env ts-node

Alternatively, since the script is invoked via the shell wrapper (generate-models.sh), you could remove the shebang entirely.


249-253: Non-null assertion on already-verified array.

The schemas! assertion is unnecessary since schema.oneOf || schema.anyOf guarantees at least one is truthy.

🔎 Proposed fix
   if (schema.oneOf || schema.anyOf) {
-    const schemas = schema.oneOf || schema.anyOf;
-    const types = schemas!.map((s) => openApiTypeToTs(s, allSchemas));
+    const schemas = (schema.oneOf || schema.anyOf)!;
+    const types = schemas.map((s) => openApiTypeToTs(s, allSchemas));
     return types.join(" | ");
   }

591-593: Unnecessary double non-null assertion.

Line 593 uses requestSchema!.properties! but line 586 already checks requestSchema?.properties, so the first ! is redundant.

🔎 Proposed fix
   const className = slugToClassName(evaluator.slug);
   const interfaceName = `${className}Config`;
-  const schemaProps = evaluator.requestSchema!.properties!;
+  const schemaProps = evaluator.requestSchema.properties!;

665-668: Unused variable schema.

The schema variable is assigned but only used to optionally add requiredInputFields. Consider removing the variable or using it more directly.

🔎 Proposed fix
-  lines.push("  const schema = EVALUATOR_SCHEMAS[slug];");
   lines.push("  const result: EvaluatorWithConfig = { name: slug };");
   lines.push("");
   lines.push("  if (options?.version) {");
   lines.push("    result.version = options.version;");
   lines.push("  }");
   lines.push("");
   lines.push("  if (options?.config) {");
   lines.push("    result.config = Object.fromEntries(");
   lines.push("      Object.entries(options.config).filter(([, v]) => v !== undefined),");
   lines.push("    );");
   lines.push("  }");
   lines.push("");
-  lines.push("  if (schema?.requiredInputFields?.length) {");
-  lines.push("    result.requiredInputFields = schema.requiredInputFields;");
+  lines.push("  const schema = EVALUATOR_SCHEMAS[slug];");
+  lines.push("  if (schema.requiredInputFields?.length) {");
+  lines.push("    result.requiredInputFields = schema.requiredInputFields;");
   lines.push("  }");

This moves the schema assignment closer to its usage, improving clarity.


882-884: Add error handling for JSON parsing.

JSON.parse can throw on malformed JSON. Consider wrapping in try-catch for clearer error messages.

🔎 Proposed fix
   console.log(`=== Reading swagger spec from ${swaggerPath} ===`);
   const swaggerContent = fs.readFileSync(swaggerPath, "utf-8");
-  const spec: OpenAPISpec = JSON.parse(swaggerContent);
+  let spec: OpenAPISpec;
+  try {
+    spec = JSON.parse(swaggerContent);
+  } catch (e) {
+    console.error(`Error: Failed to parse ${swaggerPath} as JSON`);
+    console.error(e instanceof Error ? e.message : String(e));
+    process.exit(1);
+  }
packages/sample-app/src/sample_security_evaluators.ts (1)

22-32: Add environment variable validation for better error messages.

The client initialization uses environment variables without validation:

  • Line 24: OPENAI_API_KEY could be undefined
  • Line 29: Non-null assertion on TRACELOOP_API_KEY will throw a cryptic error if the variable is unset

For a sample app that's meant to demonstrate best practices, consider adding explicit validation with helpful error messages.

🔎 Proposed validation
+// Validate required environment variables
+if (!process.env.OPENAI_API_KEY) {
+  throw new Error("OPENAI_API_KEY environment variable is required");
+}
+if (!process.env.TRACELOOP_API_KEY) {
+  throw new Error("TRACELOOP_API_KEY environment variable is required");
+}
+
 // Initialize OpenAI client
 const openai = new OpenAI({
   apiKey: process.env.OPENAI_API_KEY,
 });

 // Initialize Traceloop client
 const traceloop = new TraceloopClient({
-  apiKey: process.env.TRACELOOP_API_KEY!,
+  apiKey: process.env.TRACELOOP_API_KEY,
   appName: "security-evaluators-sample",
   baseUrl: process.env.TRACELOOP_API_ENDPOINT,
 });
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f480743 and 474c541.

⛔ Files ignored due to path filters (5)
  • packages/traceloop-sdk/src/lib/generated/evaluators/index.ts is excluded by !**/generated/**
  • packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.ts is excluded by !**/generated/**
  • packages/traceloop-sdk/src/lib/generated/evaluators/registry.ts is excluded by !**/generated/**
  • packages/traceloop-sdk/src/lib/generated/evaluators/request.ts is excluded by !**/generated/**
  • packages/traceloop-sdk/src/lib/generated/evaluators/response.ts is excluded by !**/generated/**
📒 Files selected for processing (10)
  • package.json
  • packages/sample-app/package.json
  • packages/sample-app/src/sample_security_evaluators.ts
  • packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts
  • packages/traceloop-sdk/src/lib/evaluators/index.ts
  • packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts
  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
  • scripts/codegen/generate-evaluator-models.ts
  • scripts/codegen/tsconfig.json
  • scripts/generate-models.sh
🧰 Additional context used
📓 Path-based instructions (4)
packages/*/package.json

📄 CodeRabbit inference engine (CLAUDE.md)

Use workspace:* for intra-repo package dependencies in package.json

Files:

  • packages/sample-app/package.json
packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings

Files:

  • packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts
  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
  • packages/traceloop-sdk/src/lib/evaluators/index.ts
  • packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts
packages/traceloop-sdk/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

packages/traceloop-sdk/**/*.{ts,tsx}: Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk

Files:

  • packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts
  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
  • packages/traceloop-sdk/src/lib/evaluators/index.ts
  • packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts
packages/traceloop-sdk/src/lib/node-server-sdk.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Register new instrumentation packages during SDK initialization

Files:

  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
🧠 Learnings (4)
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings

Applied to files:

  • scripts/codegen/tsconfig.json
  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
  • packages/traceloop-sdk/src/lib/evaluators/index.ts
  • packages/sample-app/src/sample_security_evaluators.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk

Applied to files:

  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
  • packages/traceloop-sdk/src/lib/evaluators/index.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/src/lib/node-server-sdk.ts : Register new instrumentation packages during SDK initialization

Applied to files:

  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them

Applied to files:

  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
  • packages/traceloop-sdk/src/lib/evaluators/index.ts
🧬 Code graph analysis (1)
scripts/codegen/generate-evaluator-models.ts (3)
packages/traceloop-sdk/src/lib/evaluators/index.ts (1)
  • EvaluatorSchema (26-26)
packages/traceloop-sdk/src/lib/generated/evaluators/index.ts (1)
  • EvaluatorSchema (24-24)
packages/traceloop-sdk/src/lib/generated/evaluators/registry.ts (1)
  • EvaluatorSchema (9-14)
🔇 Additional comments (14)
packages/sample-app/package.json (1)

43-43: LGTM!

The new run:security_evaluators script follows the established pattern used by other sample scripts in this file.

scripts/codegen/tsconfig.json (1)

1-13: LGTM!

Sensible TypeScript configuration for the code generation script with strict mode enabled.

packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts (1)

69-97: LGTM!

The config extraction logic correctly distinguishes between string, EvaluatorWithVersion, and EvaluatorWithConfig types. The empty object check before adding evaluator_config to the payload is a good defensive measure.

scripts/codegen/generate-evaluator-models.ts (1)

96-101: Duplicate EvaluatorSchema interface definition.

This interface is defined locally here (lines 96-101) but is also generated in registry.ts (lines 460-465). The generated version in packages/traceloop-sdk/src/lib/generated/evaluators/registry.ts is the authoritative one. This local definition is only used during codegen, which is fine, but ensure they stay in sync.

packages/traceloop-sdk/src/lib/node-server-sdk.ts (2)

28-29: LGTM!

Correctly exports the new EvaluatorWithConfig and EvaluatorWithVersion types from the interfaces module.


63-69: LGTM!

The new MBT evaluator utilities are properly exported, expanding the public API surface for SDK consumers.

packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts (1)

13-32: LGTM! Clear interface definitions.

The new EvaluatorWithConfig interface is well-documented. The type discrimination between EvaluatorWithVersion (required version) and EvaluatorWithConfig (optional version, has config and requiredInputFields) is handled correctly in evaluator.ts using "config" in evaluator check.
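
As a minimal sketch, that discrimination could be expressed as a standalone type guard; the helper name is illustrative, since evaluator.ts performs the check inline:

  import type {
    EvaluatorWithConfig,
    EvaluatorWithVersion,
  } from "@traceloop/node-server-sdk";

  // Narrows the evaluator union to the config-carrying variant.
  function isEvaluatorWithConfig(
    e: string | EvaluatorWithVersion | EvaluatorWithConfig,
  ): e is EvaluatorWithConfig {
    return typeof e === "object" && "config" in e;
  }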

package.json (1)

7-14: Verify usage of openapi-typescript dependency.

The generate:evaluator-models script and supporting devDependencies are in place. However, verify whether the openapi-typescript package (line 13) is actually imported or used anywhere in the codebase; if it's not, consider removing it to keep dependencies lean.

packages/sample-app/src/sample_security_evaluators.ts (5)

1-20: LGTM: Clear documentation and proper imports.

The documentation clearly explains the security evaluators being demonstrated, and the imports are appropriate for the sample.


34-46: LGTM: Clean OpenAI integration.

The function properly calls the OpenAI API with reasonable parameters and includes a fallback for empty responses. Error handling is appropriately delegated to the caller.


48-63: LGTM: Appropriate task structure.

The task function correctly extracts the query, generates a response, and returns the data in the format expected by security evaluators (text for PII/secrets detection, prompt for injection detection).
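
A hypothetical sketch of the output shape implied here; the function and helper names are invented, and only the text/prompt field pairing is taken from this comment:

  // Stand-in for the sample's OpenAI call.
  async function callModel(query: string): Promise<string> {
    return `response to: ${query}`;
  }

  // `text` feeds PII/secrets detection, `prompt` feeds prompt-injection detection.
  async function securityTask(input: { query: string }): Promise<{ text: string; prompt: string }> {
    const text = await callModel(input.query);
    return { text, prompt: input.query };
  }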


105-144: LGTM: Excellent result handling with type guards.

The function properly handles both local and GitHub Actions execution modes using a type guard, with comprehensive logging for task results and errors. The result formatting is clear and helpful for debugging.


146-158: LGTM: Proper entry point with error handling.

The main function follows Node.js best practices with proper async/await error handling and appropriate exit codes for failures.

packages/traceloop-sdk/src/lib/evaluators/index.ts (1)

11-26: Verify that all re-exported symbols are actually exported from the generated files.

The code re-exports symbols from ../generated/evaluators/mbt-evaluators and ../generated/evaluators/registry. Ensure the code generation pipeline produces these files with the expected exports:

  • EvaluatorMadeByTraceloop, createEvaluator, validateEvaluatorInput, getAvailableEvaluatorSlugs, getEvaluatorSchemaInfo from mbt-evaluators
  • EVALUATOR_SLUGS, EVALUATOR_SCHEMAS, getEvaluatorSchema, isValidEvaluatorSlug from registry

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed ee25650 in 1 minute and 8 seconds. Click for details.
  • Reviewed 350 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. .prettierignore:3
  • Draft comment:
Adding '**/generated/**' to ignore auto‐generated files is appropriate.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/sample-app/src/sample_security_evaluators.ts:78
  • Draft comment:
    The reformatted console.log calls improve consistency and readability without affecting functionality.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/traceloop-sdk/src/lib/evaluators/index.ts:26
  • Draft comment:
    Reformatting the export type block for clarity is fine; no functional change.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
4. scripts/codegen/generate-evaluator-models.ts:320
  • Draft comment:
    The use of parentheses and trailing commas in lines.push calls enhances readability without altering logic.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
5. scripts/codegen/generate-evaluator-models.ts:844
  • Draft comment:
    The EvaluatorMadeByTraceloop factory method generation (static methods) is clear; ensure the generated config interfaces align with expected usage.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_ZW7T1fG3dOwY6KBl

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed 3ddeb66 in 1 minute and 0 seconds. Click for details.
  • Reviewed 10 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comment. View it below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. .prettierignore:3
  • Draft comment:
    The update now includes a trailing newline at the end of the file for POSIX compliance. This is a best practice.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_VNL5p73oGBOxvBA0

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
scripts/codegen/generate-evaluator-models.ts (3)

1-1: Consider simplifying the shebang.

The shebang #!/usr/bin/env npx ts-node combines multiple tools in an unconventional way. Standard shebangs typically reference a single interpreter. Since the documented usage (line 7) shows npx ts-node as the invocation method, consider either:

  • Using #!/usr/bin/env ts-node (if ts-node is globally installed)
  • Removing the shebang entirely (if the script is always invoked via npx ts-node)

844-850: Consider stronger typing for config parameter.

The type assertion config as Record<string, unknown> on line 849 bypasses TypeScript's type checking. While the generated code should have compatible types, consider adding runtime validation or using a type guard to ensure type safety.

Alternative approach with better type safety
     if (configInterface) {
       lines.push(
         `  static ${methodName}(config?: ${configInterface}): EvaluatorWithConfig {`,
       );
       lines.push(
-        `    return createEvaluator('${evaluator.slug}', { config: config as Record<string, unknown> });`,
+        `    return createEvaluator('${evaluator.slug}', config ? { config: { ...config } } : undefined);`,
       );

This approach spreads the config object, ensuring compatibility without explicit type assertions.


930-932: Add error handling for JSON parsing.

The JSON.parse call on line 932 could throw if the swagger file contains invalid JSON. Consider adding a try-catch block with a descriptive error message.

Proposed error handling
   console.log(`=== Reading swagger spec from ${swaggerPath} ===`);
   const swaggerContent = fs.readFileSync(swaggerPath, "utf-8");
-  const spec: OpenAPISpec = JSON.parse(swaggerContent);
+  let spec: OpenAPISpec;
+  try {
+    spec = JSON.parse(swaggerContent);
+  } catch (error) {
+    console.error(`Error: Failed to parse swagger file as JSON: ${error}`);
+    process.exit(1);
+  }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 474c541 and ee25650.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (4)
  • .prettierignore
  • packages/sample-app/src/sample_security_evaluators.ts
  • packages/traceloop-sdk/src/lib/evaluators/index.ts
  • scripts/codegen/generate-evaluator-models.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/traceloop-sdk/src/lib/evaluators/index.ts
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings

Applied to files:

  • scripts/codegen/generate-evaluator-models.ts
  • packages/sample-app/src/sample_security_evaluators.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them

Applied to files:

  • packages/sample-app/src/sample_security_evaluators.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/*/package.json : Use workspace:* for intra-repo package dependencies in package.json

Applied to files:

  • .prettierignore
📚 Learning: 2025-07-28T14:30:22.017Z
Learnt from: galkleinman
Repo: traceloop/openllmetry-js PR: 620
File: packages/instrumentation-llamaindex/package.json:43-45
Timestamp: 2025-07-28T14:30:22.017Z
Learning: Lerna automatically converts `workspace:*` dependencies to actual published versions during the `lerna publish` command, so `workspace:*` in package.json files is safe and expected in Lerna-managed monorepos.

Applied to files:

  • .prettierignore
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and test
🔇 Additional comments (16)
packages/sample-app/src/sample_security_evaluators.ts (4)

13-20: LGTM!

The imports are well-organized and include all necessary types and clients for the security evaluators experiment.


52-63: LGTM!

The task function correctly handles the input, generates a response, and returns the required output structure for security evaluators.


161-172: LGTM!

The main entry point has appropriate error handling for a sample application, catching errors and exiting with a non-zero code on failure.


114-158: No issues found. The TaskResponse type is properly exported from @traceloop/node-server-sdk and correctly imported and used in the code. The type guard and result handling logic are correct.

.prettierignore (1)

2-3: LGTM! Appropriate ignore patterns for generated code.

The addition of **/generated/** correctly excludes auto-generated code from Prettier formatting, aligning with the new code generation workflow introduced in this PR.

scripts/codegen/generate-evaluator-models.ts (11)

16-101: Well-structured type definitions.

The OpenAPI type definitions comprehensively cover both OpenAPI 3.0 and Swagger 2.0 formats (e.g., components.schemas vs definitions, content vs direct schema), providing good compatibility for parsing different spec versions.
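
For illustration, schema lookup that tolerates both formats could look roughly like this; the type and function names are assumptions rather than the script's actual identifiers:

  // OpenAPI 3.0 keeps schemas under components.schemas, Swagger 2.0 under definitions.
  interface SpecLike {
    components?: { schemas?: Record<string, unknown> };
    definitions?: Record<string, unknown>;
  }

  function getAllSchemas(spec: SpecLike): Record<string, unknown> {
    return spec.components?.schemas ?? spec.definitions ?? {};
  }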


103-354: Solid helper function implementations.

The helper functions correctly handle name transformations, schema resolution, and type conversions. The code properly supports both OpenAPI 3.0 and Swagger 2.0 format variations throughout.


359-448: Clean request/response model generation.

The generators properly handle both inline schemas and $ref references, with appropriate deduplication logic to avoid generating the same schema multiple times.


453-573: Comprehensive registry generation with type-safe helpers.

The registry provides centralized access to evaluator metadata, request/response models, and validation utilities. The use of empty objects with type assertions (lines 491, 502) is appropriate for type-level mappings.


578-623: Appropriate config interface generation.

The config interface generation correctly identifies optional fields and only creates interfaces when needed, avoiding unnecessary type definitions for evaluators without configurable parameters.


628-714: Robust evaluator factory with validation.

The createEvaluator function includes proper slug validation, helpful error messages listing available options (lines 684-690), and correctly filters undefined config values (lines 700-706) to avoid sending unnecessary data.
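
For context, a call to the generated factory might look roughly like this; createEvaluator and the "pii-detector" slug appear elsewhere in this review, while the config keys are invented for illustration:

  import { createEvaluator } from "@traceloop/node-server-sdk";

  const evaluator = createEvaluator("pii-detector", {
    version: "latest",
    config: {
      language: "en", // hypothetical config key
      redactionMode: undefined, // dropped by the undefined-value filter noted above
    },
  });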


716-771: Comprehensive validation utilities.

The validation logic properly checks for missing, undefined, and null values for required fields (lines 736-741), providing clear error messages for debugging.
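
A rough, self-contained sketch of that required-field check; the generated validateEvaluatorInput may have a different signature, so this only approximates the logic described above:

  // Returns the fields that are missing, undefined, or null in the provided input.
  function findMissingRequiredFields(
    required: string[],
    input: Record<string, unknown>,
  ): string[] {
    return required.filter(
      (field) => !(field in input) || input[field] === undefined || input[field] === null,
    );
  }

  // Example: findMissingRequiredFields(["text"], { text: null }) returns ["text"]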


773-858: Well-designed factory class API.

The EvaluatorMadeByTraceloop class provides an intuitive, type-safe interface for creating evaluators. The conditional config parameters (lines 844-854) appropriately distinguish between evaluators with and without configurable options.


866-904: Clean barrel export organization.

The index file appropriately re-exports all generated types and utilities, providing a convenient single entry point for consumers of the generated code.


909-991: Well-structured main function with helpful diagnostics.

The main function provides clear usage instructions, validates inputs, and offers informative logging. The diagnostic output when no evaluators are found (lines 938-946) is particularly helpful for debugging spec issues.


638-638: The relative import path is correct for the expected output directory structure.

The generated files are placed in packages/traceloop-sdk/src/lib/generated/evaluators/, and the import statement '../../interfaces/experiment.interface' correctly resolves to packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts. The path assumes the proper two-level depth from the interfaces directory, which matches the actual output configuration in scripts/generate-models.sh.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Caution

Changes requested ❌

Reviewed 70fd2db in 2 minutes and 19 seconds. Click for details.
  • Reviewed 1659 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.ts:222
  • Draft comment:
    Repeated cast of config via 'as unknown as Record<string, unknown>' is used in several methods. Consider extracting a helper function to normalize config objects for cleaner code.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a code quality refactoring suggestion, which the rules say are good if they are actionable and clear. The suggestion is clear and would improve code quality. However, the file header explicitly states "Auto-generated - DO NOT EDIT". This means any manual changes would be overwritten when the file is regenerated. For this suggestion to be actionable, it would need to be implemented in the generator script (generate-evaluator-models.ts), not in this file. Since the comment is on the generated file itself and not on the generator, it's not actionable in its current form. The PR author cannot act on this comment by modifying this file. Perhaps the comment is still valuable as feedback that could lead to improving the generator script. Even if it can't be acted upon directly in this file, it might prompt the team to update the code generation logic. The comment doesn't explicitly acknowledge that this is auto-generated code, which might mean the reviewer didn't notice. While the feedback could theoretically be valuable, the rules state comments should be actionable. A comment on an auto-generated file asking for manual refactoring is not actionable. If the reviewer wanted to suggest improving the generator, they should comment on the generator script itself, not the generated output. The comment as written cannot be acted upon by the PR author. This comment should be deleted. It suggests a refactoring on an auto-generated file that explicitly states "DO NOT EDIT". The comment is not actionable because any manual changes would be overwritten when the file is regenerated. For this to be useful, it would need to be directed at the generator script, not the generated output.
2. scripts/codegen/generate-evaluator-models.ts:329
  • Draft comment:
    In extractFieldsFromSchema, all keys from the nested 'input' property are pushed without checking the 'required' flags. Verify if some input fields could be optional and adjust accordingly.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment raises a valid concern. The function is called extractFieldsFromSchema and returns requiredInputFields, but the implementation at line 336 pushes ALL property keys without checking if they're in the inputSchema.required array. This could be a bug if some input fields are actually optional. However, I need to consider: 1) Maybe the naming is intentional and all input fields are meant to be treated as required regardless of the schema? 2) The comment says "Verify if..." which is asking the author to check/confirm, which violates the rules about not asking authors to verify or confirm things. 3) Looking at the config extraction (lines 340-348), it similarly pushes all config property keys to optionalConfigFields without checking required flags, suggesting this might be intentional design where inputs are assumed required and configs are assumed optional. The comment uses "Verify if..." language which is explicitly against the rules - we shouldn't ask authors to verify or confirm. Also, without seeing the actual swagger schema structure, I can't be certain this is a bug. It's possible the design intentionally treats all input fields as required and all config fields as optional, which would make the current implementation correct. While the implementation does seem inconsistent with typical schema handling (where you'd check the required array), the comment violates the rule about not asking authors to verify things. Additionally, the parallel structure with config fields (which are all treated as optional) suggests this might be intentional design. Without strong evidence this is definitely wrong, I should delete the comment. The comment should be deleted because: 1) It uses "Verify if..." language which explicitly violates the rules about not asking authors to confirm or verify, and 2) There's not strong enough evidence this is definitely a bug - it could be intentional design where all input fields are treated as required by convention.
3. scripts/codegen/generate-evaluator-models.ts:341
  • Draft comment:
    The extraction for 'config' fields similarly collects all keys. It may be beneficial to check the schema’s 'required' array if a distinction between required and optional config fields is needed.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
4. packages/traceloop-sdk/src/lib/generated/evaluators/registry.ts:262
  • Draft comment:
    Using 'slug in EVALUATOR_SCHEMAS' to check for valid slug works in this context, but consider using Object.hasOwnProperty for robustness if prototype properties might interfere.
  • Reason this comment was not posted:
    Confidence changes required: 40% <= threshold 50% None
5. scripts/codegen/generate-evaluator-models.ts:510
  • Draft comment:
    Overall, the generator updates (e.g. for openApiTypeToTs and generateIndexFile) are well‐structured and consistently apply the new schema (with 'input.' and 'config.' prefixes). No major issues found.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
6. packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.ts:339
  • Draft comment:
    Typo check: The default model is documented as 'gpt-4o'. Please confirm if this is correct or if it should be 'gpt-4'.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
7. packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.ts:353
  • Draft comment:
    Typo check: 'gpt-4o' appears as the default model. Verify if this is intended or if it should be corrected to 'gpt-4'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% 'gpt-4o' is actually a real OpenAI model (GPT-4 Omni), released in 2024. It's not a typo. The comment is incorrect because it assumes 'gpt-4o' is a mistake when it's actually a valid model identifier. Additionally, this file is auto-generated (as stated in the header comments), so even if there were an issue, it should be fixed in the source swagger.json file, not here. The comment violates the rule about not commenting on things that are clearly intentional, and it's making an incorrect assumption. Could 'gpt-4o' be an internal or custom model name specific to this codebase that differs from OpenAI's naming? Perhaps the swagger.json source has a typo that propagated here, making this a legitimate concern. Even if there were uncertainty about the model name, this is auto-generated code from swagger.json. The comment asks to "verify" which violates the rules against asking the PR author to confirm intentions. Additionally, 'gpt-4o' is a well-known OpenAI model, so this is almost certainly not a typo. This comment should be deleted. 'gpt-4o' is a legitimate OpenAI model name (GPT-4 Omni), not a typo. The comment incorrectly assumes it should be 'gpt-4' and asks for verification, which violates the rules. Additionally, this is auto-generated code, so any changes would need to be made to the source file.
8. packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.ts:394
  • Draft comment:
    Typographical error detected: the default model is specified as 'gpt-4o'. If this is unintended, consider changing it to 'gpt-4'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This file is auto-generated from a swagger.json file, as stated in the header comments. The comment is suggesting that "gpt-4o" might be a typo for "gpt-4". However, "gpt-4o" is actually a real OpenAI model (GPT-4 Omni), not a typo. The comment is speculative ("If this is unintended") and is about documentation that was intentionally added in this PR. Since this is auto-generated code, any fix would need to happen in the source swagger.json file, not here. The comment violates the rule about not making speculative comments ("If X, then Y is an issue"). Additionally, gpt-4o is a legitimate model name, so this isn't even a real issue. I might be wrong about gpt-4o being a real model - perhaps it's a typo in the source data. However, the fact that it appears consistently in multiple places (3 times in the diff) suggests it's intentional, not a typo. Even if I'm uncertain about whether gpt-4o is correct, the comment is speculative ("If this is unintended") and this is auto-generated code where changes can't be made directly in this file. The comment doesn't provide strong evidence of an actual problem, and gpt-4o is indeed a real OpenAI model (GPT-4 Omni). This comment should be deleted. It's speculative, gpt-4o is a legitimate model name (GPT-4 Omni), and the file is auto-generated so any changes would need to be made in the source swagger.json file, not here.

Workflow ID: wflow_wtjPoqB1nr7nOQBH

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
scripts/codegen/generate-evaluator-models.ts (3)

113-118: Remove unused function slugToMethodName.

This function is defined but never called anywhere in the file. The code uses slugToCamelCase (line 442) for generating method names instead.

🔎 Proposed fix
-/**
- * Convert slug like "pii-detector" to snake_case method name like "pii_detector"
- */
-function slugToMethodName(slug: string): string {
-  return slug.replace(/-/g, "_");
-}
-

563-566: Consider documenting the 'latest' version default.

The default version 'latest' on line 565 may have reproducibility implications if evaluator behavior changes between versions. This is fine for a default, but consider whether the generated JSDoc should mention this behavior to help users understand they can pin specific versions.


788-790: Consider adding JSON parse error handling.

If the swagger file contains malformed JSON, JSON.parse will throw an exception with a potentially cryptic error message. Consider wrapping this in a try-catch for a more user-friendly error.

🔎 Proposed improvement
   console.log(`=== Reading swagger spec from ${swaggerPath} ===`);
   const swaggerContent = fs.readFileSync(swaggerPath, "utf-8");
-  const spec: OpenAPISpec = JSON.parse(swaggerContent);
+  let spec: OpenAPISpec;
+  try {
+    spec = JSON.parse(swaggerContent);
+  } catch (error) {
+    console.error(`Error: Failed to parse swagger file as JSON: ${(error as Error).message}`);
+    process.exit(1);
+  }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3ddeb66 and 70fd2db.

⛔ Files ignored due to path filters (3)
  • packages/traceloop-sdk/src/lib/generated/evaluators/index.ts is excluded by !**/generated/**
  • packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.ts is excluded by !**/generated/**
  • packages/traceloop-sdk/src/lib/generated/evaluators/registry.ts is excluded by !**/generated/**
📒 Files selected for processing (1)
  • scripts/codegen/generate-evaluator-models.ts
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/codegen/generate-evaluator-models.ts (1)
packages/traceloop-sdk/src/lib/generated/evaluators/registry.ts (1)
  • EvaluatorSchema (6-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and test
🔇 Additional comments (11)
scripts/codegen/generate-evaluator-models.ts (11)

1-14: LGTM!

The shebang is correct for ts-node execution, and the usage documentation is clear and helpful.


16-101: LGTM!

The type definitions comprehensively cover both OpenAPI 3.0 and Swagger 2.0 formats. The EvaluatorSchema interface correctly matches the generated output structure.


131-147: LGTM!

Clean implementation that correctly handles both OpenAPI 3.0 (components/schemas) and Swagger 2.0 (definitions) formats.


149-230: LGTM!

The function correctly handles both OpenAPI 3.0 and Swagger 2.0 formats for request bodies and responses. The defensive check on line 224 ensures only valid evaluators with schemas are included.


232-313: LGTM!

The type conversion logic is comprehensive, handling $ref, allOf, oneOf, anyOf, enums, and various primitive/complex types. The recursive resolution with configurable ref inlining is well-designed.


329-338: Verify: All input properties are treated as required.

The function extracts all properties from the input schema as requiredInputFields, ignoring the schema's required array. This means even optional fields within input will be listed as required.

If this is intentional (i.e., all input fields must be provided regardless of schema definition), consider adding a clarifying comment. Otherwise, you may want to filter based on inputSchema.required.

🔎 Alternative implementation if filtering is needed
   if (inputSchema?.properties) {
-    requiredInputFields.push(...Object.keys(inputSchema.properties));
+    // Only include fields marked as required in the schema
+    const required = inputSchema.required || [];
+    requiredInputFields.push(
+      ...Object.keys(inputSchema.properties).filter((key) => required.includes(key))
+    );
   }

354-437: LGTM!

The registry file generation is well-structured with proper type definitions, constants, and helper functions. The fallback to string on line 380 for empty evaluator lists is good defensive coding.


439-492: LGTM!

The config interface generation is clean and handles edge cases well. The JSDoc comment generation from schema descriptions enhances developer experience.


641-727: LGTM!

The EvaluatorMadeByTraceloop class generation provides an excellent developer experience with type-safe factory methods and comprehensive JSDoc documentation including required input fields.


731-762: LGTM!

Standard barrel file generation with clean separation of type and value exports.


764-841: LGTM!

The main function has proper argument validation, file existence checks, helpful diagnostic output when no evaluators are found, and comprehensive logging. The use of recursive: true for directory creation is good practice.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed 1ef6c7a in 45 seconds. Click for details.
  • Reviewed 65 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. scripts/codegen/generate-evaluator-models.ts:393
  • Draft comment:
    Formatting improvement: The ternary assignment for extracting requiredInputFields and optionalConfigFields was reformatted for better readability. No behavioral change.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. scripts/codegen/generate-evaluator-models.ts:468
  • Draft comment:
    The if-check for configSchema properties and the mapping callback in generateConfigInterface have been reformatted to improve readability. No functional changes detected.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. scripts/codegen/generate-evaluator-models.ts:704
  • Draft comment:
    Reformatted the template literal push in generateMbtEvaluatorsFile for consistency. This change is purely stylistic.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_rVCQk1cNudqeV4pT

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed 74a1953 in 56 seconds. Click for details.
  • Reviewed 13 lines of code in 1 files
  • Skipped 1 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. package.json:12
  • Draft comment:
    Verify that bumping '@types/node' to ^24.0.15 is fully compatible with your Node runtime.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_MrhFjAttGrRuHo3t

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
scripts/codegen/generate-evaluator-models.ts (1)

700-716: Terminology inconsistency in JSDoc comment.

Line 714 documents "Required task output fields" but should say "Required task input fields" to match the variable name requiredInputFields and be consistent with the rest of the codebase.

🔎 Proposed fix
-      lines.push(
-        `   * Required task output fields: ${requiredInputFields.join(", ")}`,
-      );
+      lines.push(
+        `   * Required task input fields: ${requiredInputFields.join(", ")}`,
+      );
🧹 Nitpick comments (3)
scripts/codegen/generate-evaluator-models.ts (3)

796-798: Add error handling for JSON parsing.

Line 798 parses the swagger file without a try-catch block. If the file contains invalid JSON, it will throw an unhelpful error. Adding proper error handling would improve the developer experience.

🔎 Proposed fix
   console.log(`=== Reading swagger spec from ${swaggerPath} ===`);
   const swaggerContent = fs.readFileSync(swaggerPath, "utf-8");
-  const spec: OpenAPISpec = JSON.parse(swaggerContent);
+  let spec: OpenAPISpec;
+  try {
+    spec = JSON.parse(swaggerContent);
+  } catch (error) {
+    console.error(`Error: Failed to parse swagger file as JSON: ${error}`);
+    process.exit(1);
+  }

808-811: Document the limit in diagnostic output.

Line 809 only shows the first 20 paths when no evaluator endpoints are found, but this limitation isn't mentioned in the output. Users might think that's the complete list.

🔎 Proposed fix
-    console.log("Available paths:");
+    console.log("Available paths (first 20):");
     for (const pathUrl of Object.keys(spec.paths).slice(0, 20)) {
       console.log(`  ${pathUrl}`);
     }

775-786: Consider validating command-line arguments.

The script reads file paths from command-line arguments without validation. While this is a developer tool, adding basic path validation (e.g., checking for path traversal, ensuring paths are within expected directories) would improve safety.
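A small sketch of what such validation could look like, assuming Node's fs/path APIs and an allowed root directory chosen by the caller; the function name is illustrative and not part of the script:

import * as fs from "fs";
import * as path from "path";

function resolveInputPath(arg: string, allowedRoot: string): string {
  const resolved = path.resolve(arg);
  // Reject paths that escape the expected root to avoid accidental traversal.
  if (!resolved.startsWith(path.resolve(allowedRoot) + path.sep)) {
    console.error(`Error: ${arg} is outside ${allowedRoot}`);
    process.exit(1);
  }
  if (!fs.existsSync(resolved)) {
    console.error(`Error: file not found: ${resolved}`);
    process.exit(1);
  }
  return resolved;
}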

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 70fd2db and 1ef6c7a.

📒 Files selected for processing (1)
  • scripts/codegen/generate-evaluator-models.ts
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/codegen/generate-evaluator-models.ts (3)
packages/traceloop-sdk/src/lib/generated/evaluators/registry.ts (1)
  • EvaluatorSchema (6-11)
packages/traceloop-sdk/src/lib/generated/evaluators/index.ts (1)
  • EvaluatorSchema (14-14)
packages/traceloop-sdk/src/lib/evaluators/index.ts (1)
  • EvaluatorSchema (28-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and test
🔇 Additional comments (3)
scripts/codegen/generate-evaluator-models.ts (3)

149-230: LGTM!

The evaluator extraction logic correctly handles both OpenAPI 3.0 and Swagger 2.0 formats, properly resolves schema references, and filters endpoints by the expected path pattern.


720-726: Check if the type assertion as unknown as Record<string, unknown> is necessary.

Line 725 uses a double type cast that bypasses TypeScript's type checking. If configInterface is already a compatible type, this assertion is unnecessary and reduces type safety. Verify the signature of createEvaluator to determine whether the parameter type actually requires this cast or if the function signature should be adjusted to accept the typed config directly.


236-313: Consider handling circular references in schema resolution.

The openApiTypeToTs function recursively resolves $ref references when resolveRefs is true. If the OpenAPI spec contains circular schema references (schema A references B, B references A), this could cause infinite recursion and stack overflow.

Consider adding a visited Set to track schemas being resolved to prevent infinite recursion on circular references.
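A minimal sketch of the visited-Set guard, using a simplified schema shape rather than the script's actual types:

interface SchemaObject {
  $ref?: string;
  type?: string;
  properties?: Record<string, SchemaObject>;
  items?: SchemaObject;
}

function refToTs(
  schema: SchemaObject,
  schemas: Record<string, SchemaObject>,
  visited: Set<string> = new Set(),
): string {
  if (schema.$ref) {
    const name = schema.$ref.split("/").pop() ?? "";
    // Break the cycle: if this schema is already being resolved, emit the
    // named type instead of inlining it again.
    if (visited.has(name)) return name;
    visited.add(name);
    const target = schemas[name];
    return target ? refToTs(target, schemas, visited) : "unknown";
  }
  if (schema.type === "array" && schema.items) {
    return `${refToTs(schema.items, schemas, visited)}[]`;
  }
  if (schema.type === "object" && schema.properties) {
    const fields = Object.entries(schema.properties)
      .map(([key, value]) => `${key}: ${refToTs(value, schemas, visited)}`)
      .join("; ");
    return `{ ${fields} }`;
  }
  return schema.type ?? "unknown";
}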

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
package.json (1)

12-14: DevDependencies look reasonable; consider updating openapi-typescript.

@types/node at version ^24.0.15 addresses the past review comment requesting version 24, though the latest available is 25.0.3. Using v24 is appropriate if targeting Node.js 24.

ts-node ^10.9.2 matches the latest version and is suitable for this use case.

openapi-typescript is at 7.4.0 while the latest is 7.10.1. Consider updating to the latest 7.x version for improvements and bug fixes.

🔎 Optional update to latest openapi-typescript
-    "openapi-typescript": "^7.4.0",
+    "openapi-typescript": "^7.10.1",
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1ef6c7a and 74a1953.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (1)
  • package.json
🔇 Additional comments (1)
package.json (1)

8-8: LGTM! Script addition looks appropriate.

The new generate:evaluator-models script follows npm conventions and delegates to a shell script, which is a reasonable approach for orchestrating the OpenAPI code generation workflow.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed dc628d0 in 36 seconds. Click for details.
  • Reviewed 42 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. scripts/codegen/generate-evaluator-models.ts:106
  • Draft comment:
    Removed slugToMethodName; ensure no downstream code relies on the snake_case format.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. scripts/codegen/generate-evaluator-models.ts:329
  • Draft comment:
    Now only required input properties are added; verify that absence of a required array doesn't omit needed fields.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the author to verify something, which is against the rules. It doesn't provide a specific suggestion or point out a specific issue with the code. Therefore, it should be removed.

Workflow ID: wflow_JOozvA5xlbkA9M8N

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@doronkopit5 doronkopit5 changed the title feat(MadeByTraceloop): add made by Traceloop evals reafctor(traceloop-sdk): add made by Traceloop evals Dec 31, 2025
@doronkopit5 doronkopit5 changed the title reafctor(traceloop-sdk): add made by Traceloop evals refactor(traceloop-sdk): add made by Traceloop evals Dec 31, 2025
const inputSchemaMapping = this.createInputSchemaMapping(taskResult);

const payload = {
const payload: Record<string, unknown> = {
Member

let's make this a type, I guess we know the experiment payload
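A hypothetical shape for that payload type; the field names below are assumptions based on this diff, not the SDK's actual contract:

interface ExperimentRunPayload {
  evaluators: Array<{
    name: string;
    version?: string;
    config?: Record<string, unknown>;
  }>;
  inputSchemaMapping?: Record<string, string>;
}

const payload: ExperimentRunPayload = {
  evaluators: [{ name: "pii-detector", config: { probability_threshold: 0.8 } }],
  inputSchemaMapping: { text: "task.output" },
};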

probability_threshold?: number;
}

export interface PlaceholderRegexConfig {
Member

I guess there is a bug in the swagger, but placeholder has more params

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed 4f103c6 in 2 minutes and 11 seconds. Click for details.
  • Reviewed 653 lines of code in 2 files
  • Skipped 1 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. package.json:8
  • Draft comment:
    Verify the 'generate:evaluator-models' script: it still points to a shell script ('./scripts/generate-models.sh') while the TS entrypoint exists. Ensure this is intended.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. scripts/codegen/generate-evaluator-models.ts:85
  • Draft comment:
    Regex extraction for 'TempType' may be brittle if the output format changes. Consider a more robust extraction method.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% This comment is speculative in nature - it says the regex "may be brittle" without demonstrating an actual failure case. The code has a fallback (returns "unknown" if the match fails), so it handles the case where the regex doesn't match. The comment doesn't provide specific examples of when the output format would change or what a "more robust extraction method" would look like. According to the rules, I should not keep speculative comments like "If X, then Y is an issue" - I should only comment if it's definitely an issue. This falls into that category of speculation without strong evidence of a problem. The regex might actually be fragile if the openapi-typescript library changes its output format in future versions. The comment could be pointing to a legitimate maintenance concern, even if it's not currently broken. The code is using an external library's string output format as an API, which is generally not ideal. While the concern about brittleness has some merit, the comment doesn't provide concrete evidence that this is actually a problem or suggest a specific alternative solution. The code has error handling (fallback to "unknown"), and without knowing the actual stability guarantees of openapi-typescript's output format, this is purely speculative. The rules explicitly state not to make speculative comments. This comment should be deleted because it's speculative ("may be brittle") without providing concrete evidence of a problem or a clear, actionable alternative. The code has appropriate error handling, and the comment doesn't demonstrate when or how the regex would actually fail.
3. scripts/codegen/generate-evaluator-models.ts:1
  • Draft comment:
    The shebang ('#!/usr/bin/env npx ts-node') may be non-portable; verify that all target environments support this invocation.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. scripts/codegen/generate-evaluator-models.ts:317
  • Draft comment:
    The generateMbtEvaluatorsFile function is lengthy and complex; consider refactoring it into smaller helper functions for better maintainability.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_UAVmCfBEzsyJahZ8

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
scripts/codegen/generate-evaluator-models.ts (1)

520-528: Fix inconsistent terminology in documentation comment.

Line 527 says "Required task output fields" but should say "Required task input fields" to be consistent with the actual semantics. These fields represent required inputs to the evaluator, not outputs.

🔎 Proposed fix
-      lines.push(
-        `   * Required task output fields: ${requiredInputFields.join(", ")}`,
-      );
+      lines.push(
+        `   * Required task input fields: ${requiredInputFields.join(", ")}`,
+      );
🧹 Nitpick comments (2)
scripts/codegen/generate-evaluator-models.ts (2)

65-92: Regex-based type extraction may be fragile.

The regex pattern at line 85 (/TempType:\s*([^;]+);/) assumes a specific output format from openapi-typescript. If that library changes its output format in future versions, this extraction will break silently and fall back to unknown.

Consider adding error logging or a warning when the regex match fails (before the fallback) to help diagnose issues during code generation.

🔎 Suggested improvement
   // Extract just the TempType definition
   const match = output.match(/TempType:\s*([^;]+);/);
   if (match) {
     return match[1].trim();
   }
 
+  // Log warning when extraction fails
+  console.warn(`Could not extract TypeScript type for schema, falling back to 'unknown'. Output was: ${output.substring(0, 100)}...`);
+
   // Fallback for simple types
   return "unknown";

387-393: Consider whether null config values should also be filtered.

The code filters undefined values from the config object but preserves null values. Depending on the API contract, null might also need to be filtered to avoid sending explicit null values to the evaluator service.

🔎 Potential enhancement if null filtering is needed
   if (options?.config) {
     result.config = Object.fromEntries(
-      Object.entries(options.config).filter(([, v]) => v !== undefined),
+      Object.entries(options.config).filter(([, v]) => v !== undefined && v !== null),
     );
   }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between dc628d0 and 4f103c6.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (2)
  • package.json
  • scripts/codegen/generate-evaluator-models.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • package.json
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/codegen/generate-evaluator-models.ts (1)
packages/traceloop-sdk/src/lib/generated/evaluators/registry.ts (1)
  • EvaluatorSchema (6-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and test
🔇 Additional comments (8)
scripts/codegen/generate-evaluator-models.ts (8)

37-59: LGTM: String conversion utilities are well-implemented.

The helper functions correctly convert kebab-case slugs to PascalCase class names and camelCase method names.


94-150: LGTM: Dual OpenAPI version support is correctly implemented.

The function properly handles both OpenAPI 3.0 (requestBody) and Swagger 2.0 (parameters with in:body) formats with appropriate fallback logic.


152-186: LGTM: Field extraction correctly respects schema's required array.

The function now properly filters input fields based on the schema's required array (lines 169-174), addressing the critical issue from previous reviews. Config fields are appropriately treated as optional.


188-269: LGTM: Registry generation is comprehensive and type-safe.

The generated registry provides a complete type-safe API with interfaces, type unions, constants, and helper functions. The structure matches the expected output.


271-312: LGTM: Config interface generation respects optional/required fields.

The function correctly checks the required array (line 298) to determine whether config properties should be optional, and includes helpful JSDoc comments from schema descriptions.


533-539: Note: Type cast sacrifices type safety but is necessary.

Line 538 casts the typed config to Record<string, unknown> to satisfy the generic createEvaluator signature. This loses compile-time type checking for the config parameter, but it's a necessary trade-off given the dynamic nature of the factory method generation.

An alternative would be to generate overloaded signatures for createEvaluator that preserve type safety per evaluator, but this would significantly complicate the code generation logic.
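A rough sketch of what a per-evaluator mapping could look like, with illustrative config shapes and a hypothetical createEvaluatorTyped helper (not the generated API):

interface PiiDetectorConfig {
  probability_threshold?: number;
}
interface SecretsDetectorConfig {
  entropy_threshold?: number;
}

interface ConfigBySlug {
  "pii-detector": PiiDetectorConfig;
  "secrets-detector": SecretsDetectorConfig;
}

function createEvaluatorTyped<S extends keyof ConfigBySlug>(
  slug: S,
  options?: { version?: string; config?: ConfigBySlug[S] },
) {
  // The config stays typed per slug all the way into the payload.
  return { name: slug, version: options?.version ?? "latest", config: options?.config };
}

// Usage: the compiler rejects config fields that belong to a different evaluator.
const piiEvaluator = createEvaluatorTyped("pii-detector", {
  config: { probability_threshold: 0.8 },
});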


552-583: LGTM: Index file generation properly re-exports all public symbols.

The generated index file cleanly re-exports types and values from the registry and mbt-evaluators modules.


585-662: LGTM: CLI implementation is robust with good error handling.

The main function provides clear usage instructions, validates inputs, handles errors gracefully, and provides helpful diagnostic output when no evaluators are found.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed 6c7c37f in 2 minutes and 6 seconds. Click for details.
  • Reviewed 2155 lines of code in 9 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/src/lib/client/experiment/experiment.ts:325
  • Draft comment:
    Removed evaluator version validation – confirm that making version optional is intentional and document the change.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.ts:128
  • Draft comment:
    Factory methods now consistently cast config as Record<string, unknown>; verify that removal of a default version (previously 'latest') is intended.
  • Reason this comment was not posted:
    Comment was on unchanged code.
3. scripts/codegen/generate-evaluator-models.ts:80
  • Draft comment:
    Evaluator schema extraction robustly handles both OpenAPI 3.0 and Swagger 2.0; ensure all edge cases ($ref handling) are covered.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is asking the PR author to ensure that all edge cases are covered, which violates the rule against asking the author to ensure behavior is intended or tested. It does not provide a specific suggestion or point out a specific issue.
4. scripts/generate-models.sh:36
  • Draft comment:
    Shell script correctly converts Swagger 2.0 to OpenAPI 3.0 and cleans up temporary files – good use of mktemp and removal steps.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, praising the use of mktemp and cleanup steps. It doesn't provide any actionable feedback or suggestions for improvement.
5. packages/traceloop-sdk/src/lib/generated/evaluators/index.ts:22
  • Draft comment:
    Typographical issue: The syntax "export type * from './mbt-evaluators';" does not appear to be valid TypeScript. Consider using a valid re-export syntax—for example, if you intended to re-export all types, you might need to list them explicitly or use an alternative approach.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
6. packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.ts:288
  • Draft comment:
    The default value for config.model is specified as 'gpt-4o'. Please confirm if this is intentional or if it's a typographical error meant to be 'gpt-4'.
  • Reason this comment was not posted:
    Comment was on unchanged code.
7. packages/traceloop-sdk/src/lib/generated/evaluators/types.ts:126
  • Draft comment:
    Typo alert: The example value 'gpt-4o' might be a typo. Should it be 'gpt-4'?
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% This comment is problematic for multiple reasons: 1) The file is auto-generated, so the comment is asking the author to change something they shouldn't manually edit. 2) 'gpt-4o' is actually a legitimate model name (GPT-4 Omni) from OpenAI, not a typo. 3) The value appears consistently three times in the file, which further suggests it's intentional. 4) The comment is speculative ("might be a typo") rather than definitively identifying an issue. This violates the rule about not making speculative comments. Could 'gpt-4o' be an internal naming convention or a placeholder that the team uses? Maybe I'm wrong about it being a real model name, though I'm fairly confident GPT-4o (Omni) is real. Even if I were uncertain about whether 'gpt-4o' is a real model, the fact that this is an auto-generated file (explicitly stated at the top) means the comment is asking for a change that shouldn't be made manually. The comment should be deleted regardless. This comment should be deleted. The file is auto-generated and shouldn't be manually edited, and 'gpt-4o' is actually a legitimate OpenAI model name (GPT-4 Omni), not a typo. The comment is both incorrect and violates the rule about not commenting on auto-generated files.
8. packages/traceloop-sdk/src/lib/generated/evaluators/types.ts:140
  • Draft comment:
    Typo alert: The example value 'gpt-4o' appears here as well. Consider confirming if this should be 'gpt-4'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% This comment has multiple issues: 1) The file is auto-generated, as clearly stated in the header, so suggesting changes to it violates the principle of not commenting on auto-generated code. 2) 'gpt-4o' is not a typo - it's a legitimate OpenAI model name (GPT-4 Omni). 3) The comment is speculative ("Consider confirming") rather than definitively identifying an issue. 4) Even if it were a typo, the fix would need to be made in the source OpenAPI specification, not in this generated file. This comment should definitely be deleted. Could 'gpt-4o' be an internal model name or variant that I'm not aware of? Perhaps the comment author has domain knowledge that this should indeed be 'gpt-4' in this specific context? Even if there were uncertainty about the correct model name, the file is explicitly marked as auto-generated and warns against direct changes. Any correction would need to happen at the source. The comment is not actionable in this context and violates the rule about auto-generated files. This comment should be deleted. The file is auto-generated (as stated in the header), so direct changes should not be made to it. Additionally, 'gpt-4o' is a legitimate model name, not a typo. The comment is speculative and not actionable.

Workflow ID: wflow_yaIqXsko1GLjxMLA

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed fa1018f in 1 minute and 0 seconds. Click for details.
  • Reviewed 263 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. scripts/codegen/generate-evaluator-models.ts:68
  • Draft comment:
    Nice reformatting: Splitting the union type cast for rawPathItemObj into multiple lines improves readability.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. scripts/codegen/generate-evaluator-models.ts:74
  • Draft comment:
    Consistent multi-line union type casting for derefPathItem enhances clarity.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. scripts/codegen/generate-evaluator-models.ts:86
  • Draft comment:
    The multi-line formatting for the rawRequestBody and rawSchema casts (lines 83–89) improves readability.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
4. scripts/codegen/generate-evaluator-models.ts:105
  • Draft comment:
    Refactoring the arrow function callback for finding body parameters in the Swagger 2.0 fallback improves consistency.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
5. scripts/codegen/generate-evaluator-models.ts:145
  • Draft comment:
    Reformatting the extraction of inputSchema and configSchema (lines 148–163) makes it more maintainable.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
6. scripts/codegen/generate-evaluator-models.ts:231
  • Draft comment:
    The multi-line formatting in generateMbtEvaluatorsFile (around the factory methods) enhances clarity in the generated evaluator factory.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
7. scripts/codegen/generate-evaluator-models.ts:580
  • Draft comment:
    Improved multi-line formatting for fs.writeFileSync calls in the main function increases overall code readability.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_x3iIsHJme2VdL6jr

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

This file runs an experiment - let's rename the file and the run command accordingly
"run:security_evaluators": "npm run build && node

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
scripts/codegen/generate-evaluator-models.ts (1)

237-289: Fix documentation inconsistency.

Line 268 says "Required task output fields" but should say "Required task input fields" to match the actual variable being documented (requiredInputFields from line 261).

🔎 Proposed fix
       const reqFieldsDoc =
         requiredInputFields.length > 0
-          ? `\n   * Required task output fields: ${requiredInputFields.join(", ")}`
+          ? `\n   * Required task input fields: ${requiredInputFields.join(", ")}`
           : "";
🧹 Nitpick comments (2)
scripts/codegen/generate-evaluator-models.ts (2)

534-608: Consider adding more granular error handling.

The main function has basic error handling, but individual file operations and schema extractions could benefit from more specific error handling and validation.

For a development tool, the current level is acceptable, but consider adding try-catch blocks around file writes and validating the parsed spec structure to provide clearer error messages during generation.

💡 Example improvements
// Validate output directory is writable
try {
  fs.accessSync(outputDir, fs.constants.W_OK);
} catch {
  console.error(`Error: Output directory ${outputDir} is not writable`);
  process.exit(1);
}

// Wrap file writes with error handling
try {
  fs.writeFileSync(path.join(outputDir, "registry.ts"), generateRegistryFile(evaluators));
  console.log(`  - registry.ts`);
} catch (err) {
  console.error(`Failed to write registry.ts:`, err);
  process.exit(1);
}

141-168: Config field requirement handling is inconsistent, but type-safe.

The extractFieldsFromSchema function treats all config fields as optional (line 164) without checking configSchema.required. However, this is partially mitigated by the fact that config types are imported from the OpenAPI spec and enforce requirements at the TypeScript type level (e.g., PiiDetectorConfig = components['schemas']['...']['config']).

The inconsistency is that required input fields are explicitly tracked in the registry and validated at runtime via validateEvaluatorInput, while required config fields are only enforced by the type system—not tracked in the registry or validated at runtime. This makes the behavior less transparent and prevents runtime validation of config field requirements.

Consider populating requiredConfigFields alongside optionalConfigFields to maintain consistency with input field handling and enable runtime validation similar to how validateEvaluatorInput works for input fields.
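A sketch of what that runtime check could look like if the registry carried a requiredConfigFields array; the schema shape and helper name below are assumptions, not the generated code:

interface EvaluatorSchemaWithConfig {
  slug: string;
  requiredInputFields: string[];
  requiredConfigFields: string[];
  optionalConfigFields: string[];
}

function validateEvaluatorConfig(
  schema: EvaluatorSchemaWithConfig,
  config: Record<string, unknown> = {},
): string[] {
  // Returns the required config keys that are missing or undefined.
  return schema.requiredConfigFields.filter(
    (field) => config[field] === undefined,
  );
}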

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6c7c37f and fa1018f.

📒 Files selected for processing (1)
  • scripts/codegen/generate-evaluator-models.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and test
🔇 Additional comments (9)
scripts/codegen/generate-evaluator-models.ts (9)

1-15: LGTM!

The shebang, imports, and documentation are appropriate for a TypeScript code generation script.


17-49: LGTM!

The utility functions correctly handle slug transformations and reference extraction for both OpenAPI 3.0 and Swagger 2.0 formats.


170-177: LGTM!

The hasConfigFields function correctly checks for the presence of config properties.


183-235: LGTM!

The generateRegistryFile function correctly generates type-safe registry exports with evaluator schemas and utility functions.


305-353: LGTM!

The utility functions (createEvaluator, validateEvaluatorInput, getAvailableEvaluatorSlugs, getEvaluatorSchemaInfo) provide a clean API for working with evaluators.


359-386: LGTM!

The EvaluatorMadeByTraceloop factory class provides a clean, type-safe interface with helpful documentation and examples.


389-413: LGTM!

The index file correctly re-exports all public APIs and types.


419-479: LGTM!

The collectReferencedSchemas function correctly handles recursive schema traversal with proper cycle prevention using a Set.


481-528: LGTM!

The filtered spec generation correctly creates a minimal OpenAPI 3.0 document containing only the schemas needed for evaluators, which is appropriate for consumption by openapi-typescript.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed aa4393e in 27 seconds. Click for details.
  • Reviewed 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/package.json:43
  • Draft comment:
    Renamed script command for security evaluator. Ensure references and documentation are updated accordingly.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_SK4tvgo7d5buD45r

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
