-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add warning about OpenAI models + dict typed tools #3712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add warning about OpenAI models + dict typed tools #3712
Conversation
|
|
||
| def _map_tool_definition(self, f: ToolDefinition) -> responses.FunctionToolParam: | ||
| if _has_dict_typed_params(f.parameters_json_schema): | ||
| warnings.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move the warning into the helper method so that we don't repeat the text
| tools.append({'type': 'image_generation'}) | ||
| return tools | ||
|
|
||
| def _map_tool_definition(self, f: ToolDefinition) -> responses.FunctionToolParam: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll also need to do the check in _map_json_schema
tests/models/test_openai.py
Outdated
| 'properties': { | ||
| 'dict_list': {'type': 'array', 'items': {'type': 'object', 'additionalProperties': {'type': 'integer'}}} | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to test this with a tool function that actually has a dict argument, or a BaseModel argument that itself has a dict field, instead of testing the schemas directly.
So maybe can we build an agent with a tool like that, then run it, and test that a warning was emitted?
Co-authored-by: Douwe Maan <me@douwe.me>
|
Ok @DouweM I think this is ready for another look. |
|
@hinnefe2 Please have a look at the coverage check! |
| ) | ||
|
|
||
| def _map_json_schema(self, o: OutputObjectDefinition) -> chat.completion_create_params.ResponseFormat: | ||
| _warn_on_dict_typed_params(o.name or DEFAULT_OUTPUT_TOOL_NAME, o.json_schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is also used by OpenRouterModel and tons of other OpenAI-compatible APIs (https://ai.pydantic.dev/models/openai/#openai-compatible-models) so I'm thinking we should do this only when self.system == 'openai'
| for prop_schema in properties.values(): | ||
| # Check for object type with non False/absent additionalProperties | ||
| if (prop_schema.get('type') == 'object') and (prop_schema.get('additionalProperties') not in (False, None)): | ||
| has_dict_params = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this accidentally catch things like class Foo(BaseModel, extra='allow'), or a TypedDict with total=False (or the other properties, I may be mixing them up)? My point is that maybe we should verify that properties is empty, assuming that OpenAI only ignores "empty" object schemas with only additionalProperties, but not those with at least one known field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assuming that OpenAI only ignores "empty" object schemas with only additionalProperties, but not those with at least one known field.
Ok, after investigating a bit more I think the OpenAI behavior depends on the presence of a properties key, not the additionalProperties. I repeated the test where I pass a tool schema to an OpenAI request as a) part of a tool definition, and b) part of the regular prompt and ask the model to compare. For the new test I used the following schema which has all four combinations of additionalProperties (True or False) and properties (present or absent):
"additionalProperties": false,
"properties": {
"name": {
"description": "A name string",
"type": "string"
},
"data1": {
"description": "Some arbitrary data",
"type": "object",
"additionalProperties": false
},
"data2": {
"description": "Some more arbitrary data",
"type": "object",
"additionalProperties": true
},
"data3": {
"description": "Even more arbitrary data",
"type": "object",
"properties": {},
"additionalProperties": false
},
"data4": {
"description": "Even even more arbitrary data",
"type": "object",
"properties": {},
"additionalProperties": true
},
"config": {
"description": "A configuration object with defined properties",
"type": "object",
"properties": {
"timeout": {
"description": "Timeout in seconds",
"type": "integer"
},
"config_data": {
"description": "Config data",
"type": "object"
},
"config_data_w_properties": {
"description": "Config data with properties",
"type": "object",
"properties": {}
}
},
"required": [
"timeout"
],
"additionalProperties": false
}
},
"required": [
"name",
"config"
],
"type": "object"
}
which produced this response from OpenAI:
1. **Parameters in the tool "test_structured_dict":**
- `name`: A name string
- `data3?`: Even more arbitrary data (optional)
- `data4?`: Even even more arbitrary data (optional)
- `config`: A configuration object with defined properties
- `timeout`: Timeout in seconds
- `config_data_w_properties?`: Config data with properties (optional)
2. **Parameters in the provided JSON schema:**
- `name`: A name string
- `data1`: Some arbitrary data
- `data2`: Some more arbitrary data
- `data3`: Even more arbitrary data
- `data4`: Even even more arbitrary data
- `config`: A configuration object with defined properties
- `timeout`: Timeout in seconds
- `config_data`: Config data
- `config_data_w_properties`: Config data with properties
from which I conclude that the presence of a properties key (even one with an empty dict for a value) is enough to make OpenAI see the parameter.
So based on all that I've updated the logic to look for schema entries which have:
"type": "object"- no
propertieskey.
|
This PR is stale, and will be closed in 3 days if no reply is received. |
|
Apologies for the slow progress here - I got caught up in end-of-year work and then forgot about it over a holiday break. I'll take another pass today. |
Closes #3654 by adding a warning when an OpenAI-powered agent has a tool with a
dict-typed argument.