Implement semantic matching for joins based on attribute lineage #1301
Summary
This PR implements semantic matching for joins in DataJoint 2.0. Instead of matching attributes purely by name (natural join), DataJoint now tracks attribute lineage (origin) and only allows joins on attributes that share both the same name AND the same lineage.
📄 Full Specification - API reference, user guide, and implementation details
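The matching rule in the summary can be illustrated with a small standalone model. This is a sketch, not DataJoint's implementation: the `Attribute` class, the `join_attributes` function, and the example table names are hypothetical, and raising an error when two attributes share a name but not a lineage is an assumption about how the check rejects such joins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    lineage: str  # origin, written as "schema.table.attribute"


def join_attributes(left, right, semantic_check=True):
    """Return the attribute names on which two headings would be joined.

    With semantic_check=True, a shared name is accepted only if both sides
    also share the same lineage; a shared name with different lineage is
    rejected (assumed here to raise an error). With semantic_check=False,
    matching falls back to names alone.
    """
    right_by_name = {attr.name: attr for attr in right}
    matched = []
    for attr in left:
        other = right_by_name.get(attr.name)
        if other is None:
            continue  # name appears on one side only, so it is not a join attribute
        if semantic_check and attr.lineage != other.lineage:
            raise ValueError(
                f"'{attr.name}' has lineage {attr.lineage} on the left "
                f"but {other.lineage} on the right"
            )
        matched.append(attr.name)
    return matched


# Hypothetical headings: subject_id has one origin, session_id has two.
session = [
    Attribute("subject_id", "lab.subject.subject_id"),
    Attribute("session_id", "lab.session.session_id"),
]
scan = [
    Attribute("subject_id", "lab.subject.subject_id"),
    Attribute("session_id", "imaging.scan.session_id"),
]
```

In this model, `join_attributes(session, scan)` is rejected because `session_id` has two different origins, while passing `semantic_check=False` matches on both names as a purely name-based join would.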
Key Changes
- `lineage` property on each attribute, indicating its origin (`schema.table.attribute`)
- `~lineage` table: Hidden per-schema table storing lineage information, populated at table declaration time
- `schema.rebuild_lineage()` to restore lineage for legacy schemas, `schema.lineage_table_exists` property
- `@` (permissive join) and `^` (permissive restriction) replaced by `.join(semantic_check=False)` and `.restrict(semantic_check=False)`
- `dj.U * table` removed: Join with universal set is no longer supported
New Files
- `src/datajoint/lineage.py` - Lineage management module
- `tests/integration/test_semantic_matching.py` - 21 comprehensive tests
- `docs/src/design/semantic-matching-spec.md` - Full specification with API reference and user guide
Modified Files
- `condition.py` - `assert_join_compatibility()` with semantic checking
- `expression.py` - Updated join/restrict methods, removed `@`/`^` operators
- `heading.py` - `lineage_available` property, lineage loading from `~lineage` table
- `table.py` - `_populate_lineage()` at declaration, cleanup at drop
- `declare.py` - FK attribute mapping for lineage tracking
- `schemas.py` - `rebuild_lineage()` method, `lineage_table_exists` property
Behavior Summary
`~lineage` table missing
Migration for Users
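For legacy schemas, the two schema-level names listed under Key Changes suggest a simple guard before relying on semantic matching. The helper below is a hypothetical sketch: `ensure_lineage` is not part of DataJoint, and it assumes only that a schema object exposes the `lineage_table_exists` property and the `rebuild_lineage()` method named in this PR, called without arguments.

```python
def ensure_lineage(schema):
    """Rebuild lineage for a schema whose ~lineage table is absent.

    Hypothetical helper. Returns True if a rebuild was triggered,
    False if the lineage table was already present.
    """
    if schema.lineage_table_exists:
        return False
    schema.rebuild_lineage()
    return True


# Operator replacements named in this PR. Passing the other operand as the
# first positional argument is an assumption about the method signatures:
#   A @ B  ->  A.join(B, semantic_check=False)
#   A ^ B  ->  A.restrict(B, semantic_check=False)
```

Calling the helper once per legacy schema would be enough in this sketch, since a second call finds the table present and does nothing.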
Test Plan
🤖 Generated with Claude Code