Fix: Resolve grafting bugs #6228
base: master
Conversation
Signed-off-by: Maksim Dimitrov <dimitrov.maksim@gmail.com>
    column: &str,
) -> Result<Vec<i64>, StoreError> {
-   const QUERY: &str = "select histogram_bounds::text::int8[] bounds \
+   const QUERY: &str = "select coalesce(histogram_bounds::text::int8[], '{}'::int8[]) as bounds \
woops .. nice!
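For context on why the coalesce matters: histogram_bounds comes from Postgres' pg_stats view and is NULL for columns that have no histogram, for example columns with very few distinct values or tables that have not been analyzed yet. Without the coalesce, the cast yields SQL NULL where the code expects an int8[] it can read into a Vec<i64>. A minimal, hypothetical query of this shape (the filter clause below is illustrative, not the PR's actual query):

    // Illustrative only: coalesce turns a missing histogram (NULL) into an
    // empty int8[] so the caller can always deserialize a Vec<i64>.
    const QUERY: &str = "select coalesce(histogram_bounds::text::int8[], '{}'::int8[]) as bounds \
                         from pg_stats \
                         where schemaname = $1 and tablename = $2 and attname = $3";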
    self.primary_conn()
        .await?
-       .record_active_copy(graft_base.site.as_ref(), site.as_ref())
+       .record_active_copy(graft_base_layout.site.as_ref(), site.as_ref())
Shouldn't this also happen in the transaction above? Otherwise, can't we end up in a situation where we set up everything except recording the active copy if graph-node gets killed at the wrong moment?
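A minimal sketch of that suggestion, assuming a diesel-style synchronous connection for illustration (graph-node's actual code is async, and the helper names below are hypothetical): both writes go through one transaction, so being killed between them cannot leave the deployment set up without the active copy recorded.

    use diesel::pg::PgConnection;
    use diesel::prelude::*;
    use diesel::result::Error as DieselError;

    // Hypothetical stand-ins for the real setup steps.
    fn set_up_deployment(_conn: &mut PgConnection) -> Result<(), DieselError> { Ok(()) }
    fn record_active_copy(_conn: &mut PgConnection) -> Result<(), DieselError> { Ok(()) }

    // Both writes commit or roll back together.
    fn create_grafted_deployment(conn: &mut PgConnection) -> Result<(), DieselError> {
        conn.transaction(|conn| {
            set_up_deployment(conn)?;
            record_active_copy(conn)?;
            Ok(())
        })
    }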
    let base_layout = self.layout(graft_base).await?;
    let entities_with_causality_region =
        deployment.manifest.entities_with_causality_region.clone();
    let catalog = Catalog::for_tests(
This is wrong - for_tests doesn't actually check the database for its capabilities (and the method should be marked as #[cfg(debug_assertions)]). Instead, this needs to use Catalog::for_creation where the connection is a connection to the shard in which we will create the subgraph. It's best to create the catalog object outside of this transaction so that we don't hold multiple db connections at once.
Resolves #6220, resolves #6221
Issue: #6220
Currently, the can_copy_from validation is only performed when a deployment is considered new, that is, when no existing deployment_schemas record exists for the given Qm hash. In the code here, the layout is assigned if the deployment is new. If a deployment_schemas record already exists, the code assumes the deployment was previously created and copied successfully, and therefore sets graft_base_layout to None.

However, if can_copy_from later fails (see here), the transaction rolls back all changes except the record in the deployment_schemas table. This leaves the deployment in a partially created state in which the copy was never attempted.

When the same deployment is attempted again, it is no longer treated as new. As a result, graft_base_layout is set to None and the can_copy_from check is skipped. The deployment then proceeds as if it were valid, the copy process starts, but it fails again, ultimately leaving the deployment stuck in a "never synced" state.

What this PR changes:
- Move the can_copy_from check and the site allocation into a transaction so a failing check fully rolls back the allocation.
- Remove the duplicate/late compatibility check from deployment_store.rs.
- Use Catalog + Layout::new(...) to construct the layout and run layout.can_copy_from(&base_layout) before committing.
- Minor cleanups (string formatting improvements, unused variables removed).
- Remove the can_copy_from check from copy_deployment because it passed the source layout as the graft base and then compared it against the destination layout, which seems redundant since the two layouts are the same.

Result
Now every re-deploy of a failed deployment should behave as a brand new deployment, and the if !exists check should behave correctly.

Note: This approach may result in deployment_schemas_id_seq gaps.
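As a rough illustration of the flow described above (all names and signatures here are hypothetical stand-ins, not graph-node's actual API): the deployment_schemas row, the layout construction, and the compatibility check share one transaction, so a failing can_copy_from rolls everything back and the next attempt really is treated as new. Building the catalog before the transaction starts, as suggested in the review, avoids holding two database connections at once.

    use diesel::pg::PgConnection;
    use diesel::prelude::*;
    use diesel::result::Error as DieselError;

    // Hypothetical stand-ins for the real types and helpers.
    struct Catalog;
    struct Site;
    struct Layout;

    fn allocate_site(_conn: &mut PgConnection) -> Result<Site, DieselError> { Ok(Site) }
    fn build_layout(_catalog: &Catalog, _site: &Site) -> Result<Layout, DieselError> { Ok(Layout) }

    impl Layout {
        fn can_copy_from(&self, _base: &Layout) -> Result<(), DieselError> { Ok(()) }
    }

    fn create_deployment(
        conn: &mut PgConnection,
        catalog: &Catalog,             // built beforehand from a shard connection
        graft_base: Option<&Layout>,
    ) -> Result<Site, DieselError> {
        conn.transaction(|conn| {
            let site = allocate_site(conn)?;            // writes the deployment_schemas row
            let layout = build_layout(catalog, &site)?; // Catalog + Layout::new(...)
            if let Some(base) = graft_base {
                // An incompatible graft base aborts the whole transaction, so no
                // deployment_schemas row is left behind and a retry starts fresh.
                layout.can_copy_from(base)?;
            }
            Ok(site)
        })
    }

Sequence values consumed inside a rolled-back transaction are not returned, which is where the deployment_schemas_id_seq gaps mentioned in the note come from.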