Skip to content

Conversation

@labkey-nicka
Copy link
Contributor

@labkey-nicka labkey-nicka commented Dec 2, 2025

Rationale

For the last few years sample types have had two pathways through the code for updating samples; row-by-row and data iterator. These two pathways are expected to overlap completely in terms of functional input/output, however, we often see issues where something is supported (or broken) in one and not the other. Having these two pathways is difficult to maintain, is easy to introduce bugs if both pathways are not considered, and overall gives us less confidence in the sample update process.

With these changes the row-by-row update pathway has been removed entirely. The data iterator pathway is now the only way samples are updated. This should help us with stability and reduce complexity. One challenge we have faced until now is how to reconcile these pathways because data iterator requires that all rows have the same columns specified where as row-by-row doesn't. To achieve functional parity in data iterator we now partition the rows by the columns specified before iterating. Each partition is then looped over, in order, until all rows are processed.

Additionally, this removes the duplicate LSID column from provisioned sample type tables. Additionally, it updates the data iterator pathway for samples to support being keyed by (RowId) or (Name, MaterialSourceId). Formerly, specifying (LSID) was the only way to invoke the data iterator pathway. With these changes specifying LSID as a key for sample update is no longer supported.

Sample Update Functional Changes:

  • LSID is no longer supported as a key for updating samples.
  • Specifying only LSID, without RowId or Name, will result in an error.
  • RowId is not accepted when merging samples. The sample name can be specified instead.
    • If a user encounters this and is unable to workaround it we have a new experimental flag that will let them opt-out of this behavior and accept the RowId. The RowId column values will be ignored.

Related Pull Requests

Changes

  • Implement new data iterators to perform same checks as row-by-row pathway.
  • Consolidate logic for resolving parameters for ExistingRecordDataIterator via new experiment table interfaces getExistingRecordKeyColumnNames() and getExistingRecordSharedKeyColumnNames().
  • Support updating experiment object properties (a.k.a. vocabulary properties) on samples / data via data iterator
  • Upgrade script to drop LSID column from provisioned sample type tables
  • Updates to join provisioned table by rowId instead of LSID.
  • Loop over rows with same column shape and process in batches via data iterator.
  • Support renaming samples when rowId is specified via data iterator and via file import/update.
  • Remove row-by-row pathway.

if (o instanceof String s)
{
aliquotParentName = StringUtilsLabKey.unquoteString((String) o);
aliquotParentName = StringUtilsLabKey.unquoteString(s);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is existing code. Just pointing out that unquoteString() looks very out of place here???

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to use ExperimentService.getParentValues here instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan on leaving as-is for this PR.


List<UploadSampleRunRecord> runRecords = new ArrayList<>();
Set<String> keys = new LinkedHashSet<>();
Set<Object> keys = new LinkedHashSet<>();
Copy link
Contributor

@labkey-matthewb labkey-matthewb Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For type validation, should we use LongHashMap/CaseInsensitiveHashMap as appropriate and then cast to Set of Object? It's nice to move toward (runtime) type validating collections where possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, validation is good. However, I'm going to leave as-is for this PR.

var map = DataIteratorUtil.createColumnNameMap(in);
if (map.containsKey(RootMaterialRowId.toString()) || !map.containsKey(RowId.toString()))
return in;
var ret = new SimpleTranslator(in, ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing wrong with this, but note to self: we should have a simple pattern for this that starts with WrapperDataIterator.

@labkey-nicka labkey-nicka force-pushed the fb_remove_sample_lsid branch from 84442f1 to 957aeb4 Compare December 3, 2025 23:38
@labkey-nicka labkey-nicka force-pushed the fb_remove_sample_lsid branch 2 times, most recently from f4d9bc6 to 7398e73 Compare December 5, 2025 22:14
@labkey-nicka labkey-nicka requested a review from XingY December 11, 2025 18:03
);
}, DbScope.CommitTaskOption.POSTCOMMIT);
var nextIndex = index + 1;
while (nextIndex < rows.size() && rowKeys.equals(new CaseInsensitiveHashSet(rows.get(nextIndex).keySet())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side-note: If the rows happen to be ArrayListMap then the keySet() objects should be the same (==) and this will be fast. I don't know in which code paths that would be the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, only the test version of ArrayListMap overrides keySet(), the server implementation should too!

@labkey-nicka labkey-nicka merged commit f610a31 into develop Dec 13, 2025
7 of 14 checks passed
@labkey-nicka labkey-nicka deleted the fb_remove_sample_lsid branch December 13, 2025 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants