Commit 3745b99

Merge pull request #61 from TogetherCrew/feat/60-limit-mediawiki-load-batch-size
feat: limit batch size to 1!
2 parents f6f3d49 + aaf252c commit 3745b99

1 file changed: +3 -1 lines changed

hivemind_etl/mediawiki/etl.py

Lines changed: 3 additions & 1 deletion
@@ -103,7 +103,9 @@ def load(self, documents: list[Document]) -> None:
         )
 
         # Process batches in parallel using ThreadPoolExecutor
-        batch_size = 1000
+        # TODO: Revert to larger batch size once llama-index loading issue is resolved
+        # See: https://github.com/TogetherCrew/temporal-worker-python/issues/60
+        batch_size = 1
         batches = [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]
 
         with ThreadPoolExecutor(max_workers=10) as executor:
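
The effect of the new batch size on the slicing expression can be sketched in isolation. This is a minimal standalone illustration, not the code in etl.py: `make_batches`, `load_in_parallel`, and the `load_batch` callback are hypothetical names, and the per-batch work that `load` actually performs is not shown in this diff.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(documents, batch_size):
    """Split documents into consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Same slicing expression as in the diff above.
    return [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]


def load_in_parallel(documents, batch_size, load_batch, max_workers=10):
    """Run the hypothetical load_batch callback on each batch in a thread pool.

    Results come back in batch order because executor.map preserves input order.
    """
    batches = make_batches(documents, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(load_batch, batches))


if __name__ == "__main__":
    docs = [f"doc-{n}" for n in range(5)]
    # With batch_size = 1000, all five documents land in a single batch.
    print(len(make_batches(docs, 1000)))
    # With batch_size = 1, every document becomes its own batch.
    print(len(make_batches(docs, 1)))
    # len stands in for the real per-batch load step.
    print(load_in_parallel(docs, 1, len))
```

With `batch_size = 1` the number of submitted tasks equals the number of documents, so a failure is isolated to a single document, at the cost of more per-task overhead than the previous value of 1000.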
