#807 Add field name to AST object memoization for better performance of raw record processing #808

yruslan · 2025-11-26T08:18:08Z

Closes #807

Summary by CodeRabbit

New Features
- Added public indicators to detect copybook structure (flat vs hierarchical).
Performance & Refactoring
- Improved repeated field-access performance via internal caching.
- Simplified and centralized field-lookup logic for clearer, more maintainable behavior.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

…of raw record processing.

coderabbitai · 2025-11-26T08:18:27Z

Walkthrough

Adds two concurrent caches for primitives and statements to Copybook, exposes structure flags (isFlatCopybook, isHierarchical), centralizes primitive lookup via a new private helper, and updates field-get/set paths to use caching and simplified pattern matches. (47 words)

Changes

Cohort / File(s)	Summary
Copybook concurrency & lookup caching `cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala`	Adds `cachePrimitives: ConcurrentHashMap[String, Primitive]` and `cacheStatements: ConcurrentHashMap[String, Statement]`. Introduces `val isFlatCopybook` and `lazy val isHierarchical`. Adds private `getPrimitiveFieldByName(fieldName: String): Primitive`. Updates `getFieldValueByName` and `setFieldValueByName` to use the helper and caches. Refactors several internal lookup helpers to use flatMap closures and caching. Minor reorganization of method placement and class end braces.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Copybook
    participant Cache as cachePrimitives
    participant AST as Copybook AST

    rect rgb(230,200,230)
    Note over Client,Cache: New flow: cache checked first
    Client->>Copybook: getFieldValueByName(name)
    Copybook->>Cache: get(name)
    alt Cache Hit
        Cache-->>Copybook: Primitive
    else Cache Miss
        Copybook->>AST: traverse/search for field
        AST-->>Copybook: Primitive
        Copybook->>Cache: put(name, Primitive)
    end
    Copybook-->>Client: field value (extracted from Primitive)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Attention areas:
- Correctness of cache key selection and concurrent usage (ConcurrentHashMap patterns).
- Ensure getPrimitiveFieldByName preserves previous semantics and error/exception behavior.
- Validate isFlatCopybook / isHierarchical computation and laziness.
- Confirm refactored flatMap/pattern-match changes don't alter edge-case lookups (unique names, nested groups).
- Tests for lifecycle consistency of cached Statement/Primitive objects.

Poem

🐰 I hopped the AST, then paused and stared,
Caches planted where once I fared,
Now names resolve with fewer bounds,
Faster hops on cached playgrounds,
My little heart — a quicker drum, hooray! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and accurately describes the main change: adding field name to AST object memoization to improve performance of raw record processing, which aligns with the changeset's introduction of caching mechanisms.
Linked Issues check	✅ Passed	The pull request successfully implements the objective from issue #807 by introducing cachePrimitives and cacheStatements caches to memoize field lookups, eliminating repeated AST traversal when accessing fields by name.
Out of Scope Changes check	✅ Passed	All changes are directly related to the caching/memoization objective: new caches, helper methods for cached retrieval, and updates to existing methods to use caching. No unrelated modifications detected.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/807-field-access-memoize

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-11-26T08:20:37Z

JaCoCo code coverage report - 'cobol-parser'

Overall Project	91.55%	🍏
Files changed	98.74%	🍏

File	Coverage
Copybook.scala	83.57% `-0.28%`	🍏

github-actions · 2025-11-26T08:20:38Z

JaCoCo code coverage report - 'spark-cobol'

There is no coverage information present for the Files changed

Total Project Coverage	81.37%	🍏

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd292b4 and d66f109.

📒 Files selected for processing (1)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (8 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (1)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/CopybookParser.scala (1)

getAllSegmentRedefines (332-348)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
GitHub Check: Spark 2.4.8 on Scala 2.11.12
GitHub Check: Spark 3.5.7 on Scala 2.13.17
GitHub Check: Spark 3.5.7 on Scala 2.12.20
GitHub Check: Spark 3.4.4 on Scala 2.12.20

🔇 Additional comments (9)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (9)

24-24: LGTM!

The use of ConcurrentHashMap for caching field lookups is appropriate for thread-safe memoization and aligns with the PR objective to improve performance of field name access.

Also applies to: 32-33

35-35: LGTM!

The isFlatCopybook flag correctly identifies whether the copybook has a flat structure by checking for primitive children at the root level.

41-44: LGTM!

The isHierarchical property correctly identifies hierarchical copybook structures by checking for parent-child relationships in segment redefines. The lazy evaluation is appropriate for this potentially expensive check.

46-47: LGTM!

Simple accessor method for the AST. While not used within this file, it provides a consistent API for external callers.

90-94: LGTM!

The refactor to use getPrimitiveFieldByName enables caching of primitive field lookups, which directly addresses the performance objective in issue #807.

109-113: LGTM!

Consistent with the read path, this refactor enables caching for field writes, improving performance for both read and write operations.

129-132: LGTM!

The pattern matching refactor is a nice Scala idiom improvement, making the code more concise while maintaining the same functionality.

144-153: LGTM!

Consistent pattern matching improvement for better code readability and Scala idioms.

350-364: LGTM!

The caching pattern is correctly implemented here—the method checks the cache, computes if missing, stores the result, and returns it. This provides the performance improvement for primitive field access as intended.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2)

180-203: Consider using computeIfAbsent for cleaner cache pattern.

The current "get-check-compute-put" pattern is correct but could be simplified and made slightly more efficient using computeIfAbsent, which handles the null check and atomic insertion in one call.

Apply this diff to use the more idiomatic pattern:

-    val cachedStatement = cacheStatements.get(fieldName)
-
-    if (cachedStatement == null) {
+    cacheStatements.computeIfAbsent(fieldName, _ => {
       val schema = getCobolSchema
 
       val foundFields = if (fieldName.contains('.')) {
         getFieldByPathName(schema, fieldName)
       } else {
         getFieldByUniqueName(schema, fieldName)
       }
 
       if (foundFields.isEmpty) {
         throw new IllegalStateException(s"Field '$fieldName' is not found in the copybook.")
       } else if (foundFields.lengthCompare(1) == 0) {
-        val result = foundFields.head
-        cacheStatements.put(fieldName, result)
-        result
+        foundFields.head
       } else {
         throw new IllegalStateException(s"Multiple fields with name '$fieldName' found in the copybook. Please specify the exact field using '.' " +
           s"notation.")
       }
-    } else {
-      cachedStatement
-    }
+    })

352-366: Consider similar computeIfAbsent pattern for primitive cache.

For consistency and the same benefits as mentioned in the previous comment, getPrimitiveFieldByName could also use computeIfAbsent.

Apply this diff:

-    val cachedPrimitive = cachePrimitives.get(fieldName)
-
-    if (cachedPrimitive == null) {
+    cachePrimitives.computeIfAbsent(fieldName, _ => {
       val ast = getFieldByName(fieldName)
       ast match {
         case s: Primitive =>
-          cachePrimitives.put(fieldName, s)
           s
         case _ => throw new IllegalStateException(s"$fieldName is not a primitive field, cannot extract its value.")
       }
-    } else {
-      cachedPrimitive
-    }
+    })

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d66f109 and 44b6d38.

📒 Files selected for processing (1)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (8 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/CopybookParser.scala (1)

getAllSegmentRedefines (332-348)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala (1)

getCobolSchema (61-61)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Spark 3.5.7 on Scala 2.13.17
GitHub Check: Spark 3.5.7 on Scala 2.12.20
GitHub Check: Spark 2.4.8 on Scala 2.11.12
GitHub Check: Spark 3.4.4 on Scala 2.12.20
GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)

🔇 Additional comments (6)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (6)

180-203: Cache population has been correctly implemented.

The previous review correctly identified that the cache was never populated. This has now been fixed with the addition of cacheStatements.put(fieldName, result) at line 195. The cache will now provide the intended performance benefits.

90-94: LGTM! Clean refactoring with caching.

The refactoring to use getPrimitiveFieldByName centralizes the primitive lookup logic and ensures caching is applied consistently.

109-113: LGTM! Consistent with the getter refactoring.

The setter now uses the same cached primitive lookup, maintaining symmetry with getFieldValueByName.

129-132: LGTM! More idiomatic pattern matching.

The refactoring to use flatMap is cleaner and more functional in style, making the code easier to read and maintain.

Also applies to: 144-153

32-35: LGTM! Cache declarations and structural flags are well-designed.

The use of ConcurrentHashMap ensures thread-safe caching, and the new isFlatCopybook flag provides a simple structural check. Both additions align well with the PR's performance objectives.

41-44: LGTM! Lazy evaluation is appropriate for hierarchical check.

The isHierarchical flag correctly uses lazy evaluation since it requires traversing the AST to find segment redefines, which is an operation that should only be performed when needed.

#807 Add field name to AST object memoization for better performance …

d66f109

…of raw record processing.

coderabbitai bot reviewed Nov 26, 2025

View reviewed changes

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala Show resolved Hide resolved

#807 Fix PR suggestions (Thanks @codarabbitai).

44b6d38

coderabbitai bot reviewed Nov 26, 2025

View reviewed changes

yruslan merged commit 316c952 into master Nov 26, 2025
7 checks passed

yruslan deleted the feature/807-field-access-memoize branch November 26, 2025 08:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

#807 Add field name to AST object memoization for better performance of raw record processing #808

#807 Add field name to AST object memoization for better performance of raw record processing #808

Uh oh!

yruslan commented Nov 26, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 26, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Nov 26, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Nov 26, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

#807 Add field name to AST object memoization for better performance of raw record processing #808

#807 Add field name to AST object memoization for better performance of raw record processing #808

Uh oh!

Conversation

yruslan commented Nov 26, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

github-actions bot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

JaCoCo code coverage report - 'cobol-parser'

Uh oh!

github-actions bot commented Nov 26, 2025

JaCoCo code coverage report - 'spark-cobol'

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yruslan commented Nov 26, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 26, 2025 •

edited

Loading

github-actions bot commented Nov 26, 2025 •

edited

Loading