
@harshach
Collaborator

@harshach harshach commented Jan 21, 2026

Describe your changes:

Fixes

Issue:

[2026-01-20 20:19:01] WARNING {metadata.Ingestion:status:109} - Error yielding tag []: [1 validation error for EntityName

String should have at least 1 character [type=string_too_short, input_value='', input_type=str]

Fix:
Snowflake tags are key-value pairs. Each tag assignment takes its value from the tag's set of allowed values, so one tag can be linked to different entities with different values. A tag whose TAG_VALUE is empty is now skipped with a warning instead of failing EntityName validation. Ref: https://docs.snowflake.com/en/user-guide/object-tagging/introduction

[2026-01-22 23:14:58] WARNING  {metadata.Ingestion:metadata:542} - Skipping tag 'TEST_TAG' for 'TEST_DB.PUBLIC.F_SR' - TAG_VALUE is empty. Snowflake tags require a value to be ingested.
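
The source-level behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: it assumes each query row has the shape (TAG_NAME, TAG_VALUE, then the name parts of the tagged object), and `iter_valid_tag_rows` and the logger name are hypothetical.

```python
import logging
from typing import Iterable, Iterator, Optional, Sequence, Tuple

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("snowflake_tag_sketch")


def iter_valid_tag_rows(
    rows: Iterable[Sequence[Optional[str]]],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (tag_name, tag_value, entity_fqn) for rows with a non-empty TAG_VALUE."""
    for row in rows:
        tag_name, tag_value = row[0], row[1]
        # Assumption: the remaining columns hold database, schema, table, ... names.
        entity_fqn = ".".join(part for part in row[2:] if part)
        if not tag_value:
            # Skip instead of building a tag with an empty name downstream.
            logger.warning(
                "Skipping tag '%s' for '%s' - TAG_VALUE is empty.",
                tag_name,
                entity_fqn,
            )
            continue
        yield tag_name, tag_value, entity_fqn


rows = [
    ("TEST_TAG", "", "TEST_DB", "PUBLIC", "F_SR"),     # skipped with a warning
    ("PII", "EMAIL", "TEST_DB", "PUBLIC", "USERS"),    # kept
]
valid_rows = list(iter_valid_tag_rows(rows))
```

Filtering before any tag object is built keeps the empty string from ever reaching name validation, which is where the original error surfaced.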

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Fixed Snowflake tag ingestion:
    • Empty TAG_VALUE now skips gracefully with warnings in set_schema_tags_map() and yield_tag() methods
  • Sink-level validation:
    • Added empty tag name check in MetadataRestSink.write_classification_and_tag() for defense in depth
  • Test coverage:
    • test_empty_tag_value_skipped_with_warning validates Snowflake source-level filtering
    • test_sink_empty_tag_validation.py with 5 test cases for sink-level validation edge cases
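
A test of the kind listed above could look roughly like this. It is a sketch under assumptions: `keep_tag` and the logger name are hypothetical stand-ins for the Snowflake source logic, and only the test name `test_empty_tag_value_skipped_with_warning` comes from the summary.

```python
import logging
import unittest

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("tag_filter_sketch")


def keep_tag(tag_name: str, tag_value: str) -> bool:
    """Stand-in for the source-level check: reject tags with an empty value."""
    if not tag_value:
        logger.warning("Skipping tag '%s' - TAG_VALUE is empty.", tag_name)
        return False
    return True


class EmptyTagValueTest(unittest.TestCase):
    def test_empty_tag_value_skipped_with_warning(self):
        # assertLogs fails the test if no WARNING is emitted inside the block.
        with self.assertLogs("tag_filter_sketch", level="WARNING") as captured:
            self.assertFalse(keep_tag("TEST_TAG", ""))
        self.assertIn("TAG_VALUE is empty", captured.output[0])

    def test_non_empty_tag_value_kept(self):
        self.assertTrue(keep_tag("PII", "EMAIL"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(EmptyTagValueTest)
result = unittest.TextTestRunner().run(suite)
```

Asserting on the log record as well as the return value checks both halves of the behaviour: the tag is dropped, and the user is told why.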



@github-actions
Contributor

github-actions bot commented Jan 21, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (9)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions
Contributor

github-actions bot commented Jan 21, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpam-modules CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-modules-bin CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-runtime CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam0g CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (17)

Package Vulnerability ID Severity Installed Version Fixed Version
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
azure-core CVE-2026-21226 🚨 HIGH 1.37.0 1.38.0
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
pyasn1 CVE-2026-23490 🚨 HIGH 0.6.1 0.6.2
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

@github-actions
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@gitar-bot

gitar-bot bot commented Jan 22, 2026

🔍 CI failure analysis for 6a9b42e: CI failures across backend Python tests and frontend Playwright tests are unrelated to this PR's backend-only Snowflake tag validation changes.

Issue

Multiple CI failures occurred across different test suites:

  1. Python Integration Test Failures: 2 types of test failures in backend Python tests
  2. Frontend Playwright Test Failures: 17 total failed tests across 3 playwright-ci-postgresql jobs after retries

Root Cause

Backend Test Failures (UNRELATED to PR changes)

Test Failure #1: NHS Number Detection

  • Test test_it_returns_the_expected_classifications failed in ingestion/tests/integration/auto_classification/test_tag_processor.py:150
  • Multiple Python 3.10 and 3.11 jobs affected
  • Evidence this is unrelated: PR only adds empty tag validation for Snowflake; NHS number recognizer failure is for PostgreSQL PII detection auto-classification, completely separate functionality
  • The test file WAS modified in this PR (lines 183-189) but only to fix a type conversion and update expected tags for timestamp_column, NOT for the failing nhs_number_column assertion

Test Errors #2: Trino Classifier Tests

  • 7 tests in ingestion/tests/integration/trino/test_classifier.py errored during fixture setup
  • AttributeError: 'NoneType' object has no attribute 'name' in Sampler step
  • Evidence this is unrelated: PR only touches Snowflake tag handling, not Trino connector or auto-classification workflow infrastructure

Frontend Playwright Test Failures (UNRELATED to PR changes)

The latest Playwright job (61214229748) shows 4 persistent failures in the playwright-ci-postgresql (3/6) shard:

  1. MutuallyExclusiveColumnTags test: page.click: Test timeout of 60000ms exceeded
  2. Table/search consistency tests: Target page, context or browser has been closed
  3. SchemaTable test: expect(locator).toContainText failed
  4. Permissions tests: Element visibility/enabled failures
  5. SettingsNavigation test: Drag and drop UI timing issues
  6. ExploreDiscovery test: Element attachment failures
  7. Metric test: page.waitForResponse: Target page, context or browser has been closed
  8. ServiceForm test: File system errors (ENOENT)

Common patterns:

  • Browser crashes/closures ("Target page, context or browser has been closed")
  • Timeout errors on UI interactions
  • Element visibility/attachment failures
  • Test infrastructure issues

Evidence frontend failures are unrelated:

  1. PR Scope: This PR modifies Python backend ingestion code only:

    • ingestion/src/metadata/ingestion/sink/metadata_rest.py
    • ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py
    • Backend Python test files
  2. No Frontend Changes: PR contains ZERO changes to:

    • Frontend TypeScript/React code
    • UI components for glossary, tags, permissions, tables, lineage, services, etc.
    • Playwright test files themselves
    • Any frontend routing, state management, or API response handling
  3. Failure Patterns: The Playwright failures show characteristic flaky E2E test behavior across diverse features unrelated to backend tag validation logic

Details

Backend Test Failures: Pre-existing or environment-related issues in PostgreSQL PII detection and Trino classifier tests, both unrelated to Snowflake tag validation changes.

Frontend Playwright Test Failures: 17 failed tests across 3 jobs showing UI flakiness, browser stability issues, timing problems, and test cleanup failures. The playwright-ci-postgresql (3/6) shard consistently shows the same 4 persistent failures across multiple runs.

Conclusion: All CI failures are unrelated to this PR's backend-only Snowflake tag validation changes. The failures represent flaky E2E tests and pre-existing test infrastructure issues.

Code Review ✅ Approved 2 resolved / 2 findings

Clean bug fix that properly handles Snowflake tags with empty values, with appropriate defense-in-depth validation at the sink layer and comprehensive test coverage.

More details ✅ 2 resolved
Quality: Duplicated fqn_elements computation in yield_tag

📄 ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py:534
The fqn_elements = [name for name in row[2:] if name] computation is duplicated:

  • Line 534 (inside the early-return block for logging)
  • Line 540 (after the continue, for actual processing)

This duplication is minor but could lead to inconsistencies if the logic needs to change. Consider computing fqn_elements once before the check:

for res in result:
    row = list(res)
    fqn_elements = [name for name in row[2:] if name]
    # row[0] = TAG_NAME, row[1] = TAG_VALUE
    if not row[1]:
        logger.warning(
            f"Skipping tag '{row[0]}' for '{'.'.join(fqn_elements)}' - "
            "TAG_VALUE is empty. Snowflake tags require a value to be ingested."
        )
        continue
    yield from get_ometa_tag_and_classification(
        ...
    )

This eliminates the duplication while preserving the same behavior.

Edge Case: Validation checks tag_name, but the source issue is empty TAG_VALUE

📄 ingestion/src/metadata/ingestion/sink/metadata_rest.py:437
The sink validation checks for empty tag names (if not tag_name or not tag_name.strip()), but the Snowflake source already filters out empty TAG_VALUE before creating tag records. The tag name in Snowflake is TAG_NAME which is always populated - it's the TAG_VALUE that can be empty.

This sink-level validation provides a safety net but may never trigger in practice for the Snowflake case since the source already skips records with empty TAG_VALUE. The warning message says "Skipping tag with empty name" which could be misleading since the actual issue in Snowflake is empty tag values, not empty tag names.

Consider whether this validation is needed given the source-level filtering, or if the message should be more generic to cover different scenarios.
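
For illustration, the sink-level check quoted above and a more source-agnostic warning text might be sketched as follows. The helper names and the message wording are hypothetical; only the condition `not tag_name or not tag_name.strip()` is taken from the review.

```python
from typing import Optional


def is_blank_tag_name(tag_name: Optional[str]) -> bool:
    """Return True for None, empty, or whitespace-only tag names."""
    return not tag_name or not tag_name.strip()


def skip_message(tag_name: Optional[str]) -> str:
    """Hypothetical generic warning text that does not assume a Snowflake cause."""
    return f"Skipping tag with empty or missing name (received {tag_name!r})."


# Edge cases the review mentions: empty string, None, whitespace-only.
checks = {value: is_blank_tag_name(value) for value in ("", None, "   ", "PII")}
```

Keeping the predicate separate from the message makes it straightforward to reuse the same check for other sources while wording the warning for whichever case actually triggered it.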

What Works Well

Consistent validation pattern applied at both source (Snowflake) and sink layers. Clear, informative warning messages that help users understand why tags are skipped. Comprehensive test coverage including edge cases like empty strings, None values, and whitespace-only names.

Rules ✅ All requirements met

Gitar Rules

Summary Enhancement: Comprehensive summary exists, accurately describes all changes

2 rules not applicable. Show all rules by commenting gitar display:verbose.


@harshach harshach merged commit 5e15a78 into main Jan 22, 2026
25 of 35 checks passed
@harshach harshach deleted the snowflake_tags branch January 22, 2026 23:48
@github-project-automation github-project-automation bot moved this to Done ✅ in Jan - 2026 Jan 22, 2026
@mohittilala mohittilala added the To release Will cherry-pick this PR into the release branch label Jan 22, 2026
mohittilala added a commit that referenced this pull request Jan 23, 2026
* Ignore Tag creation if the Snowflake Tag doesn't have value

* fix metadata_rest

* add unit test

* py_format

* Address gitar comments

* fix tests

---------

Co-authored-by: ulixius9 <mayursingal9@gmail.com>
Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>
Co-authored-by: harshsoni2024 <harshsoni2024@gmail.com>
mohittilala added a commit that referenced this pull request Jan 23, 2026
* Ignore Tag creation if the Snowflake Tag doesn't have value

* fix metadata_rest

* add unit test

* py_format

* Address gitar comments

* fix tests

---------

Co-authored-by: ulixius9 <mayursingal9@gmail.com>
Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>
Co-authored-by: harshsoni2024 <harshsoni2024@gmail.com>
@mohittilala
Contributor

Manually cherry-picked to 1.11.6 and 1.11.7 branch.


Labels

  • backend
  • Ingestion
  • safe to test: Add this label to run secure Github workflows on PRs
  • Snowflake
  • To release: Will cherry-pick this PR into the release branch

Projects

Status: Done ✅

5 participants