Skip to content

Conversation

@hhhhsc701
Copy link
Contributor

No description provided.

# Conflicts:
#	backend/services/data-cleaning-service/src/main/java/com/datamate/cleaning/domain/model/entity/CleaningTask.java
#	scripts/db/data-cleaning-init.sql
public List<CleaningTaskLog> getTaskLog(String taskId, int retryCount) {
cleanTaskValidator.checkTaskId(taskId);
String logPath = FLOW_PATH + "/" + taskId + "/output.log";
if (retryCount > 0) {

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI about 9 hours ago

In general, user-controlled data used in path expressions should be constrained so that the resulting path cannot escape an intended base directory or become an absolute path. A common and robust pattern is: (1) define a base directory as a Path, (2) resolve the user input as a child path using base.resolve(userInput), (3) normalize the result, and (4) verify that the normalized result still starts with the base directory. If any check fails, reject the request.

For this specific case, we should treat FLOW_PATH as the root of all cleaning task logs, and ensure that taskId cannot break out of that directory. Instead of using String logPath = FLOW_PATH + "/" + taskId + "/output.log"; and Paths.get(logPath), we should build a Path like:

Path baseFlowPath = Paths.get(FLOW_PATH).toAbsolutePath().normalize();
Path logFilePath = baseFlowPath.resolve(taskId).resolve("output.log").toAbsolutePath().normalize();
if (!logFilePath.startsWith(baseFlowPath)) {
    throw new BusinessException(SystemErrorCode.ILLEGAL_ARGUMENT, "Invalid taskId");
}

Then use Files.lines(logFilePath) and in the error logging use logFilePath.toString(). This preserves existing functionality (log files still live under FLOW_PATH/<taskId>/output.log[.retry]) while ensuring that even if taskId contains .. or separators, the normalized path is checked against the base directory before reading. We can keep the existing cleanTaskValidator.checkTaskId(taskId); call; the new checks are an additional defense-in-depth and the direct mitigation for this CodeQL alert.

We only need to adjust the getTaskLog method in CleaningTaskService and slightly change the imports to use java.nio.file.Path instead of java.nio.file.Paths directly where appropriate. No changes are required in CleaningTaskController.

Suggested changeset 1
backend/services/data-cleaning-service/src/main/java/com/datamate/cleaning/application/CleaningTaskService.java

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-cleaning-service/src/main/java/com/datamate/cleaning/application/CleaningTaskService.java b/backend/services/data-cleaning-service/src/main/java/com/datamate/cleaning/application/CleaningTaskService.java
--- a/backend/services/data-cleaning-service/src/main/java/com/datamate/cleaning/application/CleaningTaskService.java
+++ b/backend/services/data-cleaning-service/src/main/java/com/datamate/cleaning/application/CleaningTaskService.java
@@ -43,6 +43,7 @@
 import java.io.FileWriter;
 import java.io.IOException;
 import java.nio.file.Files;
+import java.nio.file.Path;
 import java.nio.file.Paths;
 import java.util.*;
 import java.util.concurrent.atomic.AtomicReference;
@@ -159,11 +160,19 @@
 
     public List<CleaningTaskLog> getTaskLog(String taskId, int retryCount) {
         cleanTaskValidator.checkTaskId(taskId);
-        String logPath = FLOW_PATH + "/" + taskId + "/output.log";
+
+        Path baseFlowPath = Paths.get(FLOW_PATH).toAbsolutePath().normalize();
+        Path logFilePath = baseFlowPath.resolve(taskId).resolve("output.log").toAbsolutePath().normalize();
         if (retryCount > 0) {
-            logPath += "." + retryCount;
+            logFilePath = logFilePath.resolveSibling("output.log." + retryCount);
+            logFilePath = logFilePath.toAbsolutePath().normalize();
         }
-        try (Stream<String> lines = Files.lines(Paths.get(logPath))) {
+
+        if (!logFilePath.startsWith(baseFlowPath)) {
+            throw new BusinessException(SystemErrorCode.ILLEGAL_ARGUMENT, "Invalid taskId");
+        }
+
+        try (Stream<String> lines = Files.lines(logFilePath)) {
             List<CleaningTaskLog> logs = new ArrayList<>();
             AtomicReference<String> lastLevel = new AtomicReference<>("INFO");
             lines.forEach(line -> {
@@ -175,7 +182,7 @@
             });
             return logs;
         } catch (IOException e) {
-            log.error("Fail to read log file {}", logPath, e);
+            log.error("Fail to read log file {}", logFilePath.toString(), e);
             return Collections.emptyList();
         }
     }
EOF
@@ -43,6 +43,7 @@
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;
@@ -159,11 +160,19 @@

public List<CleaningTaskLog> getTaskLog(String taskId, int retryCount) {
cleanTaskValidator.checkTaskId(taskId);
String logPath = FLOW_PATH + "/" + taskId + "/output.log";

Path baseFlowPath = Paths.get(FLOW_PATH).toAbsolutePath().normalize();
Path logFilePath = baseFlowPath.resolve(taskId).resolve("output.log").toAbsolutePath().normalize();
if (retryCount > 0) {
logPath += "." + retryCount;
logFilePath = logFilePath.resolveSibling("output.log." + retryCount);
logFilePath = logFilePath.toAbsolutePath().normalize();
}
try (Stream<String> lines = Files.lines(Paths.get(logPath))) {

if (!logFilePath.startsWith(baseFlowPath)) {
throw new BusinessException(SystemErrorCode.ILLEGAL_ARGUMENT, "Invalid taskId");
}

try (Stream<String> lines = Files.lines(logFilePath)) {
List<CleaningTaskLog> logs = new ArrayList<>();
AtomicReference<String> lastLevel = new AtomicReference<>("INFO");
lines.forEach(line -> {
@@ -175,7 +182,7 @@
});
return logs;
} catch (IOException e) {
log.error("Fail to read log file {}", logPath, e);
log.error("Fail to read log file {}", logFilePath.toString(), e);
return Collections.emptyList();
}
}
Copilot is powered by AI and may make mistakes. Always verify output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants