Skip to content

Conversation

@paldepind
Copy link
Contributor

@paldepind paldepind commented Mar 7, 2025

Note:

  • I've put this query under CWE-730 as that's where the Swift one was. But given that the Rust regex implementation is robust regarding DOS attacks, I think we should find another directory for it?

@github-actions github-actions bot added documentation Rust Pull requests that update Rust code labels Mar 7, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2025

QHelp previews:

rust/ql/src/queries/security/CWE-020/RegexInjection.qhelp

Regular expression injection

Constructing a regular expression with unsanitized user input can be dangerous. A malicious user may be able to modify the meaning of the expression, causing it to match unexpected strings and construct large regular expressions by using counted repetitions.

Recommendation

Before embedding user input into a regular expression, escape the input string using a function such as regex::escape to escape meta-characters that have special meaning.

If purposefully supporting user supplied regular expressions, then use RegexBuilder::size_limit to limit the pattern size so that it is no larger than necessary.

Example

The following example constructs a regular expressions from the user input key without escaping it first.

use regex::Regex;

fn get_value<'h>(key: &str, property: &'h str) -> Option<&'h str> {
    // BAD: User provided `key` is interpolated into the regular expression.
    let pattern = format!(r"^property:{key}=(.*)$");
    let re = Regex::new(&pattern).unwrap();
    re.captures(property)?.get(1).map(|m| m.as_str())
}

The regular expression is intended to match strings starting with "property" such as "property:foo=bar". However, a malicious user might inject the regular expression ".*^|key" and unexpectedly cause strings such as "key=secret" to match.

If user input is used to construct a regular expression, it should be escaped first. This ensures that malicious users cannot insert characters that have special meanings in regular expressions.

use regex::{escape, Regex};

fn get_value<'h>(key: &str, property: &'h str) -> option<&'h str> {
    // GOOD: User input is escaped before being used in the regular expression.
    let escaped_key = escape(key);
    let pattern = format!(r"^property:{escaped_key}=(.*)$");
    let re = regex::new(&pattern).unwrap();
    re.captures(property)?.get(1).map(|m| m.as_str())
}

References

@paldepind paldepind force-pushed the rust-regex-injection branch 3 times, most recently from 49b18d5 to 14722dc Compare March 7, 2025 11:18
@paldepind paldepind force-pushed the rust-regex-injection branch from 14722dc to 494f914 Compare March 7, 2025 11:37
@paldepind paldepind marked this pull request as ready for review March 7, 2025 12:59
pack: codeql/rust-all
extensible: summaryModel
data:
- ["repo:https://github.com/rust-lang/regex:regex", "crate::escape", "Argument[0].Reference", "ReturnValue", "taint", "manual"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This model is accurate, but I mostly added it to make sure that the barrier actually worked.

@paldepind paldepind requested a review from geoffw0 March 7, 2025 13:03
@geoffw0 geoffw0 added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Mar 7, 2025
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! 🎉

We will need to check if there are any results in a DCA run, and I'm very interested to get an idea of how many sinks there are (regardless of results) as an indication of how much regex usage we are seeing.

<code>"property"</code> such as <code>"property:foo=bar"</code>. However, a
malicious user might inject the regular expression <code>".*^|key"</code> and
unexpectedly cause strings such as <code>"key=secret"</code> to match.
</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great example! I've added the ready-for-doc-review tag so this PR to attract a member of the docs team for further review of the .qhelp.

*/
abstract class RegexInjectionBarrier extends DataFlow::Node { }

/** A sink for `a` in `Regex::new(a)` when `a` is not a literal. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any benefit to defining this sink in QL rather than models-as-data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to not consider cases where Regex::new is directly applied to a literal as sinks. I think most real-world uses of Regex::new is of that form, so it's simply as an optimization to not consider these clearly safe cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Note that we do not currently do this in the cleartext logging query, which has far more sinks the vast majority of which just take a literal. If you think it makes a difference we might consider doing it there as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was told it could make a difference. But it probably doesn't matter much and let's double check before we spend time rewriting a bunch of stuff.

@mchammer01 mchammer01 self-requested a review March 10, 2025 12:17
mchammer01
mchammer01 previously approved these changes Mar 10, 2025
Copy link
Contributor

@mchammer01 mchammer01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I thought I had already submitted a review.
Approving this on behalf of Docs ✨
Left a few comments and suggestions for your consideration @paldepind

<overview>
<p>
Constructing a regular expression with unsanitized user input can be dangerous.
A malicious user may be able to modify the meaning of the expression causing it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor - separating the clauses with a comma

Suggested change
A malicious user may be able to modify the meaning of the expression causing it
A malicious user may be able to modify the meaning of the expression, causing it

<p>
Constructing a regular expression with unsanitized user input can be dangerous.
A malicious user may be able to modify the meaning of the expression causing it
to match unexpected strings and to construct large regular expressions by using
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can possibly simplify here

Suggested change
to match unexpected strings and to construct large regular expressions by using
to match unexpected strings and construct large regular expressions by using


<example>
<p>
The following example construct a regular expressions from the user input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

Suggested change
The following example construct a regular expressions from the user input
The following example constructs a regular expressions from the user input

<p>
If purposefully supporting user supplied regular expressions, then use <a
href="https://docs.rs/regex/latest/regex/struct.RegexBuilder.html#method.size_limit">RegexBuilder::size_limit</a>
to limit the pattern size such that it is no larger than necessary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo / simplifying

Suggested change
to limit the pattern size such that it is no larger than necessary.
to limit the pattern size so that it is no larger than necessary.


<references>
<li>
<code>regex</code> crate documentation: <a href="https://docs.rs/regex/latest/regex/index.html#untrusted-patterns">Untrusted patterns</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Items in the References section need a full stop

Suggested change
<code>regex</code> crate documentation: <a href="https://docs.rs/regex/latest/regex/index.html#untrusted-patterns">Untrusted patterns</a>
<code>regex</code> crate documentation: <a href="https://docs.rs/regex/latest/regex/index.html#untrusted-patterns">Untrusted patterns</a>.

unexpectedly cause strings such as <code>"key=secret"</code> to match.
</p>
<p>
If user input is used to construct a regular expression it should be escaped
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a comma to separate the clauses

Suggested change
If user input is used to construct a regular expression it should be escaped
If user input is used to construct a regular expression, it should be escaped

</p>
<p>
If user input is used to construct a regular expression it should be escaped
first. This ensures that the user cannot insert characters that have special
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps?

Suggested change
first. This ensures that the user cannot insert characters that have special
first. This ensures that malicious users cannot insert characters that have special

@@ -0,0 +1,45 @@
/**
* @name Regular expression injection
* @description
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description, which is mandatory, is missing. For guidance about writing query descriptions, see https://github.com/github/codeql/blob/main/docs/query-metadata-style-guide.md#query-descriptions-description.

@mchammer01
Copy link
Contributor

I think more commits have been added since I started reviewing this, as all my comments are outdated 😞

@paldepind
Copy link
Contributor Author

I think more commits have been added since I started reviewing this, as all my comments are outdated 😞

I moved the files to a different directory. So the comments are outdated but the actual suggestions still apply 👍

@paldepind
Copy link
Contributor Author

paldepind commented Mar 10, 2025

Thanks for the review @mchammer01. I've applied your suggestions and added the missing description property.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

DCA run also LGTM.

Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
@paldepind paldepind merged commit b3601b1 into github:main Mar 12, 2025
19 checks passed
@paldepind paldepind deleted the rust-regex-injection branch March 12, 2025 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. Rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants