-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Rust: Add regular expression injection query #18946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
QHelp previews: rust/ql/src/queries/security/CWE-020/RegexInjection.qhelpRegular expression injectionConstructing a regular expression with unsanitized user input can be dangerous. A malicious user may be able to modify the meaning of the expression, causing it to match unexpected strings and construct large regular expressions by using counted repetitions. RecommendationBefore embedding user input into a regular expression, escape the input string using a function such as regex::escape to escape meta-characters that have special meaning. If purposefully supporting user supplied regular expressions, then use RegexBuilder::size_limit to limit the pattern size so that it is no larger than necessary. ExampleThe following example constructs a regular expressions from the user input use regex::Regex;
fn get_value<'h>(key: &str, property: &'h str) -> Option<&'h str> {
// BAD: User provided `key` is interpolated into the regular expression.
let pattern = format!(r"^property:{key}=(.*)$");
let re = Regex::new(&pattern).unwrap();
re.captures(property)?.get(1).map(|m| m.as_str())
}The regular expression is intended to match strings starting with If user input is used to construct a regular expression, it should be escaped first. This ensures that malicious users cannot insert characters that have special meanings in regular expressions. use regex::{escape, Regex};
fn get_value<'h>(key: &str, property: &'h str) -> option<&'h str> {
// GOOD: User input is escaped before being used in the regular expression.
let escaped_key = escape(key);
let pattern = format!(r"^property:{escaped_key}=(.*)$");
let re = regex::new(&pattern).unwrap();
re.captures(property)?.get(1).map(|m| m.as_str())
}References
|
49b18d5 to
14722dc
Compare
14722dc to
494f914
Compare
| pack: codeql/rust-all | ||
| extensible: summaryModel | ||
| data: | ||
| - ["repo:https://github.com/rust-lang/regex:regex", "crate::escape", "Argument[0].Reference", "ReturnValue", "taint", "manual"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This model is accurate, but I mostly added it to make sure that the barrier actually worked.
geoffw0
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! 🎉
We will need to check if there are any results in a DCA run, and I'm very interested to get an idea of how many sinks there are (regardless of results) as an indication of how much regex usage we are seeing.
| <code>"property"</code> such as <code>"property:foo=bar"</code>. However, a | ||
| malicious user might inject the regular expression <code>".*^|key"</code> and | ||
| unexpectedly cause strings such as <code>"key=secret"</code> to match. | ||
| </p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great example! I've added the ready-for-doc-review tag so this PR to attract a member of the docs team for further review of the .qhelp.
| */ | ||
| abstract class RegexInjectionBarrier extends DataFlow::Node { } | ||
|
|
||
| /** A sink for `a` in `Regex::new(a)` when `a` is not a literal. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any benefit to defining this sink in QL rather than models-as-data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to not consider cases where Regex::new is directly applied to a literal as sinks. I think most real-world uses of Regex::new is of that form, so it's simply as an optimization to not consider these clearly safe cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Note that we do not currently do this in the cleartext logging query, which has far more sinks the vast majority of which just take a literal. If you think it makes a difference we might consider doing it there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was told it could make a difference. But it probably doesn't matter much and let's double check before we spend time rewriting a bunch of stuff.
mchammer01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I thought I had already submitted a review.
Approving this on behalf of Docs ✨
Left a few comments and suggestions for your consideration @paldepind
| <overview> | ||
| <p> | ||
| Constructing a regular expression with unsanitized user input can be dangerous. | ||
| A malicious user may be able to modify the meaning of the expression causing it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor - separating the clauses with a comma
| A malicious user may be able to modify the meaning of the expression causing it | |
| A malicious user may be able to modify the meaning of the expression, causing it |
| <p> | ||
| Constructing a regular expression with unsanitized user input can be dangerous. | ||
| A malicious user may be able to modify the meaning of the expression causing it | ||
| to match unexpected strings and to construct large regular expressions by using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can possibly simplify here
| to match unexpected strings and to construct large regular expressions by using | |
| to match unexpected strings and construct large regular expressions by using |
|
|
||
| <example> | ||
| <p> | ||
| The following example construct a regular expressions from the user input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
| The following example construct a regular expressions from the user input | |
| The following example constructs a regular expressions from the user input |
| <p> | ||
| If purposefully supporting user supplied regular expressions, then use <a | ||
| href="https://docs.rs/regex/latest/regex/struct.RegexBuilder.html#method.size_limit">RegexBuilder::size_limit</a> | ||
| to limit the pattern size such that it is no larger than necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo / simplifying
| to limit the pattern size such that it is no larger than necessary. | |
| to limit the pattern size so that it is no larger than necessary. |
|
|
||
| <references> | ||
| <li> | ||
| <code>regex</code> crate documentation: <a href="https://docs.rs/regex/latest/regex/index.html#untrusted-patterns">Untrusted patterns</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Items in the References section need a full stop
| <code>regex</code> crate documentation: <a href="https://docs.rs/regex/latest/regex/index.html#untrusted-patterns">Untrusted patterns</a> | |
| <code>regex</code> crate documentation: <a href="https://docs.rs/regex/latest/regex/index.html#untrusted-patterns">Untrusted patterns</a>. |
| unexpectedly cause strings such as <code>"key=secret"</code> to match. | ||
| </p> | ||
| <p> | ||
| If user input is used to construct a regular expression it should be escaped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a comma to separate the clauses
| If user input is used to construct a regular expression it should be escaped | |
| If user input is used to construct a regular expression, it should be escaped |
| </p> | ||
| <p> | ||
| If user input is used to construct a regular expression it should be escaped | ||
| first. This ensures that the user cannot insert characters that have special |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps?
| first. This ensures that the user cannot insert characters that have special | |
| first. This ensures that malicious users cannot insert characters that have special |
| @@ -0,0 +1,45 @@ | |||
| /** | |||
| * @name Regular expression injection | |||
| * @description | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description, which is mandatory, is missing. For guidance about writing query descriptions, see https://github.com/github/codeql/blob/main/docs/query-metadata-style-guide.md#query-descriptions-description.
|
I think more commits have been added since I started reviewing this, as all my comments are outdated 😞 |
I moved the files to a different directory. So the comments are outdated but the actual suggestions still apply 👍 |
|
Thanks for the review @mchammer01. I've applied your suggestions and added the missing description property. |
geoffw0
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM.
DCA run also LGTM.
Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
Note:
CWE-730as that's where the Swift one was. But given that the Rust regex implementation is robust regarding DOS attacks, I think we should find another directory for it?