Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False positive - Incomplete string escaping #18379

Open
kuhe opened this issue Dec 29, 2024 · 8 comments
Open

False positive - Incomplete string escaping #18379

kuhe opened this issue Dec 29, 2024 · 8 comments

Comments

@kuhe
Copy link

kuhe commented Dec 29, 2024

Description of the false positive

Incomplete string escaping or encoding
This does not escape backslash characters in the input.
part = `"${part.replace(/"/g, '\\"')}"`;

This is intentional,

actual string: abc\" -> "abc\\""

JS:

'abc\\"'  '"abc\\\\""'

This code is used in a client library to serialize caller input as-is from list to header value.

const input = ["a", "b", `\\"`];
const serialized = `a,b,"\\\\""`;

Code samples or links to source code

https://github.com/smithy-lang/smithy-typescript/blob/d8446cfa3bf6cbcf1187d1ad744ac5a296e442d7/packages/smithy-client/src/quote-header.ts#L8

@smowton
Copy link
Contributor

smowton commented Dec 30, 2024

So just to be clear, it's acceptable that user input " will be escaped to \", which would not terminate any "-delimited string it was pasted into, but user input \" will escape to \\", leaving an unescaped quote which will terminate such a "-delimited string?

@kuhe
Copy link
Author

kuhe commented Dec 31, 2024

That is not what is happening. In this format, for \\", or in JS \\\\", only the backslash before the double quote is an escape instruction for the receiving side. The first backslash is a normal character that is part of the data.

Here is a code example showing that this transform is reversible without loss of information.

import assert from "node:assert";

function serialize(part) {
  if (part.includes(",") || part.includes('"')) {
    part = `"${part.replace(/"/g, '\\"')}"`;
  }
  return part;
}

function deserialize(part) {
  if (part[0] === '"' && part[part.length - 1] === '"') {
    return part.slice(1, -1).replace(/\\"/g, '"');
  }
  return part;
}

const tests = [
  `abcd`,
  `ab","cd`,
  `ab\\",\\"cd`,
  `\t\"\n\\"\r\\\"\n\\\\"`,
  ``,
  `"`,
  `\"`,
  `\\"`,
  `\\\"`,
  `""""\"\\"\\\"\\\\"""""`,
  ` " " " "\"\\"\\\"\\\\" " " " "`,
  `"\"\\"\\\"\\\\"`,
  `",\","\\"\\\,"\\,",\,\,",\,",\,\\\,""""`,
];

for (const test of tests) {
  assert(test === deserialize(serialize(test)));
  process.stdout.write(".");
}
console.log("ok");

@smowton
Copy link
Contributor

smowton commented Dec 31, 2024

I see. I don't think I understand how this could work -- backslash-escaping the quote to begin with seems as if it is designed for a parser seeking an unescaped end-quote to terminate a string, and if \ doesn't itself get escaped then passing string "\" to deserialization would seem like a problem, since a parser wouldn't know if it was looking at the string \ or a long string beginning " and continuing with whatever input comes next. However I'm sure there's some relevant context I don't know about.

I think this scenario, where " can be backslash-escaped but \\ is not itself an escape sequence for a backslash, is so unusual that the query can't reasonably anticipate it. In that case I would recommend dismissing the code scanning alert per https://docs.github.com/en/enterprise-cloud@latest/code-security/code-scanning/managing-code-scanning-alerts/resolving-code-scanning-alerts -- is that option available to you?

@kuhe
Copy link
Author

kuhe commented Dec 31, 2024

"\" becomes "\"\\"" in this serialization, which is

" - start quote wrapper
\" - escaped quote
\ - lone backslash
\" - another escaped quote
" - end quote wrapper

\ (single backslash) is serialized with no change.

Putting that aside as the specifics of this format may not be the main issue, I understand you can't compromise the detection for this case. I also don't want to do any sort of trick in my code to avoid detection by codeql as a false-positive.

My team does not directly use codeql and this came up as my software's downstream users are getting the alert. I will refer them to this recommendation.

@automartin5000
Copy link

Just wanted to pipe in here as a user of CodeQL and reiterate what I said on the AWS SDK issue.

This is either a bug in the AWS SDK or a bug in CodeQL. But putting it on every one of the AWS JS SDK/CodeQL users to dismiss the alert feels pretty user hostile.

@smowton
Copy link
Contributor

smowton commented Jan 14, 2025

Could you give an example of an alert being raised against an imported package's code? That's already generally supposed to be excluded from CodeQL alerting.

Regarding the bug in Smithy, I still think there probably is one:

If this code is used to encode the list (notated here as bare unescaped strings) [hastrailingslash"\, hasquote"] that's going to encode as "hastrailingslash\"\", "hasquote\"", and I would generally expect a parser to decode that as the string hastrailingslash"", with invalid trailing junk hasquote\"".

I also note that https://stackoverflow.com/questions/68154687/how-to-quote-strings-for-use-in-http-header-fields suggests you probably should escape \ characters inside HTTP header quoted strings.

@smowton
Copy link
Contributor

smowton commented Jan 14, 2025

Ah, have successfully reproduced against files like https://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/aws-location-alpha/test/integ.tracker.js.snapshot/asset.b98abee59e034ed29eeb601684dc34752baa86509a7d457d72305d4e19ecc80b.bundle/index.js. While I still recommend double-checking if there is a real bug with the string-escaping, I'll ask if the JS team can recommend a better way to exclude third-party code from analysis here.

@smowton
Copy link
Contributor

smowton commented Jan 14, 2025

In general there is desire to better-recognise third-party code in a user's repository; however this isn't a quick fix. In the meantime (and pending any actual fix if it is concluded there is a real Smithy bug here) one might consider using the paths-ignore option to filter out paths that shouldn't be scanned, and the upstream project could recommend this as a CodeQL configuration snippet for anyone using CodeQL and incorporating the SDK in such as way that CodeQL doesn't automatically recognise it as third-party code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants