
SnowflakeConnectionV1#downloadStream has the chance of downloading the wrong file if files only differ by suffix. #2030

Open
wukachn opened this issue Jan 14, 2025 · 1 comment
Assignees
Labels
question Issue is a usage/other question rather than a bug status-triage_done Initial triage done, will be further handled by the driver team


wukachn commented Jan 14, 2025

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of JDBC driver are you using?
    3.21.0, but the behavior seems to have been introduced in 3.19.1 by this PR: SNOW-1708304: Download stream from git repository #1920

  2. What operating system and processor architecture are you using?
    macOS ARM

  3. What version of Java are you using?
    17

  4. What did you do?

I uploaded two files to my stage (not wrapped in quotes):

  • filename goes here
  • filename goes here.dat

Then, when I try to download filename goes here, I get the file content of filename goes here.dat.

The code snippet below assumes that sourceFiles only ever contains a single file name when downloadStream is called. However, sourceFiles is derived by parsing the command generated earlier (get '@"DPC_STREAMING_TESTING_DB"."CDC_IT_f3333863-c08f-4c9a-9dd2-6cd3c25465d2"."SNOWFLAKE_CLIENT_IT"/filename goes here' file:///tmp/ /*jdbc download stream*/). In this example, sourceFiles ends up containing both file names, and the ordering happens to put the other file first, so the wrong file is downloaded.

```java
// when downloading files as stream there should be only one file in source files
String sourceLocation =
    sourceFiles.stream()
        .findFirst()
        .orElseThrow(
            () ->
                new SnowflakeSQLException(
                    queryID,
                    SqlState.NO_DATA,
                    ErrorCode.FILE_NOT_FOUND.getMessageCode(),
                    session,
                    "File not found: " + fileName));
```

File names that differ only by a prefix do not hit this issue, but file names that differ only by a suffix do, because the shorter name is then a prefix of the longer one.
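One way the single-file assumption could be made explicit is to pick the candidate whose name equals the requested name, instead of taking whichever entry comes first. The following is only a sketch under assumptions: `ExactStageFileMatch`, `selectExact`, and `lastSegment` are hypothetical names, not part of the driver, and the sketch assumes the entries of `sourceFiles` are plain names or `/`-separated paths.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not driver code: illustrates exact-name selection.
final class ExactStageFileMatch {

  // Returns the candidate whose last path segment equals fileName exactly,
  // or an empty Optional if no candidate matches.
  static Optional<String> selectExact(Collection<String> sourceFiles, String fileName) {
    return sourceFiles.stream()
        .filter(candidate -> lastSegment(candidate).equals(fileName))
        .findFirst();
  }

  // Strips everything up to and including the last '/' (assumed separator).
  static String lastSegment(String path) {
    int slash = path.lastIndexOf('/');
    return slash < 0 ? path : path.substring(slash + 1);
  }

  public static void main(String[] args) {
    // Ordering chosen to mimic the unlucky case described in this issue.
    List<String> sourceFiles = Arrays.asList("filename goes here.dat", "filename goes here");

    // Taking the first entry yields the ".dat" file.
    System.out.println(sourceFiles.stream().findFirst().get());

    // Exact matching yields the requested file regardless of ordering.
    System.out.println(selectExact(sourceFiles, "filename goes here").orElse("<not found>"));
  }
}
```

With the ordering above, the first line printed is `filename goes here.dat` and the second is `filename goes here`.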

  5. What did you expect to see?

    I would expect the correct file to be downloaded.

@wukachn wukachn added the bug label Jan 14, 2025
@wukachn wukachn changed the title SnowflakeConnectionV1#downloadStream have the chance of downloading the wrong file if files only differ by suffix. SnowflakeConnectionV1#downloadStream has the chance of downloading the wrong file if files only differ by suffix. Jan 14, 2025
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Jan 15, 2025
@sfc-gh-dszmolka sfc-gh-dszmolka added question Issue is a usage/other question rather than a bug status-triage_done Initial triage done, will be further handled by the driver team and removed bug labels Jan 15, 2025
@sfc-gh-dszmolka
Contributor

Hi there, and thanks for submitting this issue! I think you're observing the expected and documented behaviour. Why? The driver uses the Snowflake GET command under the hood, and per its documentation: https://docs.snowflake.com/en/sql-reference/sql/get

GET internalStage file://<local_directory_path>
..
internalStage ::=
    @[<namespace>.]<int_stage_name>[/<path>]
  | @[<namespace>.]%<table_name>[/<path>]
  | @~[/<path>]

<path> is an optional case-sensitive path for files in the cloud storage location (i.e. files have names that begin with a common string) that limits access to a set of files. Paths are alternatively called prefixes or folders by different cloud storage services. If path is specified, but no file is explicitly named in the path, all data files in the path are downloaded.

This is precisely what is happening here.
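To illustrate the prefix semantics quoted above, here is a simplified model, not Snowflake's actual matching logic: it treats `<path>` as a plain string prefix over the staged file names. `StagePrefixModel` and `matchByPrefix` are hypothetical names, and the third file name is made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Simplified, hypothetical model of prefix matching on a stage.
final class StagePrefixModel {

  // Returns every staged name that begins with the given path string.
  static List<String> matchByPrefix(List<String> stagedFiles, String path) {
    return stagedFiles.stream()
        .filter(name -> name.startsWith(path))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> staged =
        Arrays.asList(
            "filename goes here",
            "filename goes here.dat",
            "copy of filename goes here"); // made-up name differing by a prefix

    // The shorter name is a prefix of the ".dat" name, so both match.
    System.out.println(matchByPrefix(staged, "filename goes here"));

    // A name that differs by a prefix matches only itself.
    System.out.println(matchByPrefix(staged, "copy of filename goes here"));
  }
}
```

Under this model, requesting `filename goes here` selects two files, which is consistent with the observation that names differing only by a suffix collide while names differing only by a prefix do not.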

Since this is the expected behaviour, I think considering a different approach, such as

  • (you seem to be already using that) naming the files differently
  • placing the files under different <path>

might help you. Hope this helps.
