During incremental copy, the last row is always updated #104

Open
timmysuh opened this issue Sep 22, 2020 · 6 comments

Comments

@timmysuh

I cannot think of a reason why the INCREMENTAL replication logic uses ">=" instead of just ">".

It seems that we always end up updating the last row. Why is this the desired behavior?

if replication_key_value:
    select_sql = """SELECT {}
                    FROM {}
                    WHERE {} >= '{}'::{}
                    ORDER BY {} ASC""".format(','.join(escaped_columns),
                                              post_db.fully_qualified_table_name(schema_name, stream['table_name']),
                                              post_db.prepare_columns_sql(replication_key),
                                              replication_key_value,
                                              replication_key_sql_datatype,
                                              post_db.prepare_columns_sql(replication_key))
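
To make the reported behavior concrete, here is a minimal sketch that mimics the query above against an in-memory SQLite database rather than Postgres. The `orders` table, the `incremental_rows` helper, and the sample timestamps are hypothetical and not part of tap-postgres; the sketch only illustrates that a ">=" filter on the saved replication key value selects the last row again on the next run.

```python
import sqlite3


def incremental_rows(conn, bookmark, op=">="):
    """Select rows at or after the bookmark (hypothetical helper, not tap code)."""
    if op not in (">=", ">"):
        raise ValueError("unsupported operator")
    sql = (
        "SELECT id, updated_at FROM orders "
        f"WHERE updated_at {op} ? "
        "ORDER BY updated_at ASC, id ASC"
    )
    return conn.execute(sql, (bookmark,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2020-09-20 10:00:00"),
        (2, "2020-09-21 10:00:00"),
        (3, "2020-09-22 10:00:00"),
    ],
)

# First run: everything is new, so the bookmark becomes the last row's key.
first_run = incremental_rows(conn, "1970-01-01 00:00:00")
bookmark = first_run[-1][1]

# Second run with no new data: ">=" still returns the bookmarked row.
second_run = incremental_rows(conn, bookmark)
print(second_run)  # [(3, '2020-09-22 10:00:00')]
```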

@ingcrengifo

I have the same question.

@tmonks

tmonks commented Oct 18, 2020

I agree, it should be ">". I've tested this change and am no longer getting the duplicates. Hoping we can get this change made to master.

tmonks added a commit to tmonks/tap-postgres that referenced this issue Oct 19, 2020
- [Issue 104](singer-io#104)
- Incremental sync needs to look for records '>', not '>=' state
@NicolasRisi

NicolasRisi commented Feb 6, 2021

In fact, using ">=" is normal: it avoids data loss.
If two rows are written to disk sequentially with the same "updated_at" value, and the extraction runs between the two writes, a ">" filter will never pick up the second row, so you lose data.
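
The scenario described here can be sketched as a small simulation. It uses an in-memory SQLite database as a stand-in for Postgres; the `events` table, the `sync` helper, and the timestamp are assumptions made up for illustration. One row is written, an extraction runs and saves its bookmark, then a second row with the same "updated_at" value is written. A strict ">" filter misses the second row, while ">=" returns it together with the already-extracted row.

```python
import sqlite3


def sync(conn, bookmark, op):
    """Run one incremental extraction (hypothetical helper, not tap code)."""
    if op not in (">=", ">"):
        raise ValueError("unsupported operator")
    sql = (
        "SELECT id, updated_at FROM events "
        f"WHERE updated_at {op} ? "
        "ORDER BY updated_at ASC, id ASC"
    )
    return conn.execute(sql, (bookmark,)).fetchall()


same_ts = "2021-02-06 12:00:00"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, updated_at TEXT)")

# First write lands, then the extraction runs and records its bookmark.
conn.execute("INSERT INTO events VALUES (1, ?)", (same_ts,))
extracted = sync(conn, "", ">")
bookmark = extracted[-1][1]

# Second write arrives afterwards with the same updated_at value.
conn.execute("INSERT INTO events VALUES (2, ?)", (same_ts,))

strict = sync(conn, bookmark, ">")      # row 2 is never seen
inclusive = sync(conn, bookmark, ">=")  # row 1 is repeated, row 2 is caught

print("strict:", strict)
print("inclusive:", inclusive)
```

The trade-off is visible in the two results: the inclusive filter repeats a row that was already sent, and the strict filter silently drops a row that was not.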

@tmonks

tmonks commented Feb 8, 2021

What's the best way to avoid duplicated rows then? I haven't run into this issue with any of the other taps I've worked with.

@NicolasRisi

The target must handle that by upserting or merging the new data.
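
As a rough illustration of what handling this on the target side can look like, the sketch below loads two overlapping batches into an in-memory SQLite table keyed on `id`. The `target_orders` table and the `load_batch` helper are hypothetical, and SQLite's `INSERT OR REPLACE` stands in for whatever upsert or merge statement a real target uses. It assumes the stream has a primary key the target can match on.

```python
import sqlite3


def load_batch(conn, rows):
    """Upsert a batch keyed on the primary key (hypothetical target-side helper)."""
    conn.executemany(
        "INSERT OR REPLACE INTO target_orders (id, updated_at) VALUES (?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE target_orders (id INTEGER PRIMARY KEY, updated_at TEXT)"
)

# First sync delivers rows 1-3.
load_batch(conn, [
    (1, "2020-09-20 10:00:00"),
    (2, "2020-09-21 10:00:00"),
    (3, "2020-09-22 10:00:00"),
])

# Second sync re-sends row 3 (because of ">=") plus one new row.
load_batch(conn, [
    (3, "2020-09-22 10:00:00"),
    (4, "2020-09-23 10:00:00"),
])

count = conn.execute("SELECT COUNT(*) FROM target_orders").fetchone()[0]
print(count)  # 4
```

Because the repeated row replaces itself instead of being appended, the re-sent last row does not show up as a duplicate in the target.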

@bback99

bback99 commented Feb 11, 2021

@NicolasRisi Any recommendations for when the original table doesn't have a PK but the target does?

In that case,

WHERE {} >= '{}'::{}

produces a duplicate key for the last row when the query is rerun, as @timmysuh mentioned.
