Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[discuss] Transaction API's autocommit #1253

Open
kevinjqliu opened this issue Oct 27, 2024 · 4 comments · May be fixed by #1497
Open

[discuss] Transaction API's autocommit #1253

kevinjqliu opened this issue Oct 27, 2024 · 4 comments · May be fixed by #1497

Comments

@kevinjqliu
Copy link
Contributor

kevinjqliu commented Oct 27, 2024

Porting over from #1246

Can be a potential footgun (#1246 (comment))
Autocommit usage (#1246 (comment))

However, there may still be some concerns around this since Transaction is a public class. If this is the case, I think we can start from making the parameter "private" (autocommit -> _autocommit) and/or adding some doc to explain the usage.

@kevinjqliu
Copy link
Contributor Author

The idea is to make the code simpler if we only want to evolve schema/spec/...
i.e.

with table.update_schema() as update:
    update.add_column("some_field", IntegerType(), "doc")

instead of another with..transaction wrapper

with table.transaction() as transaction:
    with transaction.update_schema() as update_schema:
        update.add_column("some_other_field", IntegerType(), "doc")

....

What about moving the autocommit logic out of Transaction and into the class that uses it instead?

For example, UpdateSchema can implement __enter__ and __exit__ to commit the transaction automatically?

Currently, these classes use autocommit=True:

  • ManageSnapshots
  • UpdateSchema
  • UpdateSpec

@jiakai-li
Copy link
Contributor

jiakai-li commented Jan 10, 2025

Just raise another possibility for discussion. We could also modify the _transaction._autocommit when calling UpdateSchema.__enter__ method, which indicates multiple updates is coming and we would like to avoid the autocommit. Something like:

class UpdateSchema(UpdateTableMetadata["UpdateSchema"]):
    ...
    def __enter__(self):
        self._transaction._autocommit = False
        return self
    ...

The reason behind my head is, to my understanding, autocommit is still somewhat a related concept with transaction.

@jiakai-li
Copy link
Contributor

jiakai-li commented Jan 10, 2025

Adding an example for the issue:

from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import IntegerType, NestedField, StringType

WAREHOUSE_PATH = "/tmp/warehouse"
catalog = SqlCatalog(
    "default",
    uri=f"sqlite:///{WAREHOUSE_PATH}/pyiceberg_catalog.db", warehouse=f"file://{WAREHOUSE_PATH}",
)
catalog.create_namespace_if_not_exists("default")

try:
    catalog.drop_table("default.test")
except:
    pass

schema = Schema(
    NestedField(1, "field1", StringType(), required=False)
)

table = catalog.create_table("default.test", schema)

with table.update_schema() as update:
    update.add_column("field_should_not_exist", IntegerType())  # <--- This field should not be committed
    raise Exception("Error!")

@kevinjqliu
Copy link
Contributor Author

kevinjqliu commented Jan 11, 2025

The reason behind my head is, to my understanding, autocommit is still somewhat a related concept with transaction.

I think its fair to ask if we consider UpdateSchema to be a transaction. If so, we cannot allow autocommit since that'll break the atomicity of the transaction.

We could also modify the _transaction._autocommit when calling UpdateSchema.enter method, which indicates multiple updates is coming and we would like to avoid the autocommit. Something like:

I feel like in this case, we can just get rid of autocommit entirely

cc @Fokko / @HonahX from the original conversation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants