Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitigate next_etrago_id collisions #927

Open
wants to merge 16 commits into
base: dev
Choose a base branch
from

Conversation

gnn
Copy link
Collaborator

@gnn gnn commented Sep 7, 2022

Don't know whether there are any open issues which this PR solves, but it should help with next_etrago_id collisions by generating IDs randomly in a manner which is relatively unlikely to generate collisions.

Before merging into dev-branch, please make sure that

  • the CHANGELOG.rst was updated.
  • new and adjusted code is formated using black and isort.
  • the Dataset-version is updated when existing datasets are adjusted.
  • the branch was merged into the
    continuous-integration/run-everything-over-the-weekend-branch.
  • the workflow is running successful in test mode.
  • the workflow is running successful in Everything mode.

None of the above has happened yet, but I'll do so before the PR gets merged. I'll also assign reviewers, but since it's better to assign individual reviewers to specific commits, I'll do that later, too.

gnn added 16 commits September 7, 2022 10:47
Done simply via `pre-commit autoupdate`. Should fix all pre-commit
issues.
Otherwise it would complain with "E722 do not use bare 'except'".
I did a few experiments and using six bytes of the UUID would've been OK
but risky, but using seven bytes didn't lead to collisions for up to
around 1.6e+10 UUIDs in around 3e+5 contiguous ranges with random
lengths from 1e+4 to 1e+5. Using eight bytes would mean the risk of
generating integers which are to big for PostgreSQL's BigInt datatype,
so seven bytes is the best compromise between simplicity and collision
mitigation.
This is due to a "flake8" complaint, but it also makes the code more
explicit. The intention is to provide an alternative, should the
`dataset_2040[node]` lookup fail. Explicitly catching only the
`KeyError` caused by this avoids potentially hiding other errors.
There's still a bug in this code, as explained in issue #926, but that's
something which has to be fixed separately.
Most of these are done automatically via "isort" or "black". The
exceptions are:

  - fixing the line length of SQL strings and comments,
  - removing trailing whitespace in SQL strings and
  - removing the unused local variable `sources`.

These things had to be done manually to ensure they really don't change
the semantics of the code.
There's some interesting stuff in here. First, some of the lines, the
ones within the `# fmt: on`/`# fmt: off` blocks, could not be formatted
by "black" due to psf/black#510. Second, I had to wrestle with "isort"
quite a bit because it kept formatting the imports to be too long so I
had to resort to the `# noqa: E501` solution. Last but not least, if an
import is only used in a docstring, "flake8" doesn't seem to recognize
it as used.
Since the IDs are randomly generated now, there's no reason to increment
them anymore. Both variants are (roughly) equally likely to collide with
IDs already in use.
Mostly overly long lines but also one missing space after an inline
comment (`#`) sign and one f-string without placeholders.
The result of `next_etrago_id` is now always an `int` so no need to cast
it.
At least where possible. I also took the liberty of removing comments
like `# Select next id value` because these just restate the name of the
function in other words so one could argue that they should fall into a
category of comments which should be avoided according to [PEP8][0].

[0]: https://peps.python.org/pep-0008/#inline-comments
As requested by "flake8".
Since we now generate IDs randomly, they are no longer "guaranteed" to
be the current maximum so the old name is misleading.
The respective code blocks kind of belong together and having them
closer to each other makes this a little bit more obvious. See also the
[PEP8 note][0] on using "blank lines [..] sparingly".

[0]: https://peps.python.org/pep-0008/#blank-lines
Calling `next_etrago_id` two times in "close" succession did not
guarantee getting the same results even before switching to random IDs.
This was due to concurrency. Now that we're using random IDs, these two
calls are basically guaranteed to yield wildly different results (which
is the point of the new implementation), so this pattern is very
dangerous and will nearly always be wrong.
Fortunately it is easily fixed by saving the result of the first call
and using it in place of both calls.
Turns out that it's not possible to import `MotorizedIndividualTravel`
because it leads to a circular import. Solution:

```
import egon.data.datasets.emobility.motorized_individual_travel
```

instead and pull `MotorizedIndividualTravel` out of it at use time.
hooks:
- id: flake8
- repo: https://github.com/psf/black
rev: 20.8b1
rev: 22.8.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this relate to #770?
Please checkout @khelfen's proposal on this #811

(Somewhat) related: #807

@nesnoj
Copy link
Member

nesnoj commented Oct 24, 2022

@gnn You said last week it ran successfully in test mode? Does this PR contain the latest code?
I'd like to give it a try for DE on a RLI instance, do you think it's mature enough?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants