Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/preserve comment only files #200

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,8 @@ repos:
# Class cannot subclass "ConfigSchema" (has type "Any")
- --allow-subclassing-any
- repo: https://github.com/flakeheaven/flakeheaven
rev: 0.11.1
rev: 3.2.1
hooks:
- name: Run flakeheaven static analysis tool
id: flakeheaven
files: ^(src|tests)/.+?\.(ipynb|md|py|rst|yaml|yml)$
12 changes: 12 additions & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,18 @@ export YAMLFIX_CONFIG_PATH="/etc/yamlfix/"

Configure the base config-path that `maison` will look for a `pyproject.toml` configuration file. This path is traversed upwards until such a file is found.

### Document Fix ID

Default: `document_fix_id: str = uuid.uuid4().hex`<br>
Environment variable override:
```bash
export YAMLFIX_DOCUMENT_FIX_ID="myid"
```

Warning: You should not modify this value, if you're not sure you need to.

This generates a UUID in hex-string-representation (no delimiters), which is used internally to generate a temporary top-level mapping node, where any lists or comments can be attached, so ruyaml is not removing comments, and comment-only documents can be formatted properly.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this is something a user would want to change? I don't feel that without knowing the internals of yamlfix they can even understand the paragraph.

If you feel it's interesting to keep the configuration, I'd move it to the bottom of the document, otherwise I'd just hardcode the value

### Explicit Document Start

Default: `explicit_start: bool = True`<br>
Expand Down
42 changes: 42 additions & 0 deletions src/yamlfix/adapters.py
Original file line number Diff line number Diff line change
Expand Up @@ -224,6 +224,13 @@ def patch_sequence_style(key_node: Node, value_node: Node) -> None:
if not sequence_node.value:
return

# if this key_node is the `yamlfix_document_fix_id` node,
# the sequence_node is the top-level list, which has to be forced
# into block-mode, so the key_node can be removed afterwards
force_block_style = force_block_style or self._seq_is_in_top_level_node(
key_node
)

# if this sequence contains non-scalar nodes (i.e. dicts, lists, etc.),
# force block-style
force_block_style = (
Expand Down Expand Up @@ -252,6 +259,9 @@ def patch_sequence_style(key_node: Node, value_node: Node) -> None:

self.patch_functions.append(patch_sequence_style)

def _seq_is_in_top_level_node(self, key_node: Node) -> bool:
return str(key_node.value) == f"yamlfix_{self.config.document_fix_id}"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the question above, can't we just use yamlfix_top_node?


@staticmethod
def _seq_contains_non_scalar_nodes(seq_node: Node) -> bool:
return any(not isinstance(node, ScalarNode) for node in seq_node.value)
Expand Down Expand Up @@ -342,9 +352,11 @@ def fix(self, source_code: str) -> str:
log.debug("Running source code fixers...")

fixers = [
self._fix_comment_only_files,
self._fix_truthy_strings,
self._fix_jinja_variables,
self._ruamel_yaml_fixer,
self._restore_comment_only_files,
self._restore_truthy_strings,
self._restore_jinja_variables,
self._restore_double_exclamations,
Expand Down Expand Up @@ -693,3 +705,33 @@ def _restore_jinja_variables(source_code: str) -> str:
fixed_source_lines.append(line)

return "\n".join(fixed_source_lines)

def _fix_comment_only_files(self, source_code: str) -> str:
"""Add a mapping key with an id to the start of the document\
to preserve comments."""
fixed_source_lines = []
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we're running _fix_comment_only_files and _restore_comment_only_files by default, which slows down the program as it needs to iterate through the source code 2 times.

I'd only run these functions if the config is set.

yamlfix_document_id_line = f"yamlfix_{self.config.document_fix_id}:"

# Add the document id line after each document start
has_start_indicator = False
for line in source_code.splitlines():
fixed_source_lines.append(line)
if line.startswith("---"):
has_start_indicator = True
fixed_source_lines.append(yamlfix_document_id_line)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For performance reasons it would be nice if we could stop the iteration on the file source as soon as we've done the required changes. Same applies at the _restore_comment_only_files.

Maybe reformulate it so that once the condition is met it injects the required line and exits.

Now that I see it, although this is a good idea, it may need a non-trivial refactor where the fixers tweak directly the parts of the code from self. instead of receiving the source as an argument and returning as a return value.

If you don't feel like doing this now we can open an issue to track it for the future


# if the document has no start indicator, the document id as the first line
if not has_start_indicator:
fixed_source_lines.insert(0, yamlfix_document_id_line)

return "\n".join(fixed_source_lines)

def _restore_comment_only_files(self, source_code: str) -> str:
"""Remove the document start id from the document again."""
fixed_source_lines = []

for line in source_code.splitlines():
if self.config.document_fix_id not in line:
fixed_source_lines.append(line)

return "\n".join(fixed_source_lines)
2 changes: 2 additions & 0 deletions src/yamlfix/model.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
"""Define program entities like configuration value entities."""

import uuid
from typing import Optional

from maison.schema import ConfigSchema
Expand All @@ -12,6 +13,7 @@ class YamlfixConfig(ConfigSchema):
comments_min_spaces_from_content: int = 2
comments_require_starting_space: bool = True
config_path: Optional[str] = None
document_fix_id: str = uuid.uuid4().hex
explicit_start: bool = True
flow_style_sequence: Optional[bool] = True
indent_mapping: int = 2
Expand Down
45 changes: 45 additions & 0 deletions tests/unit/test_services.py
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,51 @@ def test_fix_code_preserves_comments(self) -> None:

assert result == source

def test_fix_code_preserves_comments_without_start_indication(self) -> None:
"""Don't delete comments without yaml explicit start indictor."""
source = dedent(
"""\
# Keep comments!
program: yamlfix
"""
)
config = YamlfixConfig()
config.explicit_start = False

result = fix_code(source, config)

assert result == source

def test_fix_code_preserves_comment_only_file(self) -> None:
"""Don't delete comments even if the file is only comments."""
source = dedent(
"""\
---
# Keep comments!
"""
)

result = fix_code(source)

assert result == source

def test_fix_code_preserves_comment_only_files_without_start_indication(
self,
) -> None:
"""Don't delete comments even if the file is only comments, without\
start indication."""
source = dedent(
"""\
# Keep comments!
"""
)
config = YamlfixConfig()
config.explicit_start = False

result = fix_code(source, config)

assert result == source

def test_fix_code_respects_parent_lists_with_comments(self) -> None:
"""Do not indent lists at the first level even if there is a comment."""
source = dedent(
Expand Down