-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
utils/astring: add expected control characters #134
Draft
smitterl
wants to merge
1
commit into
avocado-framework:main
Choose a base branch
from
smitterl:133
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the problem with "c" (and actually with all the existing and added codes) is that we only assume they are matched from the beginning, but we don't force that. This means something like
\x1bthis-is-a-broken-text-c
would now be accepted and result in outputhis-is-a-broken-text-c
.Note this is not introduced by your change, it just makes it more visible (as matching a single "c" is quite probable).
The fix for this should be IMO to match from the beginning (especially since we erase characters from the beginning). So your fix should add
|^c|^!p|^]104
to ensure only\x1bc
/\x1b!p
and\x1b]104
is matched.As for the brackets, it's a regexp so
[c]
matches any character out of the brackets. At this point we don't expect any other standalone characters so I'd suggest just thec
without the brackets.Similarly the
[!p]
which actually matches only!
out of the!pfoo
string as it matches one of the characters inside the brackets. So that one really has to be without the brackets.The last one is actually generic and vendors are free to specify any
]\d+
strings so here I'd probably consider^]\d+
rather than one of the variants, what do you think?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it seems even worse then I though. Astring blindly ignores many issues, I have fixed some of them and added a selftest here: #135 but IMO it'd be even better to simply replace it with a regexp to substitute all console codes with
""
instead of this custom handling.Anyway I understand that this code is here and the behaviour was odd for a long time. So the question is whether we really want to do something about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case my comments were not clear, I'm ready to ack this patch once your part of the regexp works properly, therefore something like
console_codes += "|^c|^!p|^]104
orconsole_codes += "|^c|^!p|^]\d+
. @clebergnu you had some concerns, would this work for you?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try this, thanks for the review, @ldoktor AFAICR @clebergnu 's concern was that we first confirm those are actually console codes. I did it in all cases IMO, just not sure what the OS operation 104 is supposed to mean. As you might have seen I have reached out to many people in the OS, kernel space and no answer so far. But if the test passes, it passes, and the control characters all start with \x1b so I think it's safe to assume we're not breaking 'normal strings'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either I'm misunderstanding something or your proposal doesn't work. With the indicated regex the algorithm raises at "!p".
The problematic line that I reported starts with "ESC c" but no line starts with "ESC !p". The control character doesn't imply that the lines start with these characters. We see the soft reset command isn't the first, so for the same right "c" doesn't need to be the first. Also, in this xterm reference there's no indicated it must be the first https://gist.github.com/justinmk/a5102f9a0c1810437885a04a07ef0a91 Maybe the algorithm works a bit different from what we read in the code? I read out the tmp_word. Its value changes as listed below.
However, my original regex works and it doesn't filter out 'c' if there's no ESC in front of it as we can see in terms such as "Connected" or "avocado" right in the first line.
I think we should merge this as-is and not accept #135 for the outlined reasons.
Please let me know if you disagree. Otherwise I hope we can merge this because there's no workaround besides this patch for my tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clebergnu IIUC it should be OSC command: https://en.wikipedia.org/wiki/ANSI_escape_code#OSC which are not strictly defined (which is why I suggested the
\d
to catch them all).As for this version I mentioned the brackets are unnecessary as you only define one or a few characters (it's regexp so the
[]
brackets means one of the characters.And the
c
itself sounds a bit too vague without the^
because it simply matches 1 character wheneverc
is present after the initial text. Take this as an example:aexpect.utils.astring.strip_console_codes("0\x1b[0m1\x1b2345\x1b67890123456789")
fails butaexpect.utils.astring.strip_console_codes("0\x1b[0m1\x1b23c45\x1b67890123456789")
succeeds and reports thec
as part of the output, stripping the2
simply because it cuts of the first character.So yes, the #135 should rapidly change the situation but even before that one (and I think your commit should go first) shouldn't introduce more of the odd behaviours.
As for the
!p
, thanks for sharing the snippet (would be nice to get the full original output). From there it looks like your getting\x1b[!p
and not just\x1b!p
, which is why my regexp does not work. In that case it should be|^c|^\\[!p|^]104
or (IMO better)|^c|^\\[!p|^]\d+
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I understand @ldoktor concern. @smitterl, would you be able to add #135 to your test with your reproducer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clebergnu with #135 as-is the error just reproduces. But I ran another test that uses the console and that one passes. I'm trying to figure out now what the right expression should be according to #135,
|^c|^\\[!p|^]\d+
doesn't work. Python itself complains about the invalid escape sequence\d
but|^c|^\\[!p|^]\\d+
doesn't work either.I am still not sure I understand why the "^", I have not understood the algorithm yet. I hope to have a moment for this soon.