Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: incorrect soft-timeout implementation & fix hard-timeout follow-up command #6280

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

xingyaoww
Copy link
Collaborator

@xingyaoww xingyaoww commented Jan 15, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

Previously, we set .blocking = True for all .timeout = assignment -- this basically turn EVERY command with hard timeout = 120 sec (default value) -- and the soft timeout were not correctly enabled.

In this PR, we:

  • add two methods add_default_timeout and add_hard_timeout to set timeout better.
  • replace existing implementation of .timeout accordingly

Another big issue before is that when agent:

  1. Runs a long command
  2. That long command somehow get stuck (and exceed 120 sec timeout)
  3. The agent tries to run the next (unrelated) command (e.g., ls) -- Because the previous command is NOT killed, the follow-up command will be stuck in the shell and not get executed anymore.

To fix this in the PR, we add an error message to remind the agent to kill the previous command properly before continuing.

metadata.suffix = (
    f'\n[Your command "{command}" is NOT executed. '
    f'The previous command was timed out but still running. Above is the output of the previous command. '
    "You may wait longer to see additional output of the previous command by sending empty command '', "
    'send other commands to interact with the current process, '
    'or send keys ("C-c", "C-z", "C-d") to interrupt/kill the previous command before sending your new command.]'
)

We also add a new test to stress test the bash terminal in loop for:

  1. Long command output
  2. Command that triggers soft timeout
  3. Command that triggers long timeout

Link of any specific issues this addresses

#6259

#6218


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:818bfde-nikolaik   --name openhands-app-818bfde   docker.all-hands.dev/all-hands-ai/openhands:818bfde

@xingyaoww xingyaoww changed the title add bash stress test to debug for #6259 [WIP] fix: bash performance issue Jan 15, 2025
@xingyaoww xingyaoww changed the title [WIP] fix: bash performance issue fix: incorrect soft-timeout implementation & fix hard-timeout follow-up command Jan 15, 2025
@xingyaoww xingyaoww marked this pull request as ready for review January 15, 2025 20:54
last_pane_output, _ps1_matches
)
metadata = CmdOutputMetadata() # No metadata available
metadata.suffix = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is kind of a hack 🙈

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the agent get confused if it really did enter y as stdin?

Maybe we need separate actions for running commands and sending stdin

Copy link
Collaborator Author

@xingyaoww xingyaoww Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are gonna ask the agent to kill commands when they want.. this is probably the easiest thing we can do..

We can also kill the command for the agent (like what we did before), but it kinda adds more constraints in the system

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! the agent is able to do other stuff (not get confused) when "y" is entered
image

xingyaoww and others added 5 commits January 15, 2025 16:10
- Create new MetadataTable component for better metadata display
- Update Terminal component to use MetadataTable for JSON metadata
- Fix linting in chat-slice.ts
- Remove MetadataTable component
- Update use-terminal hook to format metadata directly in terminal output
- Clean up terminal component
"You may wait longer to see additional output of the previous command by sending empty command '', "
'send other commands to interact with the current process, '
'or send keys ("C-c", "C-z", "C-d") to interrupt/kill the previous command before sending your new command.]'
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding CmdOutputMetadata is a fairly complex BaseModel object that maps the output of ps1, but here we alter its structure and give it a different content, a rather large message for the LLM from us? (a prompt tweak)

Could we think about structuring this situation in some other way? Like, maybe don't save it in the action, and add an attribute to the CmdOutputObservation... 🤔 "instruction", or "error_detail" or "timeout_detail". Idk, but this is an Obs to the new action, and yet it contains deep buried info about the old action? If so, maybe we can surface it, make it super-clear in the obs

Copy link
Collaborator Author

@xingyaoww xingyaoww Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think these are really the info that we should show the user. @rbren had concerns early about directly displaying these in terminal so they should not go into .content, but maybe it make sense to move these suffix/and prefix to the CmdOutputObservation level of info

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I think maybe a slightly different perspective is from a client developer / agent developer point of view. How do we define metadata and how easy is it for people to work with it for their purposes?
(I'm not sure why we call it metadata, if it's terminal output, maybe it would be easier to understand if it was, dunno, terminal_output. 😅)

- Create new MetadataSection component for collapsible metadata display
- Update Message type to include metadata field
- Update chat-slice.ts to store metadata separately from content
- Update ChatMessage and Messages components to handle metadata
- Remove separate MetadataSection component
- Update ExpandableMessage to handle metadata display
- Clean up ChatMessage component
- Improve metadata styling with border and spacing
@All-Hands-AI All-Hands-AI deleted a comment from baxitfund Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants