
Investigate impact of Mocha spam #4230

Open
rootulp opened this issue Jan 15, 2025 · 8 comments
@rootulp
Collaborator

rootulp commented Jan 15, 2025

Context

I spammed Mocha last night #4212 (comment)

Problem

Some node operators reported issues on Mocha last night

Proposal

cc: @evan-forbes did we ever run a similar txsim against Arabica after bumping to 8 MiB? Were there any similar symptoms from network tests?

rootulp added the "investigation item" label (tracks efforts related to an investigation; does not always require a PR to close) on Jan 15, 2025
rootulp self-assigned this on Jan 15, 2025
@evan-forbes
Member

We did spam Arabica extensively and hit some issues, but they were limited to the cluster being configured incorrectly: each node only had access to a small amount of RAM.

We had >7 MB blocks with block times averaging 5 s, which is to be expected on a local 4-node network.

@mindstyle85

Adding some logs from our RPC node here (it no longer seems to be working correctly for people trying to use it for blobs since then):

Prior to restarting the node, lots of these:

Jan 15 14:06:12 rpc01-tia-t celestia-appd[14083]: 2:06PM INF Can't write response (slow client) err="connection was stopped" module=rpc subscriptionID=5806 to=65.109.121.176:52974
Jan 15 14:06:12 rpc01-tia-t celestia-appd[14083]: 2:06PM INF Can't write response (slow client) err="connection was stopped" module=rpc subscriptionID=7030 to=65.109.121.176:40764
Jan 15 14:06:12 rpc01-tia-t celestia-appd[14083]: 2:06PM INF Can't write response (slow client) err="connection was stopped" module=rpc subscriptionID=4396 to=51.222.104.211:37906
Jan 15 14:06:12 rpc01-tia-t celestia-appd[14083]: 2:06PM INF Can't write response (slow client) err="connection was stopped" module=rpc subscriptionID=3954 to=82.60.123.231:38472

After I restarted, there's the same volume of messages, but they're now these:

Jan 15 15:50:09 rpc01-tia-t celestia-appd[20461]: 3:50PM ERR failed to write responses err="write tcp 65.109.16.220:26657->216.18.205.34:43524: write: broken pipe" module=rpc-server
Jan 15 15:50:09 rpc01-tia-t celestia-appd[20461]: 3:50PM ERR failed to write responses err="write tcp 65.109.16.220:26657->216.18.205.34:43582: write: broken pipe" module=rpc-server
Jan 15 15:50:09 rpc01-tia-t celestia-appd[20461]: 3:50PM ERR failed to write responses err="write tcp 65.109.16.220:26657->216.18.205.34:43480: write: broken pipe" module=rpc-server
Jan 15 15:50:09 rpc01-tia-t celestia-appd[20461]: 3:50PM ERR failed to write responses err="write tcp 65.109.16.220:26657->216.18.205.34:43464: write: broken pipe" module=rpc-server
Jan 15 15:50:09 rpc01-tia-t celestia-appd[20461]: 3:50PM ERR failed to write responses err="write tcp 65.109.16.220:26657->216.18.205.34:43566: write: broken pipe" module=rpc-server
Jan 15 15:50:09 rpc01-tia-t celestia-appd[20461]: 3:50PM ERR failed to write responses err="write tcp 65.109.16.220:26657->216.18.205.34:43512: write: broken pipe" module=rpc-server
Jan 15 15:50:09 rpc01-tia-t celestia-appd[20461]: 3:50PM ERR failed to write responses err="write tcp 65.109.16.220:26657->216.18.205.34:43534: write: broken pipe" module=rpc-server
Jan 15 15:50:09 rpc01-tia-t celestia-appd[20461]: 3:50PM ERR failed to write responses err="write tcp 65.109.16.220:26657->216.18.205.34:43596: write: broken pipe" module=rpc-server
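
For context on what both sets of messages mean: they come from the node's RPC server giving up on connections that stop reading ("slow client" on event subscriptions before the restart, "broken pipe" on plain RPC responses after it). As an illustration only, here is a minimal sketch of a well-behaved WebSocket subscriber against a CometBFT-style endpoint; the address and query are placeholders, not taken from this incident:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	rpchttp "github.com/cometbft/cometbft/rpc/client/http"
)

func main() {
	// Placeholder endpoint; substitute the real RPC address.
	c, err := rpchttp.New("http://localhost:26657", "/websocket")
	if err != nil {
		log.Fatal(err)
	}
	if err := c.Start(); err != nil {
		log.Fatal(err)
	}
	defer c.Stop()

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	// The server buffers events per subscriber; if this channel is not
	// drained promptly, the server disconnects the subscriber and logs
	// a "slow client" message like the ones above.
	events, err := c.Subscribe(ctx, "example-subscriber", "tm.event='NewBlock'")
	if err != nil {
		log.Fatal(err)
	}
	for ev := range events {
		fmt.Println("received event for query:", ev.Query)
	}
}

In other words, the log lines point at subscribers (or proxied connections) that stopped consuming data while the node was pushing large blocks, not necessarily at a fault in the node itself.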

Additional info:

In app.toml we have set:


# MaxRecvMsgSize defines the max message size in bytes the server can receive.
# The default value is 10MB.
max-recv-msg-size = "20971520"

# MaxSendMsgSize defines the max message size in bytes the server can send.
# The default value is math.MaxInt32.
max-send-msg-size = "2147483647"
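
Worth noting, purely as an aside: those two settings bound what the server will receive and send, but a Go gRPC client keeps its own 4 MiB default receive cap unless it is raised explicitly, so clients pulling near-8 MiB blocks or large blobs can still see size errors. A hedged sketch (the address is a placeholder; the 20 MiB figure just mirrors the max-recv-msg-size value above):

package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Raise the client-side receive limit (grpc-go defaults to 4 MiB) so
	// large responses are not rejected on the client end.
	conn, err := grpc.Dial(
		"localhost:9090", // placeholder gRPC address
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(20*1024*1024)), // 20 MiB
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// Pass conn to the generated query / tx service clients from here.
}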

HW specs:

Ryzen 7700X (8 cores) and 64 GB RAM

@rootulp
Collaborator Author

rootulp commented Jan 15, 2025

Investigate the Discord reports

I found 3 reports:

@rootulp
Collaborator Author

rootulp commented Jan 15, 2025

According to https://celestia-tools.brightlystake.com/, the Mocha endpoints are healthy, with the exception of http://celestia-t-rpc.noders.services.

@rootulp
Collaborator Author

rootulp commented Jan 15, 2025

@mindstyle85 do you have system metrics (CPU, RAM, Network I/O) from the POPs consensus node?

@mindstyle85

here you go:

[five attached screenshots of system metrics graphs: CPU, RAM, network traffic, and disk usage]

@rootulp
Collaborator Author

rootulp commented Jan 15, 2025

Thanks, super helpful!

  1. RAM and CPU were fine.
  2. Network traffic hit a sustained 1 Gb/s during the time I was running txsim.
  3. Disk space used came extremely close to 100% on /. I wonder whether it cleared after a restart of the node or whether you manually deleted something?

Everything else seems fine. My hypothesis is that your server was maxing out its 1 Gb/s network link due to txsim and wasn't able to service any other inbound gRPC requests.
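
As a rough back-of-envelope check on that hypothesis (block size, block time, and peer fan-out below are illustrative assumptions, not measurements from this incident):

package main

import "fmt"

func main() {
	// Back-of-envelope gossip bandwidth during the spam test.
	// All figures are assumptions for illustration only.
	const (
		blockSizeMiB = 8.0  // blocks close to the 8 MiB limit
		blockTimeSec = 6.0  // assumed average block time
		peerFanout   = 10.0 // peers the node forwards txs / block parts to
	)
	// Each byte arrives once via the mempool, again in block parts, and is
	// forwarded to roughly peerFanout peers on the way out.
	outboundMiBs := blockSizeMiB / blockTimeSec * peerFanout * 2
	gbps := outboundMiBs * 1024 * 1024 * 8 / 1e9
	fmt.Printf("~%.0f MiB/s outbound ≈ %.2f Gb/s\n", outboundMiBs, gbps)
}

Even these modest assumptions put gossip alone at a couple hundred Mb/s; a higher peer count plus the RPC and blob queries this node was serving at the same time could plausibly push a 1 Gb/s link to saturation.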

@mindstyle85

Ah, the disk was just the OS disk filling up due to logs, but we cleaned that. The actual disk with the DB still has about 50% of its space left.

I'm not sure it's working properly yet, though; I still see the broken pipe error messages in the logs, so it looks like it hasn't fully recovered.
