fix: tx client concurrency test #4104

Merged
merged 8 commits into from
Jan 17, 2025

Conversation

cmwaters
Contributor

@cmwaters cmwaters commented Dec 10, 2024

This PR makes two small tweaks:

  • Fixes TestConcurrentTxSubmission by giving errCh a capacity of 1. Previously, errors were being ignored: the test waited on the wait group before reading from the channel, so nothing was receiving when a goroutine tried to write an error to it.
  • Catches the case where a user cancels the context when calling ConfirmTx

This test is broken until celestiaorg/celestia-core#1553 is resolved
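
To illustrate why the buffer matters, here is a minimal, self-contained sketch. It assumes the goroutines report errors with a non-blocking send (a select with a default case) and that the channel is only read after wg.Wait() returns; collectFirstError and the worker errors are hypothetical names, not code from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// collectFirstError starts n workers that all fail. Each worker reports its
// error with a non-blocking send, and errCh is only read after wg.Wait().
// This helper is illustrative only; it is not part of the test in this PR.
func collectFirstError(errCh chan error, n int) error {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			err := fmt.Errorf("worker %d failed", id)
			select {
			case errCh <- err:
				// Succeeds only if a receiver is waiting or the buffer has room.
			default:
				// Otherwise the error is silently dropped.
			}
		}(i)
	}
	wg.Wait()

	// Non-blocking read: returns nil if no error was captured.
	select {
	case err := <-errCh:
		return err
	default:
		return nil
	}
}

func main() {
	// Unbuffered: no receiver exists while the workers run, so every send is dropped.
	fmt.Println("unbuffered captured an error:", collectFirstError(make(chan error), 10) != nil)
	// Capacity 1: the first error is stored in the buffer and read after Wait.
	fmt.Println("buffered captured an error:  ", collectFirstError(make(chan error, 1), 10) != nil)
}
```

With an unbuffered channel every non-blocking send finds no receiver and falls through to default, so the read after wg.Wait() sees nothing. With capacity 1 the first error stays in the buffer until it is read.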

@cmwaters cmwaters marked this pull request as ready for review December 20, 2024 11:23
@cmwaters cmwaters requested a review from a team as a code owner December 20, 2024 11:23
@cmwaters cmwaters requested review from rootulp and ninabarbakadze and removed request for a team December 20, 2024 11:23
Contributor

coderabbitai bot commented Dec 20, 2024

Walkthrough

The pull request introduces modifications to two files in the pkg/user directory. In e2e_test.go, the TestConcurrentTxSubmission function is updated to iterate over multiple mempool versions and increase the number of transactions from 10 to 100, while modifying the error channel to be buffered. In tx_client.go, the ConfirmTx method adds a context cancellation check, and the handleEvictions method refines transaction sequence management during mempool evictions.

Changes

File: pkg/user/e2e_test.go
  • Increased numTxs from 10 to 100
  • Changed errCh to a buffered channel with capacity 1
  • Iterated over multiple mempool versions in TestConcurrentTxSubmission

File: pkg/user/tx_client.go
  • Added a context cancellation check in the ConfirmTx method
  • Updated handleEvictions to manage the transaction sequence during evictions

Suggested labels

WS: Maintenance 🔧, WS: V2 ✌️

Suggested reviewers

  • evan-forbes
  • rootulp
  • ninabarbakadze


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/user/e2e_test.go (1)

36-36: Consider parameterizing the number of transactions.

While increasing the test load is good for stress testing, consider making numTxs configurable through a test parameter or environment variable. This would allow for quick tests during development while maintaining the ability to run more extensive tests in CI.

-	numTxs := 100
+	numTxs := getTestTxCount()

// Add this helper function:
+func getTestTxCount() int {
+	if count, err := strconv.Atoi(os.Getenv("TEST_TX_COUNT")); err == nil {
+		return count
+	}
+	return 100 // default to 100
+}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 415e870 and 9f33f0b.

📒 Files selected for processing (2)
  • pkg/user/e2e_test.go (1 hunks)
  • pkg/user/tx_client.go (1 hunks)
🔇 Additional comments (2)
pkg/user/e2e_test.go (1)

42-42: Good fix: Buffered error channel prevents goroutine leaks.

The addition of buffer capacity to errCh is a crucial fix. Previously, if multiple errors occurred simultaneously, the error channel being unbuffered could cause goroutine leaks as the first error might not be read before subsequent errors tried to write to the channel.

pkg/user/tx_client.go (1)

466-468: Good addition: Proper context cancellation handling.

The addition of the context error check ensures that user-initiated cancellations are properly propagated instead of being masked by the "transaction not found" error. This maintains the correct error semantics and helps with debugging.

rootulp
rootulp previously approved these changes Dec 20, 2024
Collaborator

@rootulp rootulp left a comment

TestConcurrentTxSubmission fails in CI: https://github.com/celestiaorg/celestia-app/actions/runs/12430560032/job/34706233115?pr=4104#step:4:39

Is it a flake? I just retried

rach-id
rach-id previously approved these changes Dec 23, 2024
Member

@ninabarbakadze ninabarbakadze left a comment

this should prolly be resolved by now

@rootulp
Collaborator

rootulp commented Jan 2, 2025

@cmwaters
Contributor Author

cmwaters commented Jan 2, 2025

test is failing

Yup, will try to investigate

@cmwaters cmwaters dismissed stale reviews from rootulp and rach-id via b1924ed January 2, 2025 20:45
@cmwaters
Contributor Author

cmwaters commented Jan 2, 2025

Blocked on celestiaorg/celestia-core#1582

@cmwaters cmwaters marked this pull request as draft January 2, 2025 20:46
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
pkg/user/e2e_test.go (2)

42-44: Consider parameterizing the test data size

While increasing the number of transactions to 100 provides better coverage, consider making this configurable through a test parameter or constant to facilitate different load testing scenarios.

-			numTxs := 100
+			const defaultNumTxs = 100
+			numTxs := defaultNumTxs

70-75: Consider enhancing error reporting

The error handling is correct but could benefit from more detailed error reporting for debugging purposes.

 			select {
 			case err := <-errCh:
-				require.NoError(t, err)
+				require.NoError(t, err, "Failed during concurrent transaction submission: %v", err)
 			default:
 			}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4660490 and b1924ed.

📒 Files selected for processing (1)
  • pkg/user/e2e_test.go (3 hunks)
🔇 Additional comments (4)
pkg/user/e2e_test.go (4)

26-36: LGTM: Comprehensive mempool version testing setup

The test now properly covers all mempool versions with appropriate timeout configuration. The increased TimeoutCommit of 10 seconds provides adequate time for transaction processing in concurrent scenarios.


38-40: LGTM: Clean client initialization

The test client initialization is properly error-handled and follows the standard pattern.


47-50: LGTM: Fixed error channel capacity

The buffered error channel with capacity 1 is a good fix. This ensures the first error can be captured without blocking, addressing the issue mentioned in the PR objectives where errors were being ignored.


52-68: Verify timeout handling in high-load scenarios

The one-minute timeout might need adjustment based on system performance under high load with 100 concurrent transactions.

Also, consider adding logging to track the duration of transaction processing:

 			go func(b *share.Blob) {
 				defer wg.Done()
+				start := time.Now()
 				_, err := txClient.SubmitPayForBlob(subCtx, []*share.Blob{b}, user.SetGasLimitAndGasPrice(500_000, appconsts.DefaultMinGasPrice))
+				if time.Since(start) > 30*time.Second {
+					t.Logf("Transaction took longer than 30s to process")
+				}
 				if err != nil && !errors.Is(err, context.Canceled) {
✅ Verification successful

Let me gather more information about the test file and its context.


One minute timeout is appropriate for concurrent transaction submission test

The one-minute timeout is well-justified for this test scenario because:

  1. The test submits 100 concurrent transactions with blobs of size 2048 bytes each
  2. The test node is configured with a 10-second block time (TimeoutCommit = 10 * time.Second)
  3. The test runs against three different mempool versions (v0, v1, v2)
  4. Each transaction has a gas limit of 500,000 which is sufficient for the blob size
  5. The timeout acts as a safety mechanism to prevent test hangs while allowing enough time for transaction processing across multiple blocks

The timeout duration provides an adequate buffer considering:

  • Block time (10s) × Multiple blocks needed for 100 txs
  • Network setup and initial height wait time
  • Transaction processing overhead across different mempool versions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any timeout-related issues in test logs
# Look for patterns indicating timeout issues or slow transaction processing

rg -A 3 "context deadline exceeded|timeout|took longer than expected" 

Length of output: 29896


Script:

#!/bin/bash
# Check the test file content to understand the test scenario
cat pkg/user/e2e_test.go

Length of output: 2167

@cmwaters cmwaters marked this pull request as ready for review January 16, 2025 08:33
@cmwaters cmwaters added the backport:v3.x PR will be backported automatically to the v3.x branch upon merging label Jan 16, 2025
@cmwaters cmwaters merged commit f21716b into main Jan 17, 2025
27 of 28 checks passed
@cmwaters cmwaters deleted the cal/concurrent_tx_client branch January 17, 2025 14:56
mergify bot pushed a commit that referenced this pull request Jan 17, 2025
This PR makes two small tweaks:
- Fixes `TestConcurrentTxSubmission` by adding a capacity of 1 to the
errCh. Currently errors were being ignored because the wait group meant
that there wasn't a process to read to the channel as it was being
written to. This fixes this
- Catches the case where a user cancels the context when calling
`ConfirmTx`

**This test is broken until
celestiaorg/celestia-core#1553 is resolved**

---------

Co-authored-by: nina / ნინა <[email protected]>
(cherry picked from commit f21716b)
rootulp pushed a commit that referenced this pull request Jan 17, 2025
This PR makes two small tweaks:
- Fixes `TestConcurrentTxSubmission` by adding a capacity of 1 to the
errCh. Currently errors were being ignored because the wait group meant
that there wasn't a process to read to the channel as it was being
written to. This fixes this
- Catches the case where a user cancels the context when calling
`ConfirmTx`

**This test is broken until
celestiaorg/celestia-core#1553 is resolved**

---------

This is an automatic backport of pull request #4104 done
by [Mergify](https://mergify.com).

Co-authored-by: Callum Waters <[email protected]>
Labels
backport:v3.x PR will be backported automatically to the v3.x branch upon merging
5 participants