Bug: Uploading .json through API not working #1012
Comments
I found out that I have to end the upload before anything actually happens with it. However, I'm waiting 50 seconds before doing a GET to list the status of the files, and some files are still getting canceled. |
A few thoughts:
|
This is my code:
def upload_file(self, zip_file_path):
    file_name = zip_file_path.split("/")[-1]
    with open(zip_file_path, "rb") as file:
        file_content = file.read()
    count = 0
    uploaded = False
    i = 0
    while uploaded == False:
        upload_job = self.create_upload_job()
        upload_job_data = upload_job.json()
        file_upload_job_id = upload_job_data['data']['id']
        response_upload = self._requestZip(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}",
            body=file_content
        )
        print(f"{file_name} loaded. Waiting for data to be processed...")
        response_end = self._request(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}/end"
        )
        print(f"Data processing started for {file_name}")
        threshold = 20
        sleep(threshold)
        count += threshold
        while True:
            response_list = self._request(
                method="GET",
                uri=f"/api/v2/file-upload"
            )
            JSON_response = response_list.json()
            if JSON_response['data'][i]["status_message"] == "Partially Completed" or JSON_response['data'][0]["status_message"] == "Complete":
                print(f"Successfully uploaded {file_name}")
                uploaded = True
                break
            if JSON_response['data'][i]['status'] == 3:
                print("Data processing was canceled. Error on file: " + file_name)
                i += 1
                break
            sleep(threshold)
            count += threshold
With this code it works, but only on the second try. I can live with that, because efficiency is not important for a program that runs once a month. Thanks |
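One thing that stands out in the snippet above is that the job is looked up by list position (`data[i]`) rather than by the ID returned from `/api/v2/file-upload/start`. A minimal sketch of matching on the ID instead is below. It assumes each entry in the `data` list carries an `id` field alongside `status` and `status_message`; that field, and the helper names `find_job` and `job_outcome`, are assumptions for illustration and are not confirmed in this thread. The status values are the ones used in the snippet above.

```python
from typing import Any, Dict, List, Optional

# Values taken from the snippet in this thread; no other statuses are assumed.
DONE_MESSAGES = {"Complete", "Partially Completed"}
CANCELED_STATUS = 3


def find_job(jobs: List[Dict[str, Any]], job_id: int) -> Optional[Dict[str, Any]]:
    """Return the list entry whose "id" equals job_id, or None if it is absent.

    Assumes every entry has an "id" field (hypothetical, not confirmed here).
    """
    for job in jobs:
        if job.get("id") == job_id:
            return job
    return None


def job_outcome(jobs: List[Dict[str, Any]], job_id: int) -> str:
    """Classify a single job as "done", "canceled", or "pending"."""
    job = find_job(jobs, job_id)
    if job is None:
        return "pending"  # job not listed yet, keep polling
    if job.get("status") == CANCELED_STATUS:
        return "canceled"
    if job.get("status_message") in DONE_MESSAGES:
        return "done"
    return "pending"
```

In the polling loop, `job_outcome(JSON_response['data'], file_upload_job_id)` would then replace the index-based checks, so the result does not depend on how the list happens to be ordered.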
If you upload that same file via the UI, does it work fine there, or is it the same behavior? |
It works fine there. It's only through the API: everything I upload works on the second try. |
I asked one of our engineers to double-check this thread, and what you're doing appears to be correct. However, without additional logs or a view of the full code snippet you're running, it will be difficult for us to help you troubleshoot further. |
Sorry for the delay, I was on vacation. This is the log info from the Docker logs:
2025-01-08 10:20:36 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:20:34.267421+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:33706","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"9f487a58-d9b5-439b-ad79-8f1eb9d7d426","request_bytes":135,"response_bytes":0,"status":204,"elapsed":55.816438,"time":"2025-01-08T09:20:36.403486091Z","message":"POST /api/v2/clear-database"}
As you can see, there are two upload attempts, but I can't find any info about why the first attempt fails. |
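To narrow logs like the one above down to the upload requests, a small filter can help. This is a rough sketch that assumes each log line is a timestamp prefix followed by a single JSON object with a `message` field, as in the line quoted above; the function name `upload_log_entries` is made up for illustration.

```python
import json
from typing import Any, Dict, Iterable, List


def upload_log_entries(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Return the parsed JSON entries whose message mentions the file-upload API."""
    entries = []
    for line in lines:
        start = line.find("{")  # skip the "2025-01-08 10:20:36 " style prefix
        if start == -1:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON log line, ignore it
        if "/api/v2/file-upload" in str(entry.get("message", "")):
            entries.append(entry)
    return entries
```

Printing `status`, `elapsed`, and `message` for each returned entry gives a compact timeline of the start, upload, and end calls for both attempts.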
def create_upload_job(self):
    """Creates a file upload job."""
    response = self._request("POST", "/api/v2/file-upload/start")
    return response
def upload_file(self, zip_file_path):
    file_name = zip_file_path.split("/")[-1]
    with open(zip_file_path, "rb") as file:
        file_content = file.read()
    uploaded = False
    i = 0
    while uploaded == False:
        count = 0
        upload_job = self.create_upload_job()
        upload_job_data = upload_job.json()
        file_upload_job_id = upload_job_data['data']['id']
        response_upload = self._requestZip(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}",
            body=file_content
        )
        print(f"{file_name} loaded. Waiting for data to be processed...")
        response_end = self._request(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}/end"
        )
        print(f"Data processing started for {file_name}")
        threshold = 10
        sleep(threshold)
        count += threshold
        while True:
            print(f"Waiting for data to be processed... {count}s")
            response_list = self._request(
                method="GET",
                uri=f"/api/v2/file-upload"
            )
            JSON_response = response_list.json()
            if JSON_response['data'][i]["status_message"] == "Partially Completed" or JSON_response['data'][i]["status_message"] == "Complete" or JSON_response['data'][i]["status_message"] == "Analyzing":
                print(f"Successfully uploaded {file_name}")
                uploaded = True
                break
            if JSON_response['data'][i-1]["status_message"] == "Partially Completed" or JSON_response['data'][i-1]["status_message"] == "Complete" or JSON_response['data'][i-1]["status_message"] == "Analyzing":
                print(f"Successfully uploaded {file_name}")
                uploaded = True
                break
            if JSON_response['data'][i]['status'] == 3:
                print("Data processing was canceled. Error on file: " + file_name)
                i += 1
                break
            sleep(threshold)
            count += threshold
            if count > 300:
                print("Data processed timeout on file: " + file_name)
                break
def _request(self, method: str, uri: str, body: Optional[bytes] = None) -> requests.Response:
    # Digester is initialized with HMAC-SHA-256 using the token key as the HMAC digest key.
    digester = hmac.new(self._credentials.token_key.encode(), None, hashlib.sha256)
    # OperationKey is the first HMAC digest link in the signature chain. This prevents replay attacks that seek to
    # modify the request method or URI. It is composed of concatenating the request method and the request URI with
    # no delimiter and computing the HMAC digest using the token key as the digest secret.
    #
    # Example: GET /api/v1/test/resource HTTP/1.1
    # Signature Component: GET/api/v1/test/resource
    digester.update(f"{method}{uri}".encode())
    # Update the digester for further chaining
    digester = hmac.new(digester.digest(), None, hashlib.sha256)
    # DateKey is the next HMAC digest link in the signature chain. This encodes the RFC3339 formatted datetime
    # value as part of the signature to the hour to prevent replay attacks that are older than max two hours. This
    # value is added to the signature chain by cutting off all values from the RFC3339 formatted datetime from the
    # hours value forward:
    #
    # Example: 2020-12-01T23:59:60Z
    # Signature Component: 2020-12-01T23
    datetime_formatted = datetime.datetime.now().astimezone().isoformat("T")
    digester.update(datetime_formatted[:13].encode())
    # Update the digester for further chaining
    digester = hmac.new(digester.digest(), None, hashlib.sha256)
    # Body signing is the last HMAC digest link in the signature chain. This encodes the request body as part of
    # the signature to prevent replay attacks that seek to modify the payload of a signed request. In the case
    # where there is no body content the HMAC digest is computed anyway, simply with no values written to the
    # digester.
    if body is not None:
        digester.update(body)
    # Perform the request with the signed and expected headers
    return requests.request(
        method=method,
        url=self._format_url(uri),
        headers={
            "User-Agent": "bhe-python-sdk 0001",
            "Authorization": f"bhesignature {self._credentials.token_id}",
            "RequestDate": datetime_formatted,
            "Signature": base64.b64encode(digester.digest()),
            "Content-Type": "application/json",
        },
        data=body,
    )
def _requestZip(self, method: str, uri: str, body: Optional[bytes] = None) -> requests.Response:
    # Digester is initialized with HMAC-SHA-256 using the token key as the HMAC digest key.
    digester = hmac.new(self._credentials.token_key.encode(), None, hashlib.sha256)
    # OperationKey is the first HMAC digest link in the signature chain. This prevents replay attacks that seek to
    # modify the request method or URI. It is composed of concatenating the request method and the request URI with
    # no delimiter and computing the HMAC digest using the token key as the digest secret.
    #
    # Example: GET /api/v1/test/resource HTTP/1.1
    # Signature Component: GET/api/v1/test/resource
    digester.update(f"{method}{uri}".encode())
    # Update the digester for further chaining
    digester = hmac.new(digester.digest(), None, hashlib.sha256)
    # DateKey is the next HMAC digest link in the signature chain. This encodes the RFC3339 formatted datetime
    # value as part of the signature to the hour to prevent replay attacks that are older than max two hours. This
    # value is added to the signature chain by cutting off all values from the RFC3339 formatted datetime from the
    # hours value forward:
    #
    # Example: 2020-12-01T23:59:60Z
    # Signature Component: 2020-12-01T23
    datetime_formatted = datetime.datetime.now().astimezone().isoformat("T")
    digester.update(datetime_formatted[:13].encode())
    # Update the digester for further chaining
    digester = hmac.new(digester.digest(), None, hashlib.sha256)
    # Body signing is the last HMAC digest link in the signature chain. This encodes the request body as part of
    # the signature to prevent replay attacks that seek to modify the payload of a signed request. In the case
    # where there is no body content the HMAC digest is computed anyway, simply with no values written to the
    # digester.
    if body is not None:
        digester.update(body)
    # Perform the request with the signed and expected headers
    return requests.request(
        method=method,
        url=self._format_url(uri),
        headers={
            "User-Agent": "bhe-python-sdk 0001",
            "Authorization": f"bhesignature {self._credentials.token_id}",
            "RequestDate": datetime_formatted,
            "Signature": base64.b64encode(digester.digest()),
            "Content-Type": "application/zip",
        },
        data=body,
    )
Also, this is the code I'm running. |
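The `_request` and `_requestZip` helpers above differ only in their `Content-Type` header, so the signing chain could live in one place. The sketch below is one possible refactor and not part of the original code: `sign_request` and `signed_headers` are hypothetical names, and the chain (method plus URI, then the date truncated to the hour, then the body) simply follows the comments in the code above.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional


def sign_request(token_key: str, method: str, uri: str,
                 request_date: str, body: Optional[bytes] = None) -> bytes:
    """Compute the base64 signature using the three-link HMAC-SHA-256 chain."""
    # OperationKey: method and URI concatenated with no delimiter.
    digester = hmac.new(token_key.encode(), f"{method}{uri}".encode(), hashlib.sha256)
    # DateKey: first 13 characters of the datetime, e.g. "2020-12-01T23".
    digester = hmac.new(digester.digest(), request_date[:13].encode(), hashlib.sha256)
    # Body: the digest is computed even when there is no body.
    digester = hmac.new(digester.digest(), None, hashlib.sha256)
    if body is not None:
        digester.update(body)
    return base64.b64encode(digester.digest())


def signed_headers(token_id: str, token_key: str, method: str, uri: str,
                   request_date: str, body: Optional[bytes] = None,
                   content_type: str = "application/json") -> Dict[str, object]:
    """Build the request headers; only content_type varies between JSON and zip."""
    return {
        "User-Agent": "bhe-python-sdk 0001",
        "Authorization": f"bhesignature {token_id}",
        "RequestDate": request_date,
        "Signature": sign_request(token_key, method, uri, request_date, body),
        "Content-Type": content_type,
    }
```

With this shape, the zip upload would pass `content_type="application/zip"` and every other call would keep the default, which removes the duplicated signing code without changing what is sent.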
Description:
Hey, I'm trying to upload .json files through the API and it returns 202 as the status code, which should mean it works, but on the file ingest page the job always stays as running, or sometimes changes to canceled. I tried uploading the same .json manually and it works fine, so it may be my code. At first I tried uploading a single file containing all the JSON, but that didn't work, so I started uploading each JSON file separately. If the solution lets me upload only the zip, that would be great.
Are you intending to fix this bug?
"no"
Component(s) Affected:
Steps to Reproduce:
Expected Behavior:
I expect to actually upload the data correctly
Actual Behavior:
I get a 202 status code, which should mean the data was uploaded correctly, but this happens:
Screenshots/Code Snippets/Sample Files:
Environment Information:
BloodHound: 6.3.0
Collector: [SharpHound version / AzureHound version]
OS: Windows 11
Browser (if UI related): [browser name and version]
Node.js (if UI related: [Node.js version]
Go (if API related): [Go version]
Database (if persistence related): [Neo4j version / PostgreSQL version]
Docker (if using Docker): 4.36.0
Additional Information:
Any additional context or information that might be helpful in understanding and diagnosing the issue.
Potential Solution (optional):
If you have any ideas about what might be causing the issue or how it could be fixed, you can share them here.
Related Issues:
If you've found related issues in the project's issue tracker, mention them here.
Contributor Checklist: