No inferences found by Textract #87

nekogeko · 2025-01-13T20:58:28Z

Describe the bug
When processing the sample documents packaged with the solution, files upload successfully, but no raw data is visible after a file has been processed. Key/value pairs are visible, however.

To Reproduce
Deploy the application using one of the provided workflows (single-doc-textract.json, default.json), create a case, and upload the management.png document with a document type (generic or Passport, depending on the chosen workflow config). Manually start the job. Wait for the case status to show that processing is complete. Then go to the playground view to see the extracted data.

Expected behavior
Since Textract is in the selected workflow, data is expected to be displayed in the Raw data section.

Please complete the following information about the solution:

[1.1.7] Version
[us-east-1] Region
[no] Was the solution modified from the version published on this repository?
[no] If the answer to the previous question was yes, are the changes available on GitHub?
[no] Have you checked your service quotas for the services this solution uses?
[no errors seen] Were there any errors in the CloudWatch Logs?

knihit · 2025-01-13T22:54:36Z

@nekogeko thank you for reaching out. Can you please confirm if you are uploading the documents to process in the S3 bucket directly or in the UI?

A few things to check in addition to the above:

The solution creates a few Step Functions state machines. Can you please check whether all of them execute with no failures?
There is a Lambda function at https://github.com/aws-solutions/enhanced-document-understanding-on-aws/tree/main/source/lambda/text-extract. Can you check the CloudWatch logs for this Lambda to see if it has any errors? You can also set LOG_LEVEL to DEBUG in the Lambda's environment variables to get more verbose logging.

nekogeko · 2025-01-14T17:20:08Z

Hi,

I'm uploading the document from the UI. I've been testing with management.png, document type generic.

The CloudWatch logs for the Textract Lambda do not show any errors:

2025-01-14T16:02:07.501Z INIT_START Runtime Version: nodejs:20.v51 Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:cb6527bfb6726a080a367eca00e49765ca5abd8cd1a17783fbee683313121ece
2025-01-14T16:02:08.575Z START RequestId: 2b14993e-a95f-5b93-8bdc-f93e25e1973c Version: $LATEST
2025-01-14T16:02:08.577Z 2b14993e-a95f-5b93-8bdc-f93e25e1973c DEBUG S3_MULTI_PAGE_PDF_PREFIX is: multi-page-pdf
2025-01-14T16:02:09.585Z 2b14993e-a95f-5b93-8bdc-f93e25e1973c INFO namespace received: Workflows
2025-01-14T16:02:12.824Z 2b14993e-a95f-5b93-8bdc-f93e25e1973c INFO Publishing cw metrics with params: {"MetricData":[{"MetricName":"TextractWorkflow","Dimensions":[{"Name":"TextractAPI","Value":"Textract-DetectTextSync"},{"Name":"serviceName","Value":"eDUS-2cb98cb7"}],"Timestamp":"2025-01-14T16:02:12.824Z","Unit":"Count","Value":1}],"Namespace":"Workflows"}
2025-01-14T16:02:13.264Z 2b14993e-a95f-5b93-8bdc-f93e25e1973c INFO Published cw metrics to Workflows.
2025-01-14T16:02:13.264Z 2b14993e-a95f-5b93-8bdc-f93e25e1973c DEBUG Textract Sync - Processing Generic Document Type for taskToken AQDEAAAAKgAAAAMAAAAAAAAAAQVEHY50GiW8OK+LNCCV9FGoLbR416Vy9Lf968CemDKwGgycSwFibsDQ9q8Ctz9unGOr7GXgNjlp2CRqsnM5gM8TgsaqOnv/nlZ+6SeCSPkJpGFzEexM8IE+2EaZgCR/2f6pzsulspjiLQs6uqVGspOdo7FHtbReh4T6RVskBT8smA==Mp2FJq+z6+LIujWTpZttI5fS+JbeFdWy98NVMCKU+9gefMO/9JgU3VoihOINvLCWqMIlk5A6nkKL5pqUqQWSDhdm9P5rnS+KkLBlOd0SmLNObzxTM6/FLWcM4T5cis0xy9Z7vyNNMs5eGN9Ov23oHb1Cd2BoJad4rLK1eKikOBEHY9XBQBqoeWt+7q9Bti+JntSU5PHC68Zb0Kw/qhNJVCr0eoNiDtwtLm0SUD+1CJTbRLTwt4XddvR2ZIgrzPqOR7YmhNp65Mcm9qvy4B5yCkw7zH9sGylvkeUgEo49LnVXnRbStkI5TP7pYz+P2WVgg3jJ2VYkcTu7b0E62UVxTWvHVnKNyPfm7ramyYrTEa1PaQrqX94qiv3PXeBrali523y8OefYFp2YwFQbAyw2plF5vXS2+jXOqREWwW/mS1IG5gpKn0jUaxqz2B/Ak468pT0tRTKVzVE8TS/aMEFYOcXNWooezg6YPH8vcSui0/n197waUqurceGyHceDh+SAyHRCvgAH6yGtEamYv6L4
2025-01-14T16:02:13.264Z 2b14993e-a95f-5b93-8bdc-f93e25e1973c INFO Textract AnalyzeDocument request parameters missing or invalid. Using defaults
2025-01-14T16:02:13.265Z 2b14993e-a95f-5b93-8bdc-f93e25e1973c INFO namespace received: Workflows
2025-01-14T16:02:17.504Z 2b14993e-a95f-5b93-8bdc-f93e25e1973c INFO Publishing cw metrics with params: {"MetricData":[{"MetricName":"TextractWorkflow","Dimensions":[{"Name":"TextractAPI","Value":"Textract-AnalyzeDocumentSync"},{"Name":"serviceName","Value":"eDUS-2cb98cb7"}],"Timestamp":"2025-01-14T16:02:17.504Z","Unit":"Count","Value":1}],"Namespace":"Workflows"}
2025-01-14T16:02:17.704Z 2b14993e-a95f-5b93-8bdc-f93e25e1973c INFO Published cw metrics to Workflows.
2025-01-14T16:02:18.485Z 2b14993e-a95f-5b93-8bdc-f93e25e1973c DEBUG S3_INFERENCE_BUCKET_NAME is: docunderstanding-requestprocessorinferences13166f8-pa8r00l6y4cg
2025-01-14T16:02:19.385Z 2b14993e-a95f-5b93-8bdc-f93e25e1973c DEBUG CASE_DDB_TABLE_NAME is: DocUnderstanding-RequestProcessorCaseManagerCreateRecordsLambdaDDbDynamoTable94F42CFC-1E7F3H9RWF15
END RequestId: 2b14993e-a95f-5b93-8bdc-f93e25e1973c
REPORT RequestId: 2b14993e-a95f-5b93-8bdc-f93e25e1973c Duration: 11168.38 ms Billed Duration: 11169 ms Memory Size: 128 MB Max Memory Used: 103 MB Init Duration: 1070.70 ms XRAY TraceId: 1-67868a7f-80b8fb5a3b725a63520242f0 SegmentId: f7fb34c2aceeda5c Sampled: true

The case entry in DynamoDB has its status set to success.

I see inferences being created in S3 under a folder hierarchy that looks like this:

docunderstanding-requestprocessorinferences13166f8-pa8r00l6y4cg
    /nekogeko:f40fa734-dda2-45b7-b365-ddecbbd0bd4d
        /doc-d915c25d-f274-42b6-a96e-233072068ee4
            textract-detectText.json

Alongside this file are other files, including entity-*.json and textract-analyze.json.
Both Textract files contain data, with block elements such as:
{ "BlockType": "WORD", "Confidence": 96.8800277709961, "Text": "REPORT", "TextType": "PRINTED", "Geometry": { "BoundingBox": { "Width": 0.06701218336820602, "Height": 0.007605433464050293, "Left": 0.5365458130836487, "Top": 0.9628257751464844 }, "Polygon": [ { "X": 0.5365458130836487, "Y": 0.9628257751464844 }, { "X": 0.6035550832748413, "Y": 0.9628543853759766 }, { "X": 0.6035580039024353, "Y": 0.9704312086105347 }, { "X": 0.5365484356880188, "Y": 0.970402717590332 } ] }

The inference files are requested and retrieved by the UI, and the key-value pairs tab shows 10 key-value pairs found, but the Raw Text section says "No Raw Text detected".

knihit · 2025-01-14T21:40:45Z

Thank you for the additional detail. Investigating the issue.

nekogeko · 2025-01-15T19:41:53Z

It appears that the issue may be in the JavaScript code in document.js that is responsible for retrieving the number of pages from the back-end response.

OmarRad · 2025-01-15T20:03:19Z

@nekogeko thank you for reaching out. You're right, getDocumentPageCount() is returning undefined in document.ts. This is due to a mistake in the reducer logic in inferenceApiSlice.ts causing the textractDetectResponse object to look like { data: detectTextResponse } instead of simply being detectTextResponse as was expected by the rest of the code.
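To illustrate the mismatch described above, here is a minimal sketch. The body of getDocumentPageCount() is not shown in this thread, so `getPageCountSketch` is a hypothetical stand-in that assumes the page count is read from an array of per-page responses; the sample payload is also made up.

```typescript
// Hypothetical stand-in for getDocumentPageCount(): assumes the detect-text
// payload is an array with one entry per page. The real implementation in
// document.ts may differ.
function getPageCountSketch(textractDetectResponse: unknown): number | undefined {
    return Array.isArray(textractDetectResponse) ? textractDetectResponse.length : undefined;
}

// Made-up payload, loosely modeled on the WORD block quoted earlier in the thread.
const detectTextResponse = [{ Blocks: [{ BlockType: 'WORD', Text: 'REPORT' }] }];

// Shape produced by the buggy reducer: the payload is nested under `data`.
const wrapped = { data: detectTextResponse };

// The wrapper object is not the payload itself, so the count comes back undefined.
const countFromWrapped = getPageCountSketch(wrapped);

// With the bare payload, the count is available again.
const countFromUnwrapped = getPageCountSketch(detectTextResponse);

console.log(countFromWrapped, countFromUnwrapped);
```

Under these assumptions, any consumer that expects the bare payload sees nothing useful in the wrapped object, which would be consistent with the "No Raw Text detected" message.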

While we work on releasing the fix for this, we will share it with you here so that you can add it to your code in the meantime.

In inferenceApiSlice.ts, at lines 25-28:

                if (validInferences.includes(InferenceName.TEXTRACT_DETECT_TEXT)) {
                    unformattedtextractDetectResponse = await baseQuery(
                        `${INFERENCES_PATH}/${arg.selectedCaseId}/${arg.selectedDocumentId}/${InferenceName.TEXTRACT_DETECT_TEXT}`
                    );
                }

you'll need to make these changes:

Line 26

- unformattedtextractDetectResponse = await baseQuery(
+ const response = await baseQuery(

and after line 28 add

+ unformattedtextractDetectResponse = response.data as any;

I've attached a screenshot of what this change should look like
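For reference, the patched flow could look roughly like the sketch below. It is self-contained only because `baseQuery`, `INFERENCES_PATH`, `TEXTRACT_DETECT_TEXT`, `requestedUrls`, and `fetchDetectText` are mocked or hypothetical stand-ins; the mock assumes the base query resolves to an object that carries the payload under `data`, as the description of the bug suggests.

```typescript
// Hypothetical stand-ins so the sketch runs on its own; the real values live
// in the solution's UI code and are not shown in this thread.
type QueryResult = { data?: unknown; error?: unknown };
const INFERENCES_PATH = 'inferences'; // assumed value
const TEXTRACT_DETECT_TEXT = 'textract-detectText'; // assumed enum value

const requestedUrls: string[] = [];

// Mocked baseQuery: resolves to a wrapper object with the payload under `data`.
async function baseQuery(url: string): Promise<QueryResult> {
    requestedUrls.push(url);
    return { data: [{ Blocks: [{ BlockType: 'WORD', Text: 'REPORT' }] }] };
}

// Mirrors the suggested change: keep the wrapper in a local `response`,
// then store only its `data` field.
async function fetchDetectText(selectedCaseId: string, selectedDocumentId: string): Promise<unknown> {
    let unformattedtextractDetectResponse: unknown;
    const response = await baseQuery(
        `${INFERENCES_PATH}/${selectedCaseId}/${selectedDocumentId}/${TEXTRACT_DETECT_TEXT}`
    );
    unformattedtextractDetectResponse = response.data as any;
    return unformattedtextractDetectResponse;
}
```

Calling `fetchDetectText('case-1', 'doc-1')` in this sketch resolves to the bare array rather than the `{ data: ... }` wrapper, which is the shape the rest of the code expects.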

nekogeko · 2025-01-15T20:40:50Z

thanks, I will test this and get back to you

nekogeko · 2025-01-17T20:20:54Z

I confirm that the issue is resolved

nekogeko added the bug Something isn't working label Jan 13, 2025

nekogeko closed this as completed Jan 17, 2025
