
Commit 43f8b59

Merge pull request #1 from mx-awsdevteam/mx-awsdevteam-patch-1
async analyze doc documentation
2 parents 445e53e + 359dce8 commit 43f8b59

1 file changed

Lines changed: 69 additions & 1 deletion


content/en/docs/appstore/connectors/aws/aws-textract.md

@@ -174,6 +174,14 @@ The domain model is a data model that describes the information in your applicat
| `GroupProperty` | This entity holds information for showing the group that a certain key belongs to. The attribute it contains is `_id`, which describes the group identification number, which is the same for each member of the group. Additionally, it contains a list of `ExpenseGroupPropertyType` objects. |
| `ExpenseGroupPropertyType` | This entity holds information for distinguishing whether the expense group is a name or an address. The attribute it contains is `_Type`. |
| `ExpenseDetection` | This generalization entity holds information for describing the detected expenses. The attributes it contains are `Text` and `Confidence`. The `Text` describes the word or line of text that is detected and the `Confidence` describes the confidence, as a percentage, in the text's detection. Additionally, it contains a specialized `Geometry` object (`AnalyzeExpenseGeometry`). |
| `AbstractFeatureType` | This entity holds information about the type of analysis that should be executed. It contains the attribute `Value`, which specifies the feature type as an enumeration value. |
| `AbstractWarning` | This entity holds information about the warnings that are returned along with a `GetDocumentAnalysisResponse` or a `GetExpenseAnalysisResponse`. It contains an `ErrorCode` attribute, which specifies the error code of the warning. It is associated with a list of `PageNumber` objects. |
| `PageNumber` | This entity holds the page number that the associated `AbstractWarning` object refers to. |
| `AbstractRequestQuery` | This entity holds information about the question that Amazon Textract should apply to the document. The `Text` attribute holds the question. It is associated with a list of `PagesToSearch` objects. |
| `PagesToSearch` | This entity holds the `StartPage` and `EndPage` of the page range that the associated query is applied to. |
| `AbstractDocumentAnalysisResponse` | This entity is the generalization of the response entities of the `AnalyzeDocument` and `GetDocumentAnalysis` actions. It contains the parts of the responses that are shared between those actions. Most importantly, it is associated with a list of `DocumentAnalysisBlock` objects. A specialization of this entity should be used as the input parameter of the `AbstractDocumentAnalysisResponse_ProcessResults` microflow. This way, the responses from both the `AnalyzeDocument` and the `GetDocumentAnalysis` actions can be used as input parameters. |
| `DocumentAnalysisBlock` | This entity is a specialization of the `Block` entity and holds the blocks returned by the `AnalyzeDocument` and `GetDocumentAnalysis` actions. |
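To illustrate how the `AbstractRequestQuery` and `PagesToSearch` entities could relate to the underlying Amazon Textract API, here is a minimal Python sketch. It assumes that each `PagesToSearch` range is sent to Textract as one entry of a query's `Pages` list, written as `"start-end"` (or just the page number for a single page). The Python classes and the `to_textract_query` function are hypothetical stand-ins, not the connector's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PagesToSearch:
    """Hypothetical stand-in for the PagesToSearch entity (StartPage/EndPage)."""
    start_page: int
    end_page: int


@dataclass
class RequestQuery:
    """Hypothetical stand-in for AbstractRequestQuery: the question plus page ranges."""
    text: str
    pages_to_search: List[PagesToSearch] = field(default_factory=list)


def to_textract_query(query: RequestQuery) -> Dict[str, Any]:
    """Build one entry of a Textract QueriesConfig["Queries"] list (assumed mapping)."""
    entry: Dict[str, Any] = {"Text": query.text}
    pages: List[str] = []
    for rng in query.pages_to_search:
        if rng.start_page < 1 or rng.end_page < rng.start_page:
            raise ValueError(f"invalid page range {rng.start_page}-{rng.end_page}")
        # A single page is written as "3"; a range as "2-4".
        if rng.start_page == rng.end_page:
            pages.append(str(rng.start_page))
        else:
            pages.append(f"{rng.start_page}-{rng.end_page}")
    if pages:
        # Omitting "Pages" leaves the page selection to Textract's default behavior.
        entry["Pages"] = pages
    return entry
```

For example, a query with one range covering only page 1 and another covering pages 2 to 4 becomes `{"Text": ..., "Pages": ["1", "2-4"]}`.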

### 4.2 Enumerations {#enumerations}

@@ -267,6 +275,29 @@ This enumeration indicates the selection status of the block. For more informati
| `SELECTED` | SELECTED |
| `NOT_SELECTED` | NOT_SELECTED |

#### 4.2.7 FeatureTypes

This enumeration holds the available types of analysis to perform, see

| Name | Caption |
| --- | --- |
| `FORMS` | FORMS |
| `TABLES` | TABLES |
| `QUERIES` | QUERIES |
| `SIGNATURES` | SIGNATURES |
| `LAYOUT` | LAYOUT |

#### 4.2.8 JobStatus

This enumeration indicates the status of the document analysis job as part of the `GetDocumentAnalysisResponse`, see

| Name | Caption |
| --- | --- |
| `IN_PROGRESS` | IN_PROGRESS |
| `SUCCEEDED` | SUCCEEDED |
| `FAILED` | FAILED |
| `PARTIAL_SUCCESS` | PARTIAL_SUCCESS |

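The two enumerations above correspond to string values in the Amazon Textract API. The following Python sketch mirrors them with `enum.Enum` and adds a hypothetical `is_terminal` helper, based on the assumption that a job reported as `IN_PROGRESS` must be checked again later while the other three statuses are final.

```python
from enum import Enum


class FeatureTypes(Enum):
    """Mirrors the FeatureTypes enumeration; values are the Textract strings."""
    FORMS = "FORMS"
    TABLES = "TABLES"
    QUERIES = "QUERIES"
    SIGNATURES = "SIGNATURES"
    LAYOUT = "LAYOUT"


class JobStatus(Enum):
    """Mirrors the JobStatus enumeration returned with GetDocumentAnalysisResponse."""
    IN_PROGRESS = "IN_PROGRESS"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    PARTIAL_SUCCESS = "PARTIAL_SUCCESS"

    def is_terminal(self) -> bool:
        # Hypothetical helper: only IN_PROGRESS means the job is still running.
        return self is not JobStatus.IN_PROGRESS


def parse_status(raw: str) -> JobStatus:
    """Convert a raw JobStatus string from a response; raises ValueError if unknown."""
    return JobStatus(raw)
```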
### 4.3 Activities {#activities}

Activities define the actions that are executed in a microflow or a nanoflow. For the Amazon Textract connector, they represent actions such as analyzing a document or expense. For more information, see [Activities](/refguide/activities/).
@@ -277,7 +308,7 @@ To help you work with multi-page PDF files in a synchronous way, you can use the

The `AnalyzeDocument` Amazon Textract action allows you to analyze documents and extract information from them. It requires a valid AWS region and an `AnalyzeDocumentRequest` object. It additionally requires at least one `RequestQuery` object when the `GetQueries` attribute in `AnalyzeDocumentRequest` is set to true.

280-
Additionally, you can use the `AnalyzeDocumentResponse_ProcessResults` sub-flow. This will process the response from Amazon Textract into the specialized `BlockItem` model.
311+
Additionally, you can use the `AbstractDocumentAnalysisResponse_ProcessResults` sub-flow. This will process the response from Amazon Textract into the specialized `BlockItem` model.
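For comparison with the connector activity, the underlying synchronous Amazon Textract call can be sketched with the AWS SDK for Python (boto3), whose Textract client exposes `analyze_document`. The sketch sends a document as raw bytes and enforces the same rule as above: a queries analysis needs at least one query. The function name and parameters are hypothetical, and `client` is assumed to be a `boto3.client("textract")` instance or any object with a compatible `analyze_document` method; this is not how the connector itself is implemented.

```python
from typing import Any, Dict, List, Optional, Sequence


def analyze_document_bytes(
    client: Any,
    document: bytes,
    feature_types: Sequence[str],
    queries: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Run a synchronous Textract analysis on in-memory document bytes (sketch)."""
    if not feature_types:
        raise ValueError("at least one feature type is required")
    queries = queries or []
    # Mirrors the rule that a queries analysis needs at least one query.
    if "QUERIES" in feature_types and not queries:
        raise ValueError("the QUERIES feature type requires at least one query")

    kwargs: Dict[str, Any] = {
        "Document": {"Bytes": document},
        "FeatureTypes": list(feature_types),
    }
    if "QUERIES" in feature_types:
        kwargs["QueriesConfig"] = {"Queries": [{"Text": q} for q in queries]}
    return client.analyze_document(**kwargs)
```

With boto3 installed and credentials configured, you could call it as `analyze_document_bytes(boto3.client("textract", region_name="eu-west-1"), data, ["QUERIES"], ["What is the total?"])` and read the `Blocks` list from the returned dictionary.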

The input and output for this service are shown in the table below:

@@ -316,3 +347,40 @@ This activity returns an `AnalyzeExpenseResponse` object with objects from the f
| `LineItemField` | | This entity holds information for a line within the given document's table. |
| `LineItemExpenseField` | `AmazonTextractConnector.ExpenseField` | This specialized entity holds the detected expense-related information, separated into categories `Type`, `LabelDetection` and `ValueDetection`. The attribute it contains is `PageNumber`, which describes the page number on which the value was detected. Additionally, it contains a list of `GroupProperty` objects, a specialized `ExpenseDetection` object (both `ExpenseDetectionLabel` and `ExpenseDetectionValue`), an `ExpenseType` object, and a `Currency` object. |
| `AnalyzeExpenseBlock` | `AmazonTextractConnector.Block` | This entity holds information for items that are recognized in a document within a group of pixels close to each other. The attributes it contains are `BlockType`, `ColumnIndex`, `ColumnSpan`, `Confidence`, `EntityTypes`, `_Id`, `Page`, `RowIndex`, `RowSpan`, `SelectionStatus`, `Text` and `TextType`. The `BlockType` describes the type of text item that's recognized, the `ColumnIndex` describes the column in which a table cell appears (the first column position is 1, the second column position is 2, and so on), the `ColumnSpan` describes the number of columns that a table cell spans, the `Confidence` describes the score that Amazon Textract has in the accuracy of the recognized text, the `EntityTypes` describes the type of entity, the `Page` describes the page on which a block was detected, the `RowIndex` describes the row in which a table cell is located (the first row position is 1, the second row position is 2, and so on), the `RowSpan` describes the number of rows that a table cell spans, the `SelectionStatus` describes the selection status of a selection element (such as an option, radio or checkbox), the `Text` describes the word or line of text that's recognized by Amazon Textract and `TextType` describes the kind of text that Amazon Textract has detected (handwritten or printed). Additionally, this entity contains a list of `Relationship` objects and a specialized `Geometry` object (`BlockGeometry`). |

#### 4.3.3 StartDocumentAnalysis {#startdocumentanalysis}

The `StartDocumentAnalysis` Amazon Textract action allows you to analyze multi-page documents asynchronously and extract information from them. It requires a valid AWS region, a `Credentials` object, a `StartDocumentAnalysisRequest` object, and an `S3DocumentLocation` object. It additionally requires at least one `AsynchronousFeatureType` object. If the `QUERIES` feature type is part of the request, it also requires an `AsynchronousRequestQuery` object to specify the query.

The input and output for this service are shown in the table below:

| Input | Output |
| --- | --- |
| `StartDocumentAnalysisRequest` (Object) | `StartDocumentAnalysisResponse` (Object) |
| `AWS_Region` (Enumeration) | |
| `Credentials` (Object) | |

This activity returns a `StartDocumentAnalysisResponse` object with objects from the following entities, as shown in the table below:

| Name | Generalization | Documentation |
| --- | --- | --- |
| `StartDocumentAnalysisResponse` | | This entity is the response for the Amazon Textract `StartDocumentAnalysis` action. It contains a `JobId` attribute, which the `GetDocumentAnalysis` action can use to retrieve the results once Amazon Textract has processed them. |
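The asynchronous flow begins with a call like the one sketched below. This Python sketch uses the boto3 Textract client's `start_document_analysis` method to start a job on a document stored in Amazon S3 and returns the `JobId` that a later `GetDocumentAnalysis` call needs. The helper name and its parameters are hypothetical, as is the assumption that the connector's `S3DocumentLocation` object corresponds to a bucket name and object key; the validation mirrors the requirements described above.

```python
from typing import Any, Dict, List, Optional, Sequence


def start_analysis(
    client: Any,
    bucket: str,
    key: str,
    feature_types: Sequence[str],
    queries: Optional[List[str]] = None,
) -> str:
    """Start an asynchronous Textract analysis and return its JobId (sketch)."""
    if not feature_types:
        raise ValueError("at least one feature type is required")
    queries = queries or []
    if "QUERIES" in feature_types and not queries:
        raise ValueError("the QUERIES feature type requires at least one query")

    kwargs: Dict[str, Any] = {
        # Asynchronous jobs read the document from S3 rather than from raw bytes.
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": list(feature_types),
    }
    if "QUERIES" in feature_types:
        kwargs["QueriesConfig"] = {"Queries": [{"Text": q} for q in queries]}
    response = client.start_document_analysis(**kwargs)
    return response["JobId"]
```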

#### 4.3.4 GetDocumentAnalysis {#getdocumentanalysis}

The `GetDocumentAnalysis` Amazon Textract action allows you to retrieve the results of an analysis started by the `StartDocumentAnalysis` action. It requires a valid AWS region, a `Credentials` object, and a `GetDocumentAnalysisRequest` object.

Additionally, you can use the `AbstractDocumentAnalysisResponse_ProcessResults` sub-flow. This will process the response from Amazon Textract into the specialized `BlockItem` model.
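The internals of the `AbstractDocumentAnalysisResponse_ProcessResults` sub-flow are not described here. As a rough illustration of the kind of processing involved, the Python sketch below works on raw Textract block dictionaries (not the connector's `BlockItem` model) and pairs each `QUERY` block with the `QUERY_RESULT` blocks it references through a relationship of type `ANSWER`. The function name `query_answers` is hypothetical.

```python
from typing import Any, Dict, List


def query_answers(blocks: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each QUERY block's question text to the texts of its QUERY_RESULT answers."""
    by_id = {block["Id"]: block for block in blocks}
    answers: Dict[str, List[str]] = {}
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        question = block.get("Query", {}).get("Text", "")
        texts = answers.setdefault(question, [])
        # QUERY blocks point at their answers through relationships of type ANSWER.
        for relationship in block.get("Relationships", []):
            if relationship.get("Type") != "ANSWER":
                continue
            for answer_id in relationship.get("Ids", []):
                answer = by_id.get(answer_id)
                if answer is not None and answer.get("BlockType") == "QUERY_RESULT":
                    texts.append(answer.get("Text", ""))
    return answers
```

A question that Textract could not answer maps to an empty list, so callers can tell "no answer" apart from "not asked".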

The input and output for this service are shown in the table below:

| Input | Output |
| --- | --- |
| `GetDocumentAnalysisRequest` (Object) | `GetDocumentAnalysisResponse` (Object) |
| `AWS_Region` (Enumeration) | |
| `Credentials` (Object) | |

This activity returns a `GetDocumentAnalysisResponse` object with objects from the following entities, as shown in the table below:

| Name | Generalization | Documentation |
| --- | --- | --- |
| `GetDocumentAnalysisResponse` | `AbstractDocumentAnalysisResponse` | This entity is the response for the Amazon Textract `GetDocumentAnalysis` action. It holds information about the `JobStatus`. If there are too many blocks to return in a single response, it contains a `NextToken` that can be used to retrieve the next batch of results. |
| `GetDocumentAnalysisWarning` | | This entity holds information about the warnings that were sent as part of the response, and the pages to which they apply. |
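Because `StartDocumentAnalysis` only returns a `JobId`, the results have to be polled for and then paged through using `NextToken`. The following Python sketch shows that loop against the boto3 Textract client's `get_document_analysis` method, collecting the blocks and warnings from every page. The function name, polling interval, and attempt limit are hypothetical choices, and `client` is assumed to be a `boto3.client("textract")` instance or a compatible object.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def get_all_blocks(
    client: Any,
    job_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Wait for a Textract analysis job, then collect every page of results (sketch)."""
    for _ in range(max_polls):
        response = client.get_document_analysis(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"job {job_id} still IN_PROGRESS after {max_polls} polls")

    if status == "FAILED":
        raise RuntimeError(response.get("StatusMessage", f"job {job_id} failed"))

    # The first completed response already holds the first batch of results.
    blocks = list(response.get("Blocks", []))
    warnings = list(response.get("Warnings", []))
    token = response.get("NextToken")
    while token:
        page = client.get_document_analysis(JobId=job_id, NextToken=token)
        blocks.extend(page.get("Blocks", []))
        warnings.extend(page.get("Warnings", []))
        token = page.get("NextToken")
    return status, blocks, warnings
```

A `PARTIAL_SUCCESS` status is returned to the caller rather than raised, so the collected warnings (each with an `ErrorCode` and a `Pages` list) can be inspected to see which pages were affected.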
