Workflows
Learn how to manage, run, and monitor workflows on the Weav.ai platform. This guide provides insights into creating, executing, and troubleshooting workflows for document processing.
Overview
The Workflows section allows users to automate document processing tasks on the Weav.ai platform. Workflows define a sequence of steps to extract, process, and analyze document data. This guide details operations such as retrieving workflow lists, running workflows, monitoring their statuses, and re-running or skipping steps for enhanced flexibility.
Prerequisite - To get started, ensure your environment is properly configured by following the Setup Guide.
Get all workflows
Get a list of all workflows that are present on the Weav.ai platform.
python3 workflows/get_all_workflows.py --show_internal_steps false
Parameters:
Parameter | Description | Required/Optional | Allowed values |
---|---|---|---|
show_internal_steps | Set to true to show the detailed internal steps of each workflow | Optional (default : False) | false, f, False, true, t, True |
Response:
{
"workflows":[
{
"name":"dagtasktest",
"params":[
],
"tasks":[
{
"downstream_tasks":[
],
"is_active":true,
"name":"only_if_failed"
},
{
"downstream_tasks":[
],
"is_active":true,
"name":"only_if_success"
}
]
}
]
}
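The response is ordinary JSON, so it can be post-processed directly. The following is a minimal sketch, assuming the script's output has been captured as a JSON string in the shape shown above. The helper name `summarize_workflows` is hypothetical and is not part of the Weav.ai scripts.

```python
import json


def summarize_workflows(response_text):
    """Map each workflow name to the names of its active tasks.

    Hypothetical helper: assumes the JSON shape shown in the response above.
    """
    payload = json.loads(response_text)
    summary = {}
    for workflow in payload.get("workflows", []):
        active = [
            task["name"]
            for task in workflow.get("tasks", [])
            if task.get("is_active")
        ]
        summary[workflow["name"]] = active
    return summary


# Sample copied from the response above.
sample = """
{"workflows": [{"name": "dagtasktest", "params": [], "tasks": [
  {"downstream_tasks": [], "is_active": true, "name": "only_if_failed"},
  {"downstream_tasks": [], "is_active": true, "name": "only_if_success"}
]}]}
"""
print(summarize_workflows(sample))
```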
Get Single workflow
Get all the steps that are present inside a particular workflow.
python3 workflows/get_single_workflow.py --workflow_name dagtasktest --show_internal_steps false
Parameters:
Parameter | Description | Required/Optional | Allowed values |
---|---|---|---|
show_internal_steps | Set to true to show detailed internal steps | Optional (default : False) | false, f, False, true, t, True |
workflow_name | The name of the workflow to be fetched | Required | String |
Response:
{
"name":"dagtasktest",
"params":[
],
"tasks":[
{
"downstream_tasks":[
],
"is_active":true,
"name":"only_if_failed"
},
{
"downstream_tasks":[
],
"is_active":true,
"name":"only_if_success"
},
{
"downstream_tasks":[
"second"
],
"is_active":true,
"name":"first"
},
{
"downstream_tasks":[
"third"
],
"is_active":true,
"name":"second"
},
{
"downstream_tasks":[
"only_if_success",
"only_if_failed"
],
"is_active":true,
"name":"third"
},
{
"downstream_tasks":[
"second",
"third",
"first"
],
"is_active":true,
"name":"split"
}
]
}
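The `downstream_tasks` lists describe a directed graph of steps. As an illustration, the sketch below derives one structurally valid ordering of the tasks with Kahn's algorithm. It assumes every name in `downstream_tasks` also appears as a task in the same response, and it only reflects the graph structure: whether a step such as `only_if_failed` actually runs depends on the platform. The function name `execution_order` is hypothetical.

```python
from collections import deque


def execution_order(tasks):
    """Return one topological order of task names (Kahn's algorithm)."""
    downstream = {t["name"]: list(t["downstream_tasks"]) for t in tasks}
    indegree = {name: 0 for name in downstream}
    for targets in downstream.values():
        for target in targets:
            indegree[target] += 1

    # Start from tasks that no other task points to.
    ready = deque(sorted(name for name, deg in indegree.items() if deg == 0))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for target in sorted(downstream[name]):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    if len(order) != len(downstream):
        raise ValueError("cycle detected in downstream_tasks")
    return order


# Tasks copied from the response above (is_active omitted for brevity).
tasks = [
    {"name": "only_if_failed", "downstream_tasks": []},
    {"name": "only_if_success", "downstream_tasks": []},
    {"name": "first", "downstream_tasks": ["second"]},
    {"name": "second", "downstream_tasks": ["third"]},
    {"name": "third", "downstream_tasks": ["only_if_success", "only_if_failed"]},
    {"name": "split", "downstream_tasks": ["second", "third", "first"]},
]
print(execution_order(tasks))
```

For this sample, `split` is the only task without an upstream step, so it comes first in the ordering.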
Run Workflow
Run a particular workflow for a document. The workflow name can be obtained from the name field in the Get all workflows API response shown above.
python3 workflows/run_workflow.py --doc_id 66e0fba3089fbd21c4dd80c3 --workflow_name dagtest --data "{\"form_id\":\"66fe5c58b1d0dfb13c9975f3\"}"
Parameters:
Parameter | Description | Required/Optional | Allowed values |
---|---|---|---|
workflow_name | Name of the workflow to be run | Required | |
doc_id | Document for which the workflow has to be run | Required | |
data | Any extra parameters that are required by the workflow | Optional | Stringified JSON |
Response:
{
"created_at":"2024-09-25T19:33:32.000000+00:00",
"document_id":"66e0fba3089fbd21c4dd80c3",
"document_name":"AAPL_10Q.pdf",
"end_date":"None",
"in_folders":[
"66e0f93093798ee1c937e39a"
],
"run_id":"66e0fba3089fbd21c4dd80c3_3df1b127-9ea5-4714-9bf5-b1a5653859f6",
"start_date":"None",
"state":"None",
"workflow_id":"dagtest"
}
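Because `data` must be stringified JSON, escaping the quotes by hand is easy to get wrong. One possible approach is to build the argument list in Python and let `json.dumps` produce the string. This is a sketch only: it uses the flags shown in the command above, and the helper name `build_run_command` is hypothetical.

```python
import json
import shlex


def build_run_command(doc_id, workflow_name, data=None):
    """Build the argument list for workflows/run_workflow.py.

    Hypothetical helper; only uses the flags documented above.
    """
    args = [
        "python3", "workflows/run_workflow.py",
        "--doc_id", doc_id,
        "--workflow_name", workflow_name,
    ]
    if data is not None:
        # Compact separators keep the stringified JSON free of spaces.
        args += ["--data", json.dumps(data, separators=(",", ":"))]
    return args


cmd = build_run_command(
    "66e0fba3089fbd21c4dd80c3",
    "dagtest",
    {"form_id": "66fe5c58b1d0dfb13c9975f3"},
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting altogether, assuming the working directory is the same one the commands in this guide are run from.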
Get Workflow status
Upon running a workflow, check the status of the run.
python3 workflows/get_workflow_status.py --workflow_id "dagtasktest" --workflow_run_id "66e0fba3089fbd21c4dd80c3_3df1b127-9ea5-4714-9bf5-b1a5653859f6"
Parameters:
Parameter | Description | Required/Optional | Allowed values |
---|---|---|---|
show_internal_steps | Set to true to show detailed internal steps | Optional (default : False) | false, f, False, true, t, True |
workflow_id | Identifier (name) of the workflow whose run is being checked | Required | |
workflow_run_id | The run ID of the workflow run (the run_id value in the Run Workflow response) | Required | |
Response:
{
"document_id": "66fe1752927ce8c0ebda42b9",
"end_date": datetime.datetime(2024, 10, 3, 4, 23, 42, 58430, tzinfo=TzInfo(UTC)),
"start_date": datetime.datetime(2024, 10, 3, 4, 14, 45, 30553, tzinfo=TzInfo(UTC)),
"status": "failed",
"tasks": [
{
"end_date": datetime.datetime(
2024, 10, 3, 4, 20, 4, 293969, tzinfo=TzInfo(UTC)
),
"failed_task_ids": [-1],
"name": "set_processing_to_in_state__1",
"start_date": datetime.datetime(
2024, 10, 3, 4, 20, 4, 293969, tzinfo=TzInfo(UTC)
),
"status": "failed",
"task_status_summary": {
"failed": 1,
"queued": 0,
"running": 0,
"skipped": 0,
"success": 0,
},
},
.
.
.
],
}
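A run usually needs to be checked more than once before it finishes. The sketch below shows one way to poll until a run reaches a final state and then list the failed tasks. It assumes the status is available as a Python dict shaped like the response above, and that `success` and `failed` are final states (other final states may exist). `fetch_status` is a hypothetical callable that you supply, for example a wrapper around `get_workflow_status.py`.

```python
import time

# Assumption: taken from the states visible in this guide, not an exhaustive list.
TERMINAL_STATES = {"success", "failed"}


def wait_for_run(fetch_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Poll fetch_status() until the run is in a terminal state or times out."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(
                f"run still {status.get('status')!r} after {timeout_s}s"
            )
        sleep(interval_s)
        waited += interval_s


def failed_task_names(status):
    """Return the names of tasks whose status is 'failed'."""
    return [
        task["name"]
        for task in status.get("tasks", [])
        if task.get("status") == "failed"
    ]
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real waiting.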
Rerun workflow
Re-run a workflow that has already been run for a document.
python3 workflows/rerun_workflow.py --doc_id 66df87ec2b1edfc0dc3b556f --workflow_name "dagtest"
Parameters:
Parameter | Description | Required/Optional | Allowed values |
---|---|---|---|
workflow_name | Name of the workflow to be run | Required | |
doc_id | Document for which the workflow has to be run | Required | |
data | Any extra parameters that are required by the workflow | Optional | Stringified JSON |
Response:
{
"created_at":"2024-09-25T19:33:32.000000+00:00",
"document_id":"66df87ec2b1edfc0dc3b556f",
"document_name":"AAPL_10Q.pdf",
"end_date":"None",
"in_folders":[
"66e0f93093798ee1c937e39a"
],
"run_id":"66e0fba3089fbd21c4dd80c3_3df1b127-9ea5-4714-9bf5-b1a5653859f6",
"start_date":"None",
"state":"None",
"workflow_id":"dagtest"
}
Skip steps in a workflow
Configure a workflow to skip one or more of its steps.
python3 workflows/skip_tasks.py --workflow_name "dagtest" --tasks task_1 task_2
Parameters:
Parameter | Description | Required/Optional | Allowed values |
---|---|---|---|
workflow_name | Name of the workflow whose tasks are to be skipped | Required | |
tasks | A list of tasks to be skipped | Required | Strings separated by spaces (Example: --tasks task_1 task_2) |
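A mistyped task name is easy to miss on the command line. As a sketch, the task names can be checked against a workflow definition (in the shape returned by Get Single workflow) before the skip command is built. The helper name `build_skip_command` is hypothetical, and the guide does not say whether the script itself rejects unknown task names.

```python
def build_skip_command(workflow, task_names):
    """Validate task names against a workflow definition, then build CLI args.

    Hypothetical helper; `workflow` is a dict with "name" and "tasks" keys.
    """
    known = {task["name"] for task in workflow["tasks"]}
    unknown = [name for name in task_names if name not in known]
    if unknown:
        raise ValueError(f"unknown tasks for {workflow['name']}: {unknown}")
    return [
        "python3", "workflows/skip_tasks.py",
        "--workflow_name", workflow["name"],
        "--tasks", *task_names,
    ]


# Illustrative definition; task_1 and task_2 mirror the command above.
workflow = {
    "name": "dagtest",
    "tasks": [{"name": "task_1"}, {"name": "task_2"}, {"name": "task_3"}],
}
print(build_skip_command(workflow, ["task_1", "task_2"]))
```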
Get Workflow Runs For Document
Fetch all workflow runs for a particular document.
python3 workflows/get_workflows_for_document.py --doc_id 66df87ec2b1edfc0dc3b556f --state "success" --query "ANNUAL REPORT" --skip 0 --limit 1
Parameters:
Parameter | Description | Required/Optional |
---|---|---|
doc_id | Document for which the workflow has to be fetched | Required |
state | State of workflow | Optional |
query | This string is matched with workflow and document name | Optional |
skip | Number of workflows to skip | Optional |
limit | Max fetch size | Optional |
Response:
{
"docs":[
{
"created_at":"None",
"document_id":"66fe1752927ce8c0ebda42b9",
"document_name":"66fe1752927ce8c0ebda42b9",
"end_date":"2024-10-03T20:37:32.405064+00:00",
"in_folders":[
],
"run_id":"66fe1752927ce8c0ebda42b9_078c1e26-a82b-4ada-ac5e-357eac3eb2b3",
"start_date":"2024-10-03T20:37:26.234912+00:00",
"state":"failed",
"workflow_id":"process_form_workflow"
},
.
.
.
.
{
"created_at":"None",
"document_id":"66fe1752927ce8c0ebda42b9",
"document_name":"MCS-CS-Handbook-2022-2023Publish.pdf",
"end_date":"2024-10-03T04:26:21.715502+00:00",
"in_folders":[
],
"run_id":"66fe1752927ce8c0ebda42b9_2451533e-bbec-4e1a-acd5-0ab686f6d430",
"start_date":"2024-10-03T04:14:47.246008+00:00",
"state":"failed",
"workflow_id":"process_form_workflow"
}
],
"total":7
}
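The `skip` and `limit` parameters, together with the `total` field in the response, allow the runs to be read page by page. A minimal sketch of that loop follows. `fetch_page` is a hypothetical callable that you supply (for example, a wrapper around `get_workflows_for_document.py`) and is assumed to return a dict in the shape shown above. `fake_fetch` is only a stand-in used to demonstrate the loop.

```python
def iter_runs(fetch_page, page_size=50):
    """Yield every run by advancing `skip` until `total` is reached."""
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        docs = page.get("docs", [])
        yield from docs
        skip += len(docs)
        # Stop on an empty page or once all reported runs have been seen.
        if not docs or skip >= page.get("total", 0):
            break


# Stand-in data source for demonstration; not real platform output.
all_runs = [{"run_id": f"run_{i}", "state": "failed"} for i in range(7)]


def fake_fetch(skip, limit):
    return {"docs": all_runs[skip:skip + limit], "total": len(all_runs)}


for run in iter_runs(fake_fetch, page_size=3):
    print(run["run_id"], run["state"])
```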