Workflows

Learn how to manage, run, and monitor workflows on the Weav.ai platform. This guide provides insights into creating, executing, and troubleshooting workflows for document processing.

Overview

The Workflows section allows users to automate document processing tasks on the Weav.ai platform. Workflows define a sequence of steps to extract, process, and analyze document data. This guide details operations such as retrieving workflow lists, running workflows, monitoring their statuses, and re-running or skipping steps for enhanced flexibility.

Prerequisite - To get started, ensure your environment is properly configured by following the Setup Guide.

Get all workflows

Get a list of all workflows that are present on the Weav.ai platform.

python3 workflows/get_all_workflows.py --show_internal_steps false

Parameters:

Parameter	Description	Required/Optional	Allowed values
show_internal_steps	Set to true to show detailed internal steps of the workflows	Optional (default : False)	false, f, False, true, t, True
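The allowed values in the table suggest that the flag is parsed from text rather than used as a bare switch. The following is a minimal sketch of how such a flag could be parsed with argparse. The helper name parse_bool_flag is hypothetical, and the way the actual script handles the flag is an assumption; only the accepted spellings come from the table.

```python
import argparse

# Accepted spellings are copied from the parameter table.
# How the real script parses them is an assumption.
TRUE_VALUES = {"true", "t", "True"}
FALSE_VALUES = {"false", "f", "False"}


def parse_bool_flag(value: str) -> bool:
    """Convert one of the documented spellings to a bool (hypothetical helper)."""
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    allowed = sorted(TRUE_VALUES | FALSE_VALUES)
    raise argparse.ArgumentTypeError(f"expected one of {allowed}, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--show_internal_steps", type=parse_bool_flag, default=False)

args = parser.parse_args(["--show_internal_steps", "t"])
print(args.show_internal_steps)
```

Omitting the flag falls back to the documented default of False.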

Response:

{
   "workflows":[
      {
         "name":"dagtasktest",
         "params":[
            
         ],
         "tasks":[
            {
               "downstream_tasks":[
                  
               ],
               "is_active":true,
               "name":"only_if_failed"
            },
            {
               "downstream_tasks":[
                  
               ],
               "is_active":true,
               "name":"only_if_success"
            }
         ]
      }
   ]
}
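As an illustration of how this response might be consumed, the sketch below maps each workflow name to the names of its active tasks. The function name summarize_workflows is hypothetical, and the payload is a trimmed copy of the sample response above.

```python
import json

# Trimmed copy of the sample response above.
response_text = """
{
  "workflows": [
    {
      "name": "dagtasktest",
      "params": [],
      "tasks": [
        {"downstream_tasks": [], "is_active": true, "name": "only_if_failed"},
        {"downstream_tasks": [], "is_active": true, "name": "only_if_success"}
      ]
    }
  ]
}
"""


def summarize_workflows(payload: dict) -> dict:
    """Map each workflow name to the names of its active tasks (hypothetical helper)."""
    return {
        wf["name"]: [t["name"] for t in wf.get("tasks", []) if t.get("is_active")]
        for wf in payload.get("workflows", [])
    }


summary = summarize_workflows(json.loads(response_text))
print(summary)
```

The keys of the resulting dictionary are the values that the other commands in this guide accept as the workflow name.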

Get single workflow

Get all the steps that are present inside a particular workflow.

python3 workflows/get_single_workflow.py --workflow_name dagtasktest --show_internal_steps false

Parameters:

Parameter	Description	Required/Optional	Allowed values
show_internal_steps	Set to true to show detailed internal steps	Optional (default : False)	false, f, False, true, t, True
workflow_name	The name of the workflow to be fetched	Required	String

Response:

{
   "name":"dagtasktest",
   "params":[
      
   ],
   "tasks":[
      {
         "downstream_tasks":[
            
         ],
         "is_active":true,
         "name":"only_if_failed"
      },
      {
         "downstream_tasks":[
            
         ],
         "is_active":true,
         "name":"only_if_success"
      },
      {
         "downstream_tasks":[
            "second"
         ],
         "is_active":true,
         "name":"first"
      },
      {
         "downstream_tasks":[
            "third"
         ],
         "is_active":true,
         "name":"second"
      },
      {
         "downstream_tasks":[
            "only_if_success",
            "only_if_failed"
         ],
         "is_active":true,
         "name":"third"
      },
      {
         "downstream_tasks":[
            "second",
            "third",
            "first"
         ],
         "is_active":true,
         "name":"split"
      }
   ]
}
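The downstream_tasks lists describe a directed graph of steps. Assuming that a task listed in downstream_tasks runs after the task that lists it (the response does not state this explicitly), one possible dependency order can be derived with a topological sort. The sketch below uses Kahn's algorithm on the sample response above; execution_order is a hypothetical helper, not part of the platform.

```python
# Task graph copied from the sample response above.
workflow = {
    "name": "dagtasktest",
    "tasks": [
        {"name": "only_if_failed", "downstream_tasks": []},
        {"name": "only_if_success", "downstream_tasks": []},
        {"name": "first", "downstream_tasks": ["second"]},
        {"name": "second", "downstream_tasks": ["third"]},
        {"name": "third", "downstream_tasks": ["only_if_success", "only_if_failed"]},
        {"name": "split", "downstream_tasks": ["second", "third", "first"]},
    ],
}


def execution_order(wf: dict) -> list:
    """Return task names in dependency order using Kahn's algorithm."""
    downstream = {t["name"]: list(t["downstream_tasks"]) for t in wf["tasks"]}
    # Count how many upstream tasks point at each task.
    indegree = {name: 0 for name in downstream}
    for children in downstream.values():
        for child in children:
            indegree[child] = indegree.get(child, 0) + 1

    # Tasks with no upstream dependency are ready; sort for a stable result.
    ready = sorted(name for name, count in indegree.items() if count == 0)
    order = []
    while ready:
        name = ready.pop(0)
        order.append(name)
        for child in downstream.get(name, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
        ready.sort()

    if len(order) != len(indegree):
        raise ValueError("task graph contains a cycle")
    return order


print(execution_order(workflow))
```

Ties between tasks that become ready at the same time are broken alphabetically here; the platform's actual scheduling order may differ.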

Run workflow

Run a workflow on a document. The workflow name comes from the name field of each entry in the Get all workflows response shown above.

python3 workflows/run_workflow.py --doc_id 66e0fba3089fbd21c4dd80c3 --workflow_name dagtest --data "{\"form_id\":\"66fe5c58b1d0dfb13c9975f3\"}"

Parameters:

Parameter	Description	Required/Optional	Allowed values
workflow_name	Name of the workflow to be run	Required
doc_id	Document for which the workflow has to be run	Required
data	Any extra parameters that are required by the workflow	Optional	Stringified JSON
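Escaping stringified JSON by hand on the command line is error-prone. A minimal sketch of building the same invocation from Python follows, so that json.dumps produces the --data value and no shell quoting is needed. The helper build_run_command is hypothetical; the script path and flags are the ones shown in the command above.

```python
import json
import shlex


def build_run_command(doc_id: str, workflow_name: str, data: dict = None) -> list:
    """Build the argument list for run_workflow.py (hypothetical helper)."""
    cmd = [
        "python3", "workflows/run_workflow.py",
        "--doc_id", doc_id,
        "--workflow_name", workflow_name,
    ]
    if data is not None:
        # Compact separators reproduce the stringified JSON style used above.
        cmd += ["--data", json.dumps(data, separators=(",", ":"))]
    return cmd


cmd = build_run_command(
    "66e0fba3089fbd21c4dd80c3",
    "dagtest",
    {"form_id": "66fe5c58b1d0dfb13c9975f3"},
)
# shlex.join shows a copy-pasteable shell form of the same command.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd) hands each element to the script as a separate argument, so the JSON string arrives unmodified.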

Response:

{
   "created_at":"2024-09-25T19:33:32.000000+00:00",
   "document_id":"66e0fba3089fbd21c4dd80c3",
   "document_name":"AAPL_10Q.pdf",
   "end_date":"None",
   "in_folders":[
      "66e0f93093798ee1c937e39a"
   ],
   "run_id":"66e0fba3089fbd21c4dd80c3_3df1b127-9ea5-4714-9bf5-b1a5653859f6",
   "start_date":"None",
   "state":"None",
   "workflow_id":"dagtest"
}
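In the sample above, fields that have no value yet are returned as the string "None" rather than as a JSON null, and timestamps are ISO 8601 strings. The sketch below shows one way a caller might normalise such a record before using it. The helper normalize_run is hypothetical, and treating every "None" string as a missing value is an assumption based on this sample alone.

```python
from datetime import datetime

# Copy of the sample response above.
run = {
    "created_at": "2024-09-25T19:33:32.000000+00:00",
    "document_id": "66e0fba3089fbd21c4dd80c3",
    "document_name": "AAPL_10Q.pdf",
    "end_date": "None",
    "in_folders": ["66e0f93093798ee1c937e39a"],
    "run_id": "66e0fba3089fbd21c4dd80c3_3df1b127-9ea5-4714-9bf5-b1a5653859f6",
    "start_date": "None",
    "state": "None",
    "workflow_id": "dagtest",
}

DATE_FIELDS = ("created_at", "start_date", "end_date")


def normalize_run(record: dict) -> dict:
    """Replace "None" strings with None and parse ISO timestamps (hypothetical helper)."""
    cleaned = {key: (None if value == "None" else value) for key, value in record.items()}
    for field in DATE_FIELDS:
        if cleaned.get(field) is not None:
            cleaned[field] = datetime.fromisoformat(cleaned[field])
    return cleaned


normalized = normalize_run(run)
print(normalized["state"], normalized["created_at"].isoformat())
```

The run_id field is left untouched, since it is the value passed as --workflow_run_id when checking the status of the run.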

Get workflow status

After starting a workflow, check the status of that run by passing the workflow ID and the run_id returned by the run call.

python3 workflows/get_workflow_status.py --workflow_id "dagtasktest" --workflow_run_id "66e0fba3089fbd21c4dd80c3_3df1b127-9ea5-4714-9bf5-b1a5653859f6"

Parameters:

Parameter	Description	Required/Optional	Allowed values
show_internal_steps	Set to true to show detailed internal steps	Optional (default : False)	false, f, False, true, t, True
workflow_id	Identifier of the workflow whose run status is being checked	Required
workflow_run_id	The run ID of the workflow	Required
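Because a run can take several minutes, callers often poll the status until the run finishes. The sketch below shows a generic polling loop under stated assumptions: fetch_status is a hypothetical callable that returns the status dictionary for one run, and "success" and "failed" are assumed to be the only terminal values of the status field. Neither is confirmed by this guide.

```python
import time
from typing import Callable

# Assumption: these are the terminal values of the "status" field.
TERMINAL_STATES = {"success", "failed"}


def wait_for_run(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the run reaches a terminal state or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"run did not finish after {max_attempts} checks")


# Demonstration with canned responses instead of a real status call.
canned = iter([{"status": "queued"}, {"status": "running"}, {"status": "failed"}])
result = wait_for_run(lambda: next(canned), sleep=lambda seconds: None)
print(result["status"])
```

Injecting the fetch and sleep functions keeps the loop independent of how the status is actually retrieved and makes it easy to test.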

Response:

{
    "document_id": "66fe1752927ce8c0ebda42b9",
    "end_date": datetime.datetime(2024, 10, 3, 4, 23, 42, 58430, tzinfo=TzInfo(UTC)),
    "start_date": datetime.datetime(2024, 10, 3, 4, 14, 45, 30553, tzinfo=TzInfo(UTC)),
    "status": "failed",
    "tasks": [
        {
            "end_date": datetime.datetime(
                2024, 10, 3, 4, 20, 4, 293969, tzinfo=TzInfo(UTC)
            ),
            "failed_task_ids": [-1],
            "name": "set_processing_to_in_state__1",
            "start_date": datetime.datetime(
                2024, 10, 3, 4, 20, 4, 293969, tzinfo=TzInfo(UTC)
            ),
            "status": "failed",
            "task_status_summary": {
                "failed": 1,
                "queued": 0,
                "running": 0,
                "skipped": 0,
                "success": 0,
            },
        },
       .
       .
       .
    ],
}
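The response above is printed as a Python object with datetime values, so it can be inspected directly in Python. The following sketch lists the failed tasks and computes the total run time. The helper summarize_run is hypothetical, and the data contains only the first task from the sample because the remaining tasks are elided there.

```python
from datetime import datetime, timezone

# First task and top-level fields from the sample response above.
status = {
    "document_id": "66fe1752927ce8c0ebda42b9",
    "start_date": datetime(2024, 10, 3, 4, 14, 45, 30553, tzinfo=timezone.utc),
    "end_date": datetime(2024, 10, 3, 4, 23, 42, 58430, tzinfo=timezone.utc),
    "status": "failed",
    "tasks": [
        {
            "name": "set_processing_to_in_state__1",
            "status": "failed",
            "failed_task_ids": [-1],
            "task_status_summary": {
                "failed": 1, "queued": 0, "running": 0, "skipped": 0, "success": 0,
            },
        },
    ],
}


def summarize_run(run_status: dict) -> dict:
    """Collect the overall status, duration, and failed task names (hypothetical helper)."""
    failed = [t["name"] for t in run_status["tasks"] if t["status"] == "failed"]
    duration = (run_status["end_date"] - run_status["start_date"]).total_seconds()
    return {
        "status": run_status["status"],
        "duration_seconds": duration,
        "failed_tasks": failed,
    }


report = summarize_run(status)
print(report)
```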

Rerun workflow

If a workflow has already been run for a document, re-run it as needed.

python3 workflows/rerun_workflow.py --doc_id 66df87ec2b1edfc0dc3b556f --workflow_name "dagtest"

Parameters:

Parameter	Description	Required/Optional	Allowed values
workflow_name	Name of the workflow to be run	Required
doc_id	Document for which the workflow has to be run	Required
data	Any extra parameters that are required by the workflow	Optional	Stringified JSON

Response:

{
   "created_at":"2024-09-25T19:33:32.000000+00:00",
   "document_id":"66df87ec2b1edfc0dc3b556f",
   "document_name":"AAPL_10Q.pdf",
   "end_date":"None",
   "in_folders":[
      "66e0f93093798ee1c937e39a"
   ],
   "run_id":"66e0fba3089fbd21c4dd80c3_3df1b127-9ea5-4714-9bf5-b1a5653859f6",
   "start_date":"None",
   "state":"None",
   "workflow_id":"dagtest"
}

Skip steps in a workflow

Configure a workflow to skip one or more of its steps (tasks).

python3 workflows/skip_tasks.py --workflow_name "dagtest" --tasks task_1 task_2

Parameters:

Parameter	Description	Required/Optional	Allowed values
workflow_name	Name of the workflow whose tasks are to be skipped	Required
tasks	A list of tasks to be skipped	Required	Strings separated by spaces (example: --tasks task_1 task_2)
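A space-separated list after a single flag is the pattern that argparse supports through nargs="+". The sketch below shows how such arguments would be parsed; whether skip_tasks.py is actually implemented this way is an assumption, and only the flag names come from the command above.

```python
import argparse

# Assumed parser definition; only the flag names are taken from this guide.
parser = argparse.ArgumentParser(description="Sketch of a skip-tasks argument parser")
parser.add_argument("--workflow_name", required=True)
# nargs="+" collects one or more space-separated values into a list.
parser.add_argument("--tasks", nargs="+", required=True)

args = parser.parse_args(["--workflow_name", "dagtest", "--tasks", "task_1", "task_2"])
print(args.workflow_name, args.tasks)
```

With this pattern each task name must be a separate argument, so task names containing spaces would need to be quoted in the shell.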

Get workflow runs for document

Fetch all workflow runs for a particular document.

python3 workflows/get_workflows_for_document.py --doc_id 66df87ec2b1edfc0dc3b556f --state "success" --query "ANNUAL REPORT" --skip 0 --limit 1

Parameters:

Parameter	Description	Required/Optional
doc_id	Document whose workflow runs are to be fetched	Required
state	State of the workflow run to filter by (for example, success)	Optional
query	String matched against the workflow and document names	Optional
skip	Number of workflow runs to skip	Optional
limit	Maximum number of workflow runs to return	Optional

Response:

{
   "docs":[
      {
         "created_at":"None",
         "document_id":"66fe1752927ce8c0ebda42b9",
         "document_name":"66fe1752927ce8c0ebda42b9",
         "end_date":"2024-10-03T20:37:32.405064+00:00",
         "in_folders":[
            
         ],
         "run_id":"66fe1752927ce8c0ebda42b9_078c1e26-a82b-4ada-ac5e-357eac3eb2b3",
         "start_date":"2024-10-03T20:37:26.234912+00:00",
         "state":"failed",
         "workflow_id":"process_form_workflow"
      },
      .
      .
      .
      .
      {
         "created_at":"None",
         "document_id":"66fe1752927ce8c0ebda42b9",
         "document_name":"MCS-CS-Handbook-2022-2023Publish.pdf",
         "end_date":"2024-10-03T04:26:21.715502+00:00",
         "in_folders":[
            
         ],
         "run_id":"66fe1752927ce8c0ebda42b9_2451533e-bbec-4e1a-acd5-0ab686f6d430",
         "start_date":"2024-10-03T04:14:47.246008+00:00",
         "state":"failed",
         "workflow_id":"process_form_workflow"
      }
   ],
   "total":7
}
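The skip and limit parameters, together with the total field in the response, allow the runs to be fetched page by page. The sketch below computes the skip value for each page and builds the matching command lines. It assumes that total counts all matching runs regardless of skip and limit, which this guide does not state; page_offsets and build_page_commands are hypothetical helpers.

```python
def page_offsets(total: int, limit: int) -> list:
    """Return the skip value for each page needed to cover `total` items."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return list(range(0, total, limit))


def build_page_commands(doc_id: str, total: int, limit: int) -> list:
    """Build one argument list per page for get_workflows_for_document.py."""
    return [
        [
            "python3", "workflows/get_workflows_for_document.py",
            "--doc_id", doc_id,
            "--skip", str(skip),
            "--limit", str(limit),
        ]
        for skip in page_offsets(total, limit)
    ]


# With total = 7 (as in the sample response) and a page size of 3.
commands = build_page_commands("66fe1752927ce8c0ebda42b9", total=7, limit=3)
for command in commands:
    print(" ".join(command))
```

In practice the first call would be made with skip 0 to learn the total, and the remaining pages would follow from it.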