DocumentOperations
The DocumentOperations class provides various operations for uploading, retrieving, analyzing, and managing documents within the Weav.ai platform. This class handles authentication, validation, and error responses for ease of use.
Prerequisite: To get started, ensure your Python environment has the Weav.ai developer library correctly installed (see the Installation Guide).
Initialization
To initialize the DocumentOperations class, you need a Config object containing the authentication details.
from weavaidev import Config
from weavaidev.documents import DocumentOperations
config = Config(auth_token="eyJhbGci....", env="")
doc_ops = DocumentOperations(config=config)
- Parameters:
config: A Config object containing the base configuration and authentication token.
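Hard-coding the token is convenient for a quick test, but a token is usually better kept out of source files. The following is a minimal sketch, assuming the token and environment are stored in environment variables named WEAVAI_AUTH_TOKEN and WEAVAI_ENV. Both names are hypothetical and are not defined by the library. The helper only builds the keyword arguments that the Config constructor shown above accepts.

```python
import os


def config_kwargs_from_env(environ=None):
    """Build keyword arguments for Config(auth_token=..., env=...).

    The variable names WEAVAI_AUTH_TOKEN and WEAVAI_ENV are hypothetical;
    use whatever names your deployment defines.
    """
    environ = os.environ if environ is None else environ
    token = environ.get("WEAVAI_AUTH_TOKEN", "").strip()
    if not token:
        raise ValueError("WEAVAI_AUTH_TOKEN is not set or is empty")
    # env falls back to an empty string, matching the initialization snippet.
    return {"auth_token": token, "env": environ.get("WEAVAI_ENV", "")}
```

With the library installed, the result could be passed on as Config(**config_kwargs_from_env()) and then DocumentOperations(config=config).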
Method Summary
- create_document: Uploads a document and creates a new record.
- get_page: Fetches details of a specific page in a document.
- get_page_text_and_words: Retrieves text and word-level data for a page.
- get_page_level_status: Retrieves the page-level status of a document.
- get_document_summary_status: Retrieves summary and redacted summary status of a document.
- get_document: Fetches full details of a document.
- get_document_hierarchy: Retrieves hierarchical structure of a document.
- download_form_instance: Downloads a form instance in JSON or CSV format.
- get_document_categories: Fetches available document categories.
- get_document_tags: Retrieves all available document tags.
- trigger_document_summary: Initiates generation of a document summary.
Method Details
1. create_document
Uploads a document to create a new record.
- Parameters:
file_path (str): Path to the document file.
folder_id (Optional[str]): Folder ID for the document. Defaults to an empty string.
- Returns:
CreateDocumentResponse: Contains details about the created document.
- Raises:
FileNotFoundError if the file does not exist.
DocumentProcessingException for any error during document creation.
- Usage:
response = doc_ops.create_document(file_path="/path/to/document.pdf", folder_id="folder123")
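Because create_document raises FileNotFoundError for a missing file, a caller may prefer to check the path locally before making the call. A minimal sketch, assuming doc_ops is a DocumentOperations instance; the helper name upload_document is illustrative and not part of the library:

```python
from pathlib import Path


def upload_document(doc_ops, file_path, folder_id=""):
    """Check the path locally, then call create_document.

    doc_ops is expected to be a DocumentOperations instance.
    """
    path = Path(file_path).expanduser()
    if not path.is_file():
        # Fail early with the same exception type the method documents.
        raise FileNotFoundError(f"No such file: {path}")
    return doc_ops.create_document(file_path=str(path), folder_id=folder_id)
```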
2. get_page
Fetches the status and details of a specific page within a document.
- Parameters:
document_id (str): ID of the document.
page_number (int): Page number.
bounding_boxes (Optional[bool]): Flag to include bounding boxes for text. Default is False.
- Returns:
GetPageStatusResponse: Details of the specified page.
- Raises:
DocumentProcessingException for authentication, validation, or page retrieval errors.
- Usage:
response = doc_ops.get_page(document_id="doc123", page_number=1, bounding_boxes=True)
3. get_page_text_and_words
Retrieves text and word-level details for a page.
- Parameters:
document_id (str): ID of the document.
page_number (int): Page number.
- Returns:
GetPageTextResponse: Contains text, words, and extracted entities for the page.
- Raises:
DocumentProcessingException for authentication, validation, or retrieval errors.
- Usage:
response = doc_ops.get_page_text_and_words(document_id="doc123", page_number=1)
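The method works on one page at a time, so reading several pages means calling it in a loop. A minimal sketch, assuming doc_ops is a DocumentOperations instance; the helper name collect_page_text is illustrative. Responses are stored unchanged because the attribute names of GetPageTextResponse are not listed in this section.

```python
def collect_page_text(doc_ops, document_id, page_numbers):
    """Return {page_number: response} for each requested page."""
    responses = {}
    for page_number in page_numbers:
        responses[page_number] = doc_ops.get_page_text_and_words(
            document_id=document_id, page_number=page_number
        )
    return responses
```

For example, collect_page_text(doc_ops, "doc123", range(1, 4)) would request pages 1 to 3, assuming page numbers start at 1 as the usage line above suggests.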
4. get_page_level_status
Retrieves page-level processing status for a document.
- Parameters:
document_id (str): ID of the document.
- Returns:
PageLevelStatusResponse: Contains OCR, classification, and entity extraction statuses.
- Raises:
DocumentProcessingException for authentication or retrieval errors.
- Usage:
response = doc_ops.get_page_level_status(document_id="doc123")
5. get_document_summary_status
Retrieves summary and redacted summary status for a document.
- Parameters:
document_id (str): ID of the document.
- Returns:
DocumentSummaryResponse: Summary and redacted summary status.
- Raises:
DocumentProcessingException for authentication or retrieval errors.
- Usage:
response = doc_ops.get_document_summary_status(document_id="doc123")
6. get_document
Fetches full details of a document.
- Parameters:
document_id (str): ID of the document.
fill_pages (Optional[bool]): Flag to include detailed page data. Default is False.
- Returns:
CreateDocumentResponse: Document details, including pages and metadata.
- Raises:
DocumentProcessingException for authentication or retrieval errors.
- Usage:
response = doc_ops.get_document(document_id="doc123", fill_pages=True)
7. get_document_hierarchy
Fetches the hierarchical structure of a document.
- Parameters:
document_id (str): ID of the document.
- Returns:
DocumentHierarchyResponse: Contains the structure of the document.
- Raises:
DocumentProcessingException for authentication or retrieval errors.
- Usage:
response = doc_ops.get_document_hierarchy(document_id="doc123")
8. download_form_instance
Downloads a form instance from a document in the specified format.
- Parameters:
document_id (str): ID of the document.
download_format (Literal["JSON", "CSV"]): Format for download. Default is JSON.
- Returns:
Dict[str, Any] if JSON, or pd.DataFrame if CSV.
- Raises:
DocumentProcessingException for authentication or retrieval errors.
- Usage:
response = doc_ops.download_form_instance(document_id="doc123", download_format="CSV")
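Since the return type depends on download_format, code that stores the result has to handle both shapes. A minimal sketch of one way to do that; the helper name save_form_instance is illustrative. It tests for a to_csv method rather than importing pandas, so it stays self-contained, and it assumes the JSON result holds only JSON-serializable values.

```python
import json


def save_form_instance(result, path):
    """Write a download_form_instance result to disk.

    A dict (JSON download) is written as JSON; anything exposing to_csv,
    such as a pandas DataFrame (CSV download), is written as CSV.
    """
    if isinstance(result, dict):
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(result, handle, indent=2)
    elif hasattr(result, "to_csv"):
        # index=False omits the DataFrame's row index column.
        result.to_csv(path, index=False)
    else:
        raise TypeError(f"Unexpected result type: {type(result).__name__}")
```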
9. get_document_categories
Fetches all available document categories.
- Parameters:
None.
- Returns:
DocumentCategoriesResponse: List of document categories.
- Raises:
DocumentProcessingException for authentication or retrieval errors.
- Usage:
response = doc_ops.get_document_categories()
10. get_document_tags
Retrieves all available tags for documents.
- Parameters:
None.
- Returns:
DocumentTagResponse: List of tags for documents.
- Raises:
DocumentProcessingException for authentication or retrieval errors.
- Usage:
response = doc_ops.get_document_tags()
11. trigger_document_summary
Triggers the generation of a summary for a document.
- Parameters:
document_id (str): ID of the document.
- Returns:
DocumentSummaryResponse: Summary and redacted summary of the document.
- Raises:
DocumentProcessingException for authentication or retrieval errors.
- Usage:
response = doc_ops.trigger_document_summary(document_id="doc123")
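If summary generation runs asynchronously (an assumption; this section does not say either way), a caller might trigger it and then poll get_document_summary_status. A minimal sketch, assuming doc_ops is a DocumentOperations instance. The helper name wait_for_summary and the is_ready predicate are illustrative; the predicate is left to the caller because the fields of DocumentSummaryResponse are not described here.

```python
import time


def wait_for_summary(doc_ops, document_id, is_ready, interval=5.0, timeout=300.0):
    """Trigger a summary, then poll its status until is_ready(response) is true.

    Raises TimeoutError if the predicate is still false once timeout
    seconds have passed.
    """
    doc_ops.trigger_document_summary(document_id=document_id)
    deadline = time.monotonic() + timeout
    while True:
        status = doc_ops.get_document_summary_status(document_id=document_id)
        if is_ready(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Summary for {document_id} not ready after {timeout}s"
            )
        time.sleep(interval)
```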
Exception Handling
All methods in DocumentOperations may raise a DocumentProcessingException if:
- Authentication fails (status code 401).
- Validation issues occur (status code 422).
- The requested document or resource is not found (status code 404).
- Any general API error arises during the request.
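The conditions above can be handled in one place with a small wrapper. A minimal sketch; the helper name call_or_none is illustrative. The import location of DocumentProcessingException is not shown in this section, so the wrapper takes the exception classes to handle as an argument instead of importing one.

```python
import logging

logger = logging.getLogger(__name__)


def call_or_none(method, *args, handled=(Exception,), **kwargs):
    """Call an SDK method; log and return None if a handled exception is raised.

    Pass handled=(DocumentProcessingException,) once the class is imported.
    Exceptions outside the handled tuple propagate unchanged.
    """
    try:
        return method(*args, **kwargs)
    except handled as exc:
        logger.error("%s failed: %s", getattr(method, "__name__", method), exc)
        return None
```

A call might then look like call_or_none(doc_ops.get_document, document_id="doc123", handled=(DocumentProcessingException,)).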