Preprocess Files for AI Agents

Docling provides a unified platform for document processing. It parses heterogeneous formats robustly, including advanced PDF understanding with layout and semantic analysis, and integrates with generative AI workflows and APIs.

Use the following command to install the Docling package:

pip install docling

Sample Python Code:

# Sample code demonstrating the DocumentConverter class
from docling.document_converter import DocumentConverter

converter = DocumentConverter()

# The source can be a local file path or a URL
source = "General/Advertisement-FAQ.pdf"

print("Converting document...")
result = converter.convert(source)
print("Conversion complete!!!")

# Print the document in Markdown format
print(result.document.export_to_markdown())
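Printing is handy for a quick look, but an agent usually needs the Markdown saved to a file. Here is a minimal sketch that writes the export to an output folder, naming the file after the source document. save_markdown and its out_dir parameter are hypothetical names; only export_to_markdown() comes from the sample above.

```python
from pathlib import Path


def save_markdown(markdown: str, source: str, out_dir: str = ".") -> Path:
    """Write markdown to <out_dir>/<source file stem>.md and return the path."""
    target = Path(out_dir) / (Path(source).stem + ".md")
    # Create the output folder if it does not exist yet
    target.parent.mkdir(parents=True, exist_ok=True)
    # Explicit UTF-8 so non-ASCII characters survive on every platform
    target.write_text(markdown, encoding="utf-8")
    return target
```

For example, save_markdown(result.document.export_to_markdown(), source) would write Advertisement-FAQ.md to the current directory. The sketch assumes a local file path; a URL source may need a different naming rule.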

Reference:
https://github.com/docling-project/docling

When I ran the sample code, I got the following exception:

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate

I ran the following command to resolve the certificate issue:

/Applications/Python\ 3.13/Install\ Certificates.command

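Note:
3.13 is my Python version; replace it with yours. The Install Certificates.command script ships with the python.org installer for macOS.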
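If the error persists, it can help to see where Python's ssl module looks for CA certificates. The sketch below uses only standard-library calls (ssl.get_default_verify_paths and SSLContext.cert_store_stats); ca_diagnostics is a hypothetical helper name. On an affected python.org macOS install you would likely see cafile reported as None and zero loaded CA certificates. Treat the count as a hint rather than a verdict, because certificates in a capath directory are only counted after they have been used.

```python
import ssl


def ca_diagnostics() -> dict:
    """Report where Python's ssl module looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,  # None if the default CA file does not exist
        "capath": paths.capath,  # None if the default CA directory does not exist
        # Certificates in a capath directory are loaded lazily, so they are
        # not counted here until they have been used at least once.
        "loaded_ca_certs": context.cert_store_stats()["x509_ca"],
    }


if __name__ == "__main__":
    for key, value in ca_diagnostics().items():
        print(f"{key}: {value}")
```

Running it before and after the Install Certificates command is one way to check whether the fix took effect.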

Instead of Markdown, I used the export_to_text() method to get plain text content and wrote it to a text file. I then fed that text file to my AI agents, and it worked well.

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
source = "General/TRIM11.pdf"

print("Converting document...")
result = converter.convert(source)
print("Conversion complete!!!")

# Export plain text instead of Markdown
text_content = result.document.export_to_text()

# Write with explicit UTF-8 so non-ASCII characters are saved correctly on every platform
with open("output.txt", "w", encoding="utf-8") as f:
    f.write(text_content)
print("Text file created successfully!")
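The same approach can be extended to a whole folder of PDFs. A minimal sketch, assuming every file matching a glob pattern should become one text file in an output folder; export_folder and its parameters are hypothetical names. The conversion is passed in as a function, so a single converter can be created once and reused, and the helper can be tried without Docling installed.

```python
from collections.abc import Callable
from pathlib import Path


def export_folder(
    in_dir: str,
    out_dir: str,
    export: Callable[[str], str],
    pattern: str = "*.pdf",
    suffix: str = ".txt",
) -> list[Path]:
    """Apply export to each file in in_dir matching pattern; save results in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    # sorted() keeps the processing order stable across runs and platforms
    for source in sorted(Path(in_dir).glob(pattern)):
        target = out / (source.stem + suffix)
        # Explicit UTF-8 so non-ASCII text is saved the same way everywhere
        target.write_text(export(str(source)), encoding="utf-8")
        written.append(target)
    return written
```

With Docling, the export function could be lambda s: converter.convert(s).document.export_to_text(), using the converter and method from the example above. To save Markdown instead, swap in export_to_markdown() and pass suffix=".md".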
