Data Ingestion Flow

Antei ingests structured data from integrations and CSV uploads to power compliance workflows such as reconciliation, return filing, tax calculation, and risk analysis. Every ingestion event is scoped to a single organization, logged, and transformed into a normalized internal format.


Sources of Data

We ingest data from the following sources:

  • Third-party Integrations
    Platforms such as Stripe, Razorpay, Chargebee, QuickBooks, Xero, BambooHR, Gmail, Outlook, and Slack
  • CSV Uploads
    Users can manually upload structured data for entities including:
    • Transactions
    • Invoices
    • Products
    • Contacts (Customers, Vendors, Authorities)

Supported Ingestion Modes

Method             | Description                        | Use Cases
OAuth Integrations | Secure pull via scoped tokens      | Continuous sync with billing and HRIS systems
CSV Uploads        | File-based manual ingestion via UI | One-time data loads or offline correction flows
Webhooks           | Real-time, event-driven ingestion  | Stripe payments, refunds, Slack messages

Ingestion Pipeline

1. Authorization or File Upload

  • OAuth 2.0 authorization is completed for integrated platforms
  • CSV files are uploaded securely through the Antei UI
  • Each request is scoped to a specific organization

2. Sync or Upload Trigger

  • Triggers can be webhook-based, time-based (cron), or manual
  • CSV uploads are explicitly initiated by users
  • Metadata such as source, sync_time, and trigger_type is recorded
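
As an illustration of the metadata recorded at trigger time, here is a minimal sketch. The field names source, sync_time, and trigger_type come from the list above; the SyncEvent class, the record_sync function, the org_id field, and the allowed trigger values are assumptions made for the example, not Antei's actual schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Assumed trigger values, based on the three trigger kinds named above.
TRIGGER_TYPES = {"webhook", "cron", "manual"}


@dataclass(frozen=True)
class SyncEvent:
    """Hypothetical metadata record for one sync or upload trigger."""
    org_id: str
    source: str
    trigger_type: str
    sync_time: str  # ISO 8601 timestamp in UTC


def record_sync(org_id, source, trigger_type, now=None):
    """Build the metadata record, rejecting unknown trigger types."""
    if trigger_type not in TRIGGER_TYPES:
        raise ValueError(f"unknown trigger_type: {trigger_type!r}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return SyncEvent(org_id, source, trigger_type, timestamp)


event = record_sync("org_123", "stripe", "webhook")
print(asdict(event))
```

Keeping the record immutable (frozen=True) is one way to make sure trigger metadata is not altered after it is logged.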

3. Parsing & Normalization

  • Incoming data is parsed and transformed into Antei’s internal registry structure
  • Supported entities include transaction, invoice, contact, product, and transaction_op
  • Payloads are matched against the internal registry using a config-driven mapping layer
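
A config-driven mapping layer can be pictured roughly as follows. This is a sketch under stated assumptions: the FIELD_MAPS table, the normalize function, and the target field names (external_id, amount_minor) are invented for illustration and do not describe Antei's real registry schema.

```python
# Hypothetical mapping config: source name -> target entity and field renames.
FIELD_MAPS = {
    "stripe": {
        "entity": "transaction",
        "fields": {
            "id": "external_id",
            "amount": "amount_minor",
            "currency": "currency",
        },
    },
}


def normalize(source, payload):
    """Rename payload keys according to the config; unmapped keys are dropped."""
    config = FIELD_MAPS[source]
    record = {"entity": config["entity"], "source": source}
    for source_key, target_key in config["fields"].items():
        if source_key in payload:
            record[target_key] = payload[source_key]
    return record


print(normalize("stripe", {"id": "ch_1", "amount": 1000, "currency": "usd", "extra": True}))
```

With this shape, supporting a new source means adding a config entry rather than writing new parsing code.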

4. Breakdown & Componentization

  • Each structured record is broken down into its atomic components
  • Line items, taxes, and references are extracted and linked to core entities
  • Nested data is flattened and aligned with the registry schema
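
To make the breakdown step concrete, the sketch below splits a nested invoice into a header, line items, and taxes, each linked back to its parent by ID. The input shape (lines, taxes) and the componentize function are assumptions for the example only.

```python
def componentize(invoice):
    """Split a nested invoice into flat header, line-item, and tax records."""
    header = {key: value for key, value in invoice.items() if key != "lines"}
    line_items, taxes = [], []
    for index, line in enumerate(invoice.get("lines", [])):
        line_id = f"{invoice['id']}:{index}"  # synthetic ID linking children to the line
        line_items.append({
            "line_id": line_id,
            "invoice_id": invoice["id"],
            "description": line["description"],
            "amount": line["amount"],
        })
        for tax in line.get("taxes", []):
            taxes.append({"line_id": line_id, "invoice_id": invoice["id"], **tax})
    return header, line_items, taxes


invoice = {
    "id": "inv_1",
    "currency": "EUR",
    "lines": [
        {"description": "Plan A", "amount": 100, "taxes": [{"name": "VAT", "rate": 0.2}]},
        {"description": "Setup", "amount": 50},
    ],
}
header, line_items, taxes = componentize(invoice)
print(header, line_items, taxes, sep="\n")
```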

5. Deduplication & Matching

  • For each entity, we perform weighted matching using a hybrid method:
    • Fuzzy scoring for names, emails, references
    • Structured scoring using contact type, jurisdiction, currency, etc.
  • Results are evaluated:
    • Score ≥ 90%: The entity is treated as a likely match to an existing record and is marked as unprocessed for manual review and override
    • Score < 90%: The entity is treated as new, and an object is created with generated IDs
  • This applies both at component level (e.g., product, contact) and at transaction level
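
The hybrid scoring described above can be sketched as a weighted sum of fuzzy and exact-match similarities. Only the 90% threshold and the kinds of fields come from the text; the weights, the use of difflib.SequenceMatcher for fuzzy scoring, and the function names are assumptions for the example, not the production algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical weights; they sum to 1 so the score lands on a 0-100 scale.
WEIGHTS = {"name": 0.4, "email": 0.3, "contact_type": 0.1, "jurisdiction": 0.1, "currency": 0.1}
FUZZY_FIELDS = {"name", "email"}
THRESHOLD = 90.0


def fuzzy(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(candidate, existing):
    """Weighted score: fuzzy for free-text fields, exact match for structured ones."""
    total = 0.0
    for field, weight in WEIGHTS.items():
        a, b = candidate.get(field, ""), existing.get(field, "")
        if field in FUZZY_FIELDS:
            similarity = fuzzy(a, b)
        else:
            similarity = 1.0 if a and a == b else 0.0
        total += weight * similarity
    return round(total * 100, 2)


def resolve(candidate, registry):
    """Return 'unprocessed' for a probable duplicate, otherwise 'create'."""
    best = max((match_score(candidate, entry) for entry in registry), default=0.0)
    return "unprocessed" if best >= THRESHOLD else "create"


acme = {"name": "Acme Corp", "email": "billing@acme.test", "contact_type": "customer",
        "jurisdiction": "DE", "currency": "EUR"}
print(resolve(acme, [acme]))  # identical record scores 100
print(resolve(acme, []))      # empty registry, nothing to match
```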

6. Temporary Storage

  • Parsed and mapped data is stored in secure staging tables
  • Metadata includes sync source, change status, timestamps, and classification
  • These records are held pending validation

7. Validation & Classification

  • Validations ensure schema compliance and reference linkage
  • Missing or mismatched data is flagged for user resolution
  • Auto-tagging applies classifications such as taxability and reverse charge
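
A rough sketch of the validation and tagging checks follows. The required-field list, the function names, and especially the reverse-charge rule are placeholders chosen for illustration; real tax classification depends on jurisdiction-specific rules that this example does not attempt to model.

```python
# Hypothetical required fields per entity type.
REQUIRED_FIELDS = {"invoice": ("id", "contact_id", "currency", "total")}


def validate(record, known_contact_ids):
    """Return a list of issues: missing fields and broken contact references."""
    issues = []
    for field in REQUIRED_FIELDS.get(record.get("entity"), ()):
        if record.get(field) in (None, ""):
            issues.append(f"missing field: {field}")
    contact_id = record.get("contact_id")
    if contact_id and contact_id not in known_contact_ids:
        issues.append(f"unknown contact: {contact_id}")
    return issues


def classify(record):
    """Toy auto-tagging rule (placeholder, not real tax logic)."""
    tags = []
    seller, buyer = record.get("seller_country"), record.get("buyer_country")
    if seller and buyer and seller != buyer and record.get("buyer_is_business"):
        tags.append("reverse_charge")
    return tags


record = {"entity": "invoice", "id": "inv_1", "contact_id": "c_9", "currency": "EUR"}
print(validate(record, known_contact_ids={"c_1"}))
```

Records with a non-empty issue list would stay in staging and be surfaced to the user, in line with the flagging behavior described above.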

8. Push to Org Database

  • Fully validated records are saved to the organization’s production tables
  • Available for reconciliation, filings, reporting, and audits
  • All actions are traceable via logs and metadata

Sync Frequencies

Frequency     | Trigger Type    | Examples
Real-time     | Webhooks        | Stripe charges, Slack alerts
Hourly        | Background Cron | Xero, Chargebee, QuickBooks
Manual Upload | User-initiated  | Invoices, legacy transactions
Daily         | Scheduled Pull  | Employee and product sync
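
For the time-based rows in the table, a scheduler needs to decide whether a source is due for a pull. The sketch below shows one simple way to express that; the INTERVALS table and the is_due function are assumptions for the example, and real-time and manual ingestion are treated as event-driven rather than scheduled.

```python
from datetime import datetime, timedelta, timezone

# Only the scheduled frequencies have an interval; webhooks and uploads are event-driven.
INTERVALS = {"hourly": timedelta(hours=1), "daily": timedelta(days=1)}


def is_due(frequency, last_sync, now):
    """True when a scheduled source has never synced or its interval has elapsed."""
    interval = INTERVALS.get(frequency)
    if interval is None:
        return False
    return last_sync is None or now - last_sync >= interval


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_due("hourly", now - timedelta(minutes=61), now))
print(is_due("daily", now - timedelta(hours=3), now))
```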

Observability & Logs

  • Every sync or upload is logged with timestamp, method, and source
  • All records in staging and unprocessed buckets include error context
  • Retry queues automatically process temporary failures
  • Logs are viewable in Org Settings → Sync Logs
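
One common way to handle temporary failures is retrying with exponential backoff. The sketch below shows that pattern; the page does not state which retry strategy Antei uses, so the backoff schedule, the TransientError class, and the run_with_retries function are all assumptions.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def run_with_retries(task, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call task(); on TransientError wait base_delay * 2**(attempt-1) and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the error for logging
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_sync():
    """Fails twice, then succeeds, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary upstream failure")
    return "synced"


print(run_with_retries(flaky_sync, sleep=lambda seconds: None))
```

Passing the sleep function as a parameter keeps the example fast to run and easy to test.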

Access Control

  • All ingestion is scoped per organization
  • API integrations use short-lived tokens with limited scope
  • CSV files are processed in memory and discarded post-validation
  • Field-level data sensitivity is respected and retained in metadata
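
As an illustration of in-memory CSV handling scoped to an organization, the sketch below parses uploaded bytes without writing the file to disk. The required column names and the parse_csv_upload function are assumptions for the example, not Antei's import template.

```python
import csv
import io

# Hypothetical required columns for an invoice import.
REQUIRED_COLUMNS = {"invoice_number", "amount", "currency"}


def parse_csv_upload(raw, org_id):
    """Parse uploaded CSV bytes in memory and tag every row with the org."""
    text = raw.decode("utf-8-sig")  # tolerate a byte-order mark from spreadsheet exports
    reader = csv.DictReader(io.StringIO(text, newline=""))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [{"org_id": org_id, **row} for row in reader]


upload = b"invoice_number,amount,currency\nINV-1,100.00,EUR\n"
print(parse_csv_upload(upload, org_id="org_123"))
```

Because the raw bytes only live in the function's local scope, nothing from the file persists once the parsed rows have been handed on.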

Need Help?

For questions about ingestion architecture, field mapping, or preparing import files, contact support@antei.com.