Data Ingestion Flow
How Antei collects and structures data from third-party integrations and file uploads to power tax workflows.
Antei ingests structured data from integrations and CSV uploads to power tax compliance workflows including reconciliation, filings, taxability analysis, and risk monitoring. Every ingestion event is scoped per organization, logged, and transformed into normalized formats via our internal registry and mapping logic.
Sources of Data
We support data ingestion from the following sources:
- Third-party Integrations
  Examples include Stripe, Razorpay, Chargebee, QuickBooks, Xero, BambooHR, Gmail, Outlook, and Slack.
- CSV Uploads
  Users can upload structured data manually, including:
  - Transactions
  - Invoices
  - Products
  - Contacts (Customers, Vendors, Authorities)
Supported Ingestion Modes
Method | Description | Use Cases |
---|---|---|
OAuth Integrations | API-based ingestion via scoped tokens | Ongoing syncs from billing, HRIS, communication |
CSV Uploads | File-based UI ingestion | Bulk data uploads, one-time imports |
Webhooks | Real-time, event-based triggers | Stripe payments, refunds, Slack messages |
Ingestion Pipeline
1. Authorization or File Upload
- OAuth 2.0 Authorization for integrated platforms
- CSV Upload via secure UI workflow
- All inputs scoped by organization and source
2. Sync or Upload Trigger
- Triggered by webhooks, scheduled cron jobs, or manual syncs
- Metadata recorded: `source`, `sync_time`, `trigger_type`, `auth_token`
3. Parsing & Normalization
- API payloads and CSV rows parsed to match internal schema
- Supported entities: `transaction`, `invoice`, `contact`, `product`, `transaction_op`
- Mapping handled via integration-specific configs referencing the Antei registry
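
As a rough illustration of config-driven mapping, the sketch below normalizes a raw payload using a small mapping table. The config shape, the field names, and the helpers `get_path` and `normalize` are hypothetical and are not taken from the Antei registry.

```python
from typing import Any, Dict

# Hypothetical mapping config for one billing integration.
# Keys are internal schema fields; values are dotted paths into the raw payload.
EXAMPLE_INVOICE_MAPPING: Dict[str, Any] = {
    "entity": "invoice",
    "fields": {
        "external_id": "id",
        "amount": "amount_due",
        "currency": "currency",
        "customer_email": "customer.email",
    },
}


def get_path(payload: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    current: Any = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def normalize(payload: Dict[str, Any], mapping: Dict[str, Any]) -> Dict[str, Any]:
    """Build a record in the internal shape by reading each mapped source path."""
    record: Dict[str, Any] = {"entity": mapping["entity"]}
    for target_field, source_path in mapping["fields"].items():
        record[target_field] = get_path(payload, source_path)
    return record
```

Keeping the mapping as data rather than code means a new integration would mostly need a new config entry, which is one plausible reason for a registry-based design.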
4. Breakdown & Componentization
- Records decomposed into normalized atomic components
- Nested fields (e.g. line items, tax summaries) flattened
- Data enriched with inferred fields such as jurisdiction or currency
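
To make the flattening step concrete, here is a minimal sketch that turns an invoice with nested line items into one flat component per item. The input shape and the `item_` prefix are assumptions for illustration only.

```python
from typing import Any, Dict, List


def flatten_invoice(invoice: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one flat component per line item, carrying parent fields down."""
    components: List[Dict[str, Any]] = []
    for index, item in enumerate(invoice.get("line_items", [])):
        component: Dict[str, Any] = {
            "invoice_id": invoice["id"],
            "currency": invoice.get("currency"),
            "line_index": index,
        }
        # Prefix item fields so they cannot collide with parent-level fields.
        component.update({f"item_{key}": value for key, value in item.items()})
        components.append(component)
    return components
```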
5. Deduplication & Matching
- Antei applies weighted entity matching combining:
  - Fuzzy scoring for names, emails, addresses
  - Structured scoring for tax ID, currency, contact type, jurisdiction
- Match outcome:
  - Score ≥ 90%: Marked as `unprocessed` for user override
  - Score < 90%: Created as new entity
- This process runs at:
  - Component level (e.g., contact, product)
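
The sketch below shows one way a weighted match score with a 90% cutoff could be computed. The weights, the field names, and the use of `difflib.SequenceMatcher` for fuzzy scoring are assumptions; the source does not specify Antei's actual scoring method.

```python
from difflib import SequenceMatcher
from typing import Any, Dict

# Hypothetical weights; together they sum to 1.0.
FUZZY_WEIGHTS = {"name": 0.25, "email": 0.25, "address": 0.10}
EXACT_WEIGHTS = {"tax_id": 0.25, "currency": 0.05, "contact_type": 0.05, "jurisdiction": 0.05}


def similarity(a: Any, b: Any) -> float:
    """Case-insensitive fuzzy similarity in [0, 1]; missing values score 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, str(a).lower().strip(), str(b).lower().strip()).ratio()


def match_score(candidate: Dict[str, Any], existing: Dict[str, Any]) -> float:
    """Combine fuzzy and exact-match field scores into one weighted score."""
    score = 0.0
    for field, weight in FUZZY_WEIGHTS.items():
        score += weight * similarity(candidate.get(field), existing.get(field))
    for field, weight in EXACT_WEIGHTS.items():
        value = candidate.get(field)
        if value is not None and value == existing.get(field):
            score += weight
    return score


def match_outcome(candidate: Dict[str, Any], existing: Dict[str, Any], threshold: float = 0.90) -> str:
    """Likely duplicates are held as 'unprocessed' for user override; others become new entities."""
    return "unprocessed" if match_score(candidate, existing) >= threshold else "new"
```

With these example weights, a mismatched tax ID alone caps the score at 0.75, so two records with different tax IDs are never flagged as duplicates.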
6. Temporary Storage
- Parsed records stored in region-aware staging tables
- Includes sync metadata, classification tags, source, and extracted ID
- No data pushed to primary tables without validation
7. Validation & Classification
- Schema-level checks: required fields, correct formats, conditional rules
- Entity linkage checks: references must exist or be resolvable
- Classification tags (e.g., `financial`, `sensitive`, `jurisdictional`) applied automatically
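
A minimal sketch of these checks follows, assuming a flat invoice record. The required-field list, the rule that a positive tax amount requires a jurisdiction, the reason-code strings, and the tagging rules are all illustrative assumptions rather than Antei's actual rules.

```python
import re
from typing import Any, Dict, List, Set

CURRENCY_RE = re.compile(r"^[A-Z]{3}$")  # three uppercase letters, e.g. "EUR"
REQUIRED_FIELDS = ("external_id", "amount", "currency", "contact_id")


def validate_invoice(record: Dict[str, Any], known_contact_ids: Set[str]) -> List[str]:
    """Return a list of reason codes; an empty list means the record passed."""
    errors: List[str] = []
    # Schema-level: required fields must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            errors.append(f"missing:{field}")
    # Schema-level: format check.
    currency = record.get("currency")
    if currency and not CURRENCY_RE.match(currency):
        errors.append("format:currency")
    # Conditional rule (assumed): taxed records need a jurisdiction.
    if (record.get("tax_amount") or 0) > 0 and not record.get("jurisdiction"):
        errors.append("conditional:jurisdiction")
    # Entity linkage: the referenced contact must already exist.
    contact_id = record.get("contact_id")
    if contact_id and contact_id not in known_contact_ids:
        errors.append("linkage:contact_id")
    return errors


def classification_tags(record: Dict[str, Any]) -> Set[str]:
    """Derive tags from which fields are populated (assumed rules)."""
    tags: Set[str] = set()
    if record.get("amount") is not None:
        tags.add("financial")
    if record.get("tax_id") or record.get("email"):
        tags.add("sensitive")
    if record.get("jurisdiction"):
        tags.add("jurisdictional")
    return tags
```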
8. Revised Deduplication & Matching
- Antei applies weighted entity matching combining:
  - Fuzzy scoring for names, emails, addresses
  - Structured scoring for tax ID, currency, contact type, jurisdiction
- Match outcome:
  - Score ≥ 90%: Marked as `unprocessed` for user override
  - Score < 90%: Created as new entity
- This process runs at:
  - Transaction level (to dedupe invoices and refunds)
9. Push to Org Database
- Validated, deduplicated records inserted into organization-scoped production tables
- Records become available for reconciliation, tax logic, invoicing, and filing
- Logs persist linkage to original ingestion source and timestamp
Sync Frequencies
Frequency | Trigger Type | Examples |
---|---|---|
Real-time | Webhooks | Stripe charges, refunds |
Hourly | Cron jobs | QuickBooks, Xero, Chargebee |
Daily | Scheduled pull | BambooHR, email inboxes |
Manual Upload | User-triggered | Offline data, corrections, imports |
Observability & Logs
- Each ingestion event is logged with metadata and trigger source
- Staging and unprocessed records include reason codes
- Sync logs accessible under Org Settings → Sync Logs
- Retry queue auto-handles transient failures and slow APIs
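
How the retry queue schedules attempts is not described here. One common approach is exponential backoff, sketched below; the `TransientError` class, the attempt limit, and the delays are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def run_with_retry(
    task: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying on TransientError with delays of base_delay * 2**n."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real waiting.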
Access Control
- All ingestion is scoped by organization and environment
- OAuth tokens and upload sessions are validated per request
- CSV files are processed in-memory and discarded post-validation
- Region-aware routing ensures PII is stored in appropriate jurisdictions:
  - EU data → EU
  - India data → IND
  - US/Rest of World → US
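
The routing rule above can be sketched as a simple lookup. This assumes the region is derived from an ISO 3166-1 alpha-2 country code on the record, which the source does not confirm; the function name is hypothetical.

```python
# The 27 EU member states as ISO 3166-1 alpha-2 codes.
EU_COUNTRIES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR", "HU", "IE",
    "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
})


def storage_region(country_code: str) -> str:
    """Map a country code to a storage region: EU, IND, or US as the default."""
    code = country_code.strip().upper()
    if code in EU_COUNTRIES:
        return "EU"
    if code == "IN":
        return "IND"
    return "US"  # US and rest of world
```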
Need Help?
For help with ingestion structure, mapping schemas, or staging errors:
support@antei.com