Processing Flow

End-to-end flow of a content import, from file upload to completion.

1. File Upload

The browser uploads content files directly to S3 using presigned PUT URLs.

  1. User selects files (PDF, EPUB, MP3, JPG, JPEG, PNG) in the drag-and-drop zone.
  2. For each file, Livewire generates a presigned PUT URL via ImportS3Service and creates an ImportFile record with status uploading.
  3. The browser uploads files in parallel (concurrency controlled by import.upload_concurrency, default 5).
  4. On completion, the browser calls fileUploadCompleted() which sets the ImportFile status to completed.

The Import record is created in draft status when the first presigned URL is requested.

Limits: Content files max 500 MB each. Allowed extensions: .pdf, .epub, .mp3, .jpg, .jpeg, .png.
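
The limits above can be expressed as a simple pre-upload check. This is an illustrative Python sketch, not the actual PHP/Livewire implementation; the function name and error strings are hypothetical.

```python
# Illustrative sketch of the documented upload limits.
# The real checks live in the PHP/Livewire layer; names here are hypothetical.
ALLOWED_EXTENSIONS = {".pdf", ".epub", ".mp3", ".jpg", ".jpeg", ".png"}
MAX_CONTENT_FILE_BYTES = 500 * 1024 * 1024  # 500 MB per content file

def validate_content_file(filename: str, size_bytes: int) -> list:
    """Return validation errors for an upload candidate (empty list = OK)."""
    errors = []
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        errors.append("extension %r not allowed" % (ext or "(none)"))
    if size_bytes > MAX_CONTENT_FILE_BYTES:
        errors.append("file exceeds the 500 MB limit")
    return errors
```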

2. Metadata Upload

The metadata upload has two entry paths:

Drag & drop (auto-detection):

  1. User drags a spreadsheet file (XLSX, XLS, or CSV, max 10 MB) onto the drop zone.
  2. SpreadsheetParser streams the file using Spatie SimpleExcel (memory stays under ~3 MB regardless of file size).
  3. Headers are normalized to lowercase and compared against known formats using a 60% header match threshold.
  4. If a format is detected, headers are validated against it and a 5-row preview is shown.
  5. If no format reaches the threshold, a modal prompts the user to select the format manually. Headers are then re-validated against the selected format.
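
The 60% threshold logic can be sketched as a scoring pass over known header sets. The header lists below are hypothetical examples, not the actual column definitions, and the exact scoring formula in SpreadsheetParser is assumed.

```python
def detect_format(headers, known_formats, threshold=0.6):
    """Pick the known format whose expected headers best match the
    normalized spreadsheet headers; return None below the threshold."""
    normalized = {h.strip().lower() for h in headers}
    best_name, best_score = None, 0.0
    for name, expected in known_formats.items():
        expected = {e.lower() for e in expected}
        score = len(normalized & expected) / len(expected)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

When no format scores at or above the threshold, the caller falls back to the manual-selection modal.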

Browse file (manual selection):

  1. User clicks "Browse file" → a format selection modal appears before the file picker opens.
  2. User selects the format (Publica.la or VitalSource).
  3. The native file picker opens, user selects the spreadsheet file.
  4. The file is parsed with the pre-selected format — auto-detection is skipped.
  5. Headers are validated against the chosen format and a 5-row preview is shown.

3. Validation

ImportValidator checks each parsed row:

| Check | Rule |
| --- | --- |
| Name | Required — checks name (Publica.la) or title (VitalSource) |
| File type | Must be one of: pdf, epub, audio, physical (after normalization via FileTypeNormalizer) |
| Duplicate | If the row has an external_id (ISBN or VBK ID) that already exists in import_products for the same content_intake_id and team, it is flagged as already imported |
| File URL | If the value is an external URL (passes FILTER_VALIDATE_URL), it is accepted as-is. Otherwise, the basename is matched case-insensitively against uploaded ImportFile.original_name. Physical products skip file matching entirely |

Results are split into valid and errors arrays. If there are errors, the first 50 are shown and the user can choose "Continue with valid rows" to proceed without the invalid rows.
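
A minimal sketch of the valid/errors split, covering only the name and file-type checks (the duplicate and file URL rules need database and S3 state). The function and field names are illustrative, not ImportValidator's actual API.

```python
def validate_rows(rows, name_field="name"):
    """Split parsed rows into valid rows and per-row error entries.
    Only the name and file-type checks are sketched here."""
    VALID_TYPES = {"pdf", "epub", "audio", "physical"}
    valid, errors = [], []
    for index, row in enumerate(rows, start=1):
        row_errors = []
        if not row.get(name_field):
            row_errors.append("%s is required" % name_field)
        if row.get("file_type") not in VALID_TYPES:
            row_errors.append("invalid file type")
        if row_errors:
            errors.append({"row": index, "errors": row_errors})
        else:
            valid.append(row)
    return valid, errors
```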

File Matching Logic

The system uses three strategies to resolve the file url column:

  1. External URL — Value passes FILTER_VALIDATE_URL → used directly as the file URL in the API call.
  2. Local file match — The basename is extracted via PHP's basename() (everything after the last /), then compared case-insensitively (strcasecmp) against ImportFile.original_name. This means the value can contain directory paths — only the filename portion is used for matching. A presigned GET URL is generated at processing time.
  3. Physical product — File type is physical → file matching is skipped entirely.

Examples:

| Spreadsheet Value | Strategy | Result |
| --- | --- | --- |
| https://cdn.example.com/book.pdf | External URL | Passed through to API |
| my-book.pdf | Local match | Matched to uploaded my-book.pdf in S3 |
| My-Book.PDF | Local match | Matched (case-insensitive) |
| uploads/tenant/catalog/978-3-16-148410.epub | Local match | Basename 978-3-16-148410.epub matched |
| (empty, file type is physical) | Physical skip | No file URL needed |
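
The three strategies can be sketched as an ordered resolver. Python's urlparse only approximates PHP's FILTER_VALIDATE_URL (shown here for the common http(s) case), and the function name and return shape are illustrative assumptions.

```python
import posixpath
from urllib.parse import urlparse

def resolve_file_source(value, file_type, uploaded_names):
    """Apply the three documented strategies: physical skip, external URL
    passthrough, then case-insensitive basename match."""
    if file_type == "physical":
        return ("skip", None)                 # no file URL needed
    parsed = urlparse(value or "")
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("external", value)            # passed through to the API
    base = posixpath.basename(value or "")    # everything after the last /
    for name in uploaded_names:
        if name.lower() == base.lower():      # like strcasecmp
            return ("local", name)            # presigned GET URL generated later
    return ("unmatched", None)
```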

4. Record Creation

  1. Valid rows are bulk-inserted into import_records in chunks of 500, each with status discovered.
  2. The Import status changes to on-queue.
  3. A ProcessImportBatch job is dispatched to the import-processing queue.
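
The 500-row chunking in step 1 amounts to slicing the valid rows before each bulk insert; a minimal sketch (helper name assumed):

```python
def chunked(rows, size=500):
    """Yield successive slices of at most `size` rows, matching the
    500-row bulk inserts into import_records."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```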

5. Batch Processing

ProcessImportBatch processes records in configurable batches (default 50 records per job).

Job Configuration

  • Queue: import-processing
  • Tries: 3
  • Timeout: 120 seconds
  • Middleware: WithoutOverlapping (lock key = import ID, expires after 5 min, non-blocking)

Processing Steps

  1. Fetch next batch of discovered records (limit: batch_size).
  2. Mark batch as on-queue with sent_at = now().
  3. Build ImportContentData DTOs for each record, resolving file URLs (external passthrough or presigned GET URL from S3).
  4. POST to {tenant_domain}/api/v3/content/bulk with { items: [...] }.
  5. Handle the response (see below).
  6. If more discovered records remain, dispatch a new ProcessImportBatch (self-chaining).
  7. When no records remain, evaluate final import status.
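
The steps above form a self-chaining loop. This control-flow sketch uses five hypothetical callables standing in for the Laravel job's collaborators; it shows the shape of the loop, not the actual job code.

```python
def process_import_batch(fetch_batch, send_batch, handle_response,
                         dispatch_next, evaluate_final_status):
    """One run of the self-chaining batch job (control flow only)."""
    batch = fetch_batch()                 # next 'discovered' records
    if not batch:
        evaluate_final_status()           # no work left: settle import status
        return
    response = send_batch(batch)          # POST to the bulk content endpoint
    handle_response(batch, response)      # per-record results / retry logic
    dispatch_next()                       # chain another job for the remainder
```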

Response Handling

| Response | Action |
| --- | --- |
| 429 | Reset batch to discovered, re-dispatch with 30s delay |
| 5xx | Cache-based retry tracking (max retries, default 3). If retries remain: reset and re-dispatch with 30s delay. If exhausted: mark batch as failed with server-error |
| 4xx | Mark batch as failed with client-error |
| 200 | Process per-record results: success=true → done + ingested_at + create ImportProduct; success=false → failed + client-error + error message |
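
The table above maps cleanly to a status dispatcher; a sketch with illustrative action names (the real job tracks retries in the cache, which is abstracted here as a counter):

```python
def plan_for_response(status, retries_used, max_retries=3):
    """Map an HTTP status to the documented retry/fail action.
    Returns an (action, detail) pair; names are illustrative."""
    if status == 200:
        return ("process_results", None)
    if status == 429:
        return ("reset_and_redispatch", "30s delay")
    if 500 <= status < 600:
        if retries_used < max_retries:
            return ("reset_and_redispatch", "30s delay")
        return ("fail_batch", "server-error")
    if 400 <= status < 500:
        return ("fail_batch", "client-error")
    return ("fail_batch", "unexpected-status")
```

Note that 429 never consumes retry budget, while 5xx responses do.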

Final Status Evaluation

When no discovered or on-queue records remain:

  • All failed → Import status = failed
  • Any succeeded → Import status = done
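
In other words, a single successful record is enough to finish as done; a sketch (function name assumed):

```python
def final_import_status(done_count, failed_count):
    """Any successful record makes the import 'done'; only an import
    with zero successes ends as 'failed'."""
    return "done" if done_count > 0 else "failed"
```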

Failed Job Recovery

If the job itself fails (exception), on-queue records are reset to discovered so they can be retried. The RescueStaleImports command runs every 3 minutes to re-dispatch jobs for imports stuck in on-queue for more than 5 minutes.

6. Progress Tracking

  • The UI polls every 3 seconds (wire:poll.3s="getProgress") during processing.
  • Progress shows processed count / total count with a percentage bar.
  • When 100% is reached, the UI pauses briefly before transitioning to the completed view.
  • The completed view shows final stats: total done, total failed.
  • Import history (ImportHistory component) polls every 5 seconds but only while there are in-progress imports.
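
The percentage shown by the bar is a simple processed/total ratio; a sketch with a zero-total guard (the real rounding behavior is assumed):

```python
def progress_percent(processed, total):
    """Percentage for the progress bar; the zero-total guard covers
    an import polled before any records exist."""
    if total == 0:
        return 0
    return int(processed / total * 100)
```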

Error Handling Summary

| Error Source | Behavior | User Visibility |
| --- | --- | --- |
| Validation error | Row excluded from import | Shown before import starts |
| 429 rate limit | Transparent retry with 30s delay | None (progress stalls) |
| 5xx server error | Up to 3 retries, then server-error | Record marked failed |
| 4xx client error | Immediate client-error on batch | Record marked failed |
| Per-record failure | Individual record marked failed with API message | Visible in report/history |
| Job crash | Records reset to discovered, rescued by cron | Temporary stall |