Medusa

Medusa manages content intake automations for multiple tenants. It runs as a serverless application on AWS Lambda, deployed with Laravel Vapor.

Overview

Medusa serves as the central content intake automation engine for the Publica.la platform. The system has two main responsibilities, each with its own way of working:

1. Publishing Automations

A simple setup system for customers who want to automate content publishing by placing PDF files on FTP servers or in S3 buckets. This system relies entirely on Farfalla's public APIs, demonstrating Publica.la's commitment to dogfooding its own public API infrastructure. Comprehensive logging ensures full traceability of all publishing operations. Features include:

  • Scheduled Content Monitoring: Cron-based scheduling to monitor buckets for new content
  • Content Preprocessing: Automated PDF preprocessing, including merging when multiple files need to be combined
  • Public API Integration: Uses exclusively Farfalla's public APIs for all publishing operations
  • Complete Traceability: Comprehensive logging of all operations for audit trails and debugging
  • Early Access Feature: Allows newspapers to publish tomorrow's content in advance
  • Smart Zoom Articles Engine: Creates native, responsive HTML experiences for sections or individual articles using XML files from SFTP servers or S3 buckets

2. ONIX Intake System

A highly scalable and robust system that integrates with content partners using the ONIX distribution standard. Medusa acts as a data translator, converting the industry-standard ONIX data model into Publica.la's internal data model that Farfalla understands and can process. This system currently uses some internal-only Farfalla endpoints that may be opened up or absorbed into the public API in the future. Scalability and comprehensive traceability are core design principles, with detailed logging for all ONIX processing operations. Features include:

  • Content Partner Integration: Connects with SFTP servers or S3 buckets from partners like BookWire, Ingram, and other ONIX distributors
  • Continuous Monitoring: Real-time monitoring of content sources for updates and new publications
  • Data Model Translation: Converts ONIX standard data structures to Publica.la's internal format for Farfalla processing
  • Enterprise Scalability: Designed to handle high-volume content processing from multiple partners simultaneously
  • Complete Traceability: Detailed logging of all ONIX processing operations for full audit trails and debugging
  • Internal API Usage: Leverages specialized internal-only endpoints for complex ONIX processing operations
  • Multi-Tenant Publishing: Can publish to specific Farfalla tenants, directly to marketplace, or multiple tenants simultaneously
  • Advanced Pricing: Processes discounted prices, geographical availability restrictions, and future pricing schedules
  • ONIX Standard Compliance: Solid support for ONIX distribution standard requirements

Architecture

Publishing Automations Components

  • Automation Scheduler: Checks every minute for automations that are due, evaluating cron-like expressions with timezone handling
  • Content Fetcher: Retrieves raw content from FTP servers or S3 buckets with isolated credentials
  • PDF Processing Engine: Handles PDF file preprocessing, including merging multiple PDFs when needed
  • Smart Zoom Engine: Generates native, responsive HTML articles from XML files for an enhanced user experience
  • Early Access System: Manages scheduled publication of future content (e.g., tomorrow's newspaper)
  • Public API Integration: Uses exclusively Farfalla's public APIs for content publishing (dogfooding)
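
The scheduler's "is this automation due?" check can be illustrated with a minimal sketch. Medusa itself is a Laravel application, so the Python below is only an illustration of the idea, not Medusa's code. The function names (`field_matches`, `is_due`) and the five-field expression format are assumptions, and real cron semantics are simplified: when both day-of-month and day-of-week are restricted, this sketch requires both to match.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def field_matches(field: str, value: int, minimum: int) -> bool:
    """Check one cron field ("*", "*/15", "1-5", "0,30") against a value."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            low, high = minimum, value  # "*" covers the whole range
        elif "-" in part:
            low, high = (int(x) for x in part.split("-"))
        else:
            low = high = int(part)
        if low <= value <= high and (value - low) % step == 0:
            return True
    return False


def is_due(expression: str, tz: tzinfo, now_utc: datetime) -> bool:
    """Evaluate a five-field cron expression in the automation's own timezone."""
    minute, hour, day, month, weekday = expression.split()
    local = now_utc.astimezone(tz)
    cron_weekday = local.isoweekday() % 7  # cron convention: 0 = Sunday
    return (
        field_matches(minute, local.minute, 0)
        and field_matches(hour, local.hour, 0)
        and field_matches(day, local.day, 1)
        and field_matches(month, local.month, 1)
        and field_matches(weekday, cron_weekday, 0)
    )


# A fixed UTC-3 offset keeps the example self-contained; a named zone from
# zoneinfo.ZoneInfo can be passed the same way.
local_tz = timezone(timedelta(hours=-3))
now = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)  # 06:00 local, a Monday
weekday_morning_due = is_due("0 6 * * 1-5", local_tz, now)
```

Converting the current time into the automation's timezone before matching is what lets a "06:00 on weekdays" schedule fire at the tenant's local 06:00 rather than at 06:00 UTC.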

ONIX Intake System Components

  • ONIX File Monitor: Continuously monitors SFTP servers and S3 buckets for ONIX standard files
  • Content Partner Integration: Manages connections with partners like BookWire, Ingram, and other ONIX distributors
  • ONIX Processor: Parses and processes ONIX files according to industry standards
  • Multi-Tenant Publisher: Handles publishing to specific tenants, marketplace, or multiple destinations
  • Pricing Engine: Processes complex pricing scenarios including discounts, geographic restrictions, and future pricing
  • Distribution Manager: Orchestrates content distribution across multiple channels and tenants
  • Internal API Integration: Utilizes specialized internal-only Farfalla endpoints for complex ONIX operations
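
To make the Pricing Engine's job more concrete, here is a small sketch of selecting the price that applies to a given country on a given date. The `Price` structure, its field names, and the tie-breaking rule (prefer the price with the most recent start date) are all assumptions made for illustration; the source does not describe how Medusa resolves overlapping prices.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Price:
    """Hypothetical internal price record, not Medusa's actual schema."""
    amount: Decimal
    currency: str
    territories: frozenset = frozenset()  # ISO country codes; empty = worldwide
    valid_from: Optional[date] = None     # None = no start restriction
    valid_until: Optional[date] = None    # None = no end restriction


def applicable_price(prices, country: str, on_date: date) -> Optional[Price]:
    """Return the price valid for this country and date, or None if unavailable."""
    candidates = [
        p for p in prices
        if (not p.territories or country in p.territories)
        and (p.valid_from is None or p.valid_from <= on_date)
        and (p.valid_until is None or on_date <= p.valid_until)
    ]
    if not candidates:
        return None
    # Assumption: a price with a later start date supersedes an open-ended one.
    return max(candidates, key=lambda p: p.valid_from or date.min)


prices = [
    Price(Decimal("9.99"), "USD", frozenset({"US"})),
    Price(Decimal("7.99"), "USD", frozenset({"US"}), valid_from=date(2025, 1, 1)),
]
current = applicable_price(prices, "US", date(2024, 12, 31))
future = applicable_price(prices, "US", date(2025, 1, 1))
```

Returning `None` for a country outside every territory set is one way to model a geographic availability restriction: no applicable price means the title is not offered there.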

Shared Infrastructure

  • Advanced Execution Monitor: Comprehensive logging, notifications, and run history tracking for both systems with full traceability
  • Scalable Architecture: Serverless deployment on AWS Lambda enables automatic scaling for high-volume operations
  • Tenant Management: Isolated configurations and credentials per tenant
  • Error Handling: Robust error tracking and notification systems with detailed audit trails

Technology Stack

  • Backend: Laravel 10 (PHP 8.3)
  • Database: SingleStore (distributed SQL database)
  • Admin Interface: Laravel Nova
  • Deployment: AWS Lambda via Laravel Vapor
  • Storage: AWS S3 with multi-tenant configurations
  • Queue System: Laravel queues for background processing
  • Frontend Assets: Laravel Mix with Vue.js components
  • Monitoring: Sentry for error tracking, email and Slack notifications

Key Models

The system is built around several core Laravel models:

  • Automation: Core workflow definitions with scheduling and processing rules
  • Run: Execution records for each automation instance with status tracking
  • Configuration: Schedule and processing configurations per tenant
  • ContentIntake: ONIX file discovery and processing workflows
  • OnixFile/OnixEvent: Publishing industry metadata handling and event processing
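
The relationship between an automation and its runs can be sketched as plain data structures. This is a rough illustration in Python rather than the actual Laravel models; every field name and status value below is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class RunStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Automation:
    """Hypothetical workflow definition owned by one tenant."""
    id: int
    tenant_id: int
    schedule: str   # cron-like expression
    timezone: str   # timezone name used to evaluate the schedule


@dataclass
class Run:
    """Hypothetical execution record for one automation instance."""
    automation_id: int
    status: RunStatus = RunStatus.PENDING
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    log: list = field(default_factory=list)

    def start(self, now: datetime) -> None:
        self.status = RunStatus.RUNNING
        self.started_at = now
        self.log.append("run started")

    def finish(self, now: datetime, error: Optional[str] = None) -> None:
        # A run with an error message is recorded as failed; otherwise succeeded.
        self.status = RunStatus.FAILED if error else RunStatus.SUCCEEDED
        self.finished_at = now
        self.log.append(error or "run finished")


automation = Automation(id=1, tenant_id=42, schedule="0 6 * * *", timezone="UTC")
run = Run(automation_id=automation.id)
run.start(datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc))
run.finish(datetime(2024, 1, 15, 6, 2, tzinfo=timezone.utc))
```

Keeping one record per execution, separate from the automation definition, is what makes run history and per-run status tracking possible.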

Deployment Strategy

Serverless Architecture

  • AWS Lambda: Deployed via Laravel Vapor for cost efficiency and automatic scaling
  • Environment Separation: Production and staging environments with isolated configurations
  • Optimized Build Process: Cached configurations and optimized dependencies for fast cold starts
  • Network Configuration: Custom VPC setup for secure database access and tenant isolation

Performance Optimization

  • Serverless deployment reduces operational costs during low-usage periods
  • Automatic scaling handles traffic spikes during publication releases
  • Multi-region deployment capabilities for global content distribution
  • Optimized Lambda function sizing for processing workloads

Business Context

Medusa is specifically designed for the publishing industry, addressing critical needs in:

  • Digital Publication Workflows: Streamlining the transition from content creation to publication
  • Multi-Tenant Publishing Platforms: Supporting multiple publishers with isolated, secure environments
  • Automated Content Processing: Reducing manual work in content preparation and distribution
  • Publishing Industry Standards: Native support for ONIX metadata and industry best practices
  • Scalable Content Distribution: Handling varying publication schedules and content volumes

The automation-driven approach allows publishing companies to focus on content creation while the platform handles the technical complexities of processing, formatting, and distribution.

Integration Capabilities

Medusa integrates with multiple systems and services:

Platform Integration

  • Farfalla: Main platform backend for content delivery and tenant store management
  • Castoro: PDF processing service for content transformation and optimization
  • AWS S3: Multi-tenant storage with isolated bucket configurations
  • SingleStore: Distributed database for automation state and run history

External Integrations

  • Content Sources: S3 buckets, external content feeds, and file systems
  • Tenant Systems: Individual tenant stores and content management platforms
  • Publishing Standards: ONIX file processing for industry metadata compliance
  • Notification Services: Email and Slack integration for automation alerts

How It Works

Medusa operates two distinct content intake automation workflows:

Publishing Automations Workflow

  1. Content Monitoring: The system checks designated FTP servers or S3 buckets at scheduled intervals defined by cron-like expressions
  2. Content Detection: When new PDF files are detected, the automation is triggered
  3. Content Processing:
    • Multiple PDF files are merged if needed
    • XML article files are processed for smart zoom functionality
    • Early access content is scheduled for future publication
  4. Publication: Processed content is automatically published to the designated Farfalla tenant store
  5. Monitoring: Execution status is tracked with comprehensive logging and notifications
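
Steps 2 and 3 above (detecting new PDFs and deciding which ones belong together) can be sketched as follows. The approach shown, comparing each file's key and fingerprint against a set of already processed pairs and grouping files by parent folder in filename order, is an assumption for illustration only; the source does not specify how Medusa detects changes or chooses merge order.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def detect_new_pdfs(listing: dict, seen: set) -> list:
    """Return PDF keys whose (key, fingerprint) pair has not been processed yet.

    `listing` maps an object key to a fingerprint such as an ETag or a
    size/mtime string; both names are hypothetical.
    """
    return sorted(
        key
        for key, fingerprint in listing.items()
        if key.lower().endswith(".pdf") and (key, fingerprint) not in seen
    )


def group_for_merge(keys: list) -> dict:
    """Group keys by parent folder; each group is one candidate merged document."""
    groups = defaultdict(list)
    for key in sorted(keys):
        groups[str(PurePosixPath(key).parent)].append(key)
    return dict(groups)


listing = {
    "2024-01-15/p02.pdf": "etag-b",
    "2024-01-15/p01.pdf": "etag-a",
    "2024-01-16/p01.pdf": "etag-c",
    "2024-01-15/readme.txt": "etag-d",
}
seen = {("2024-01-16/p01.pdf", "etag-c")}  # already published in an earlier run

fresh = detect_new_pdfs(listing, seen)
merge_plan = group_for_merge(fresh)
```

Tracking the fingerprint alongside the key means a file that is replaced under the same name is picked up again, while an unchanged file is skipped.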

ONIX Intake System Workflow

  1. Continuous Monitoring: The system continuously monitors content partners' SFTP servers and S3 buckets for ONIX files
  2. ONIX Processing & Translation: When new or updated ONIX files are detected:
    • Files are parsed according to ONIX industry standards
    • Metadata, pricing, and availability information are extracted
    • Data is translated from ONIX format to Publica.la's internal data model for Farfalla
    • Geographic restrictions and future pricing are processed
  3. Multi-Channel Distribution: Content is published to:
    • Specific Farfalla tenants based on configuration
    • Marketplace platforms directly
    • Multiple destinations simultaneously as needed
  4. Synchronization: Content updates are automatically synchronized across all distribution channels
  5. Monitoring: Comprehensive tracking of all processing activities with detailed logging
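
The translation step can be illustrated with a minimal sketch that reads an ONIX 3.0 product record and maps it to a flat dictionary. The element names follow ONIX 3.0 reference tags, but the sample record is invented, the output keys (`external_id`, `isbn`, `name`, `prices`) are hypothetical stand-ins for Publica.la's internal model, and real feeds usually declare an XML namespace that this sketch omits for brevity.

```python
import xml.etree.ElementTree as ET

# Invented sample record; namespace declaration omitted for brevity.
SAMPLE = """<ONIXMessage release="3.0">
  <Product>
    <RecordReference>com.example.0001</RecordReference>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <DescriptiveDetail>
      <TitleDetail>
        <TitleType>01</TitleType>
        <TitleElement>
          <TitleElementLevel>01</TitleElementLevel>
          <TitleText>Example Title</TitleText>
        </TitleElement>
      </TitleDetail>
    </DescriptiveDetail>
    <ProductSupply>
      <SupplyDetail>
        <Price>
          <PriceType>02</PriceType>
          <PriceAmount>9.99</PriceAmount>
          <CurrencyCode>USD</CurrencyCode>
        </Price>
      </SupplyDetail>
    </ProductSupply>
  </Product>
</ONIXMessage>"""


def translate_product(product: ET.Element) -> dict:
    """Map one ONIX <Product> to a hypothetical internal representation."""
    isbn = None
    for identifier in product.findall("ProductIdentifier"):
        if identifier.findtext("ProductIDType") == "15":  # code 15 = ISBN-13
            isbn = identifier.findtext("IDValue")
    prices = [
        {"amount": p.findtext("PriceAmount"), "currency": p.findtext("CurrencyCode")}
        for p in product.findall("ProductSupply/SupplyDetail/Price")
    ]
    return {
        "external_id": product.findtext("RecordReference"),
        "isbn": isbn,
        "name": product.findtext(
            "DescriptiveDetail/TitleDetail/TitleElement/TitleText"
        ),
        "prices": prices,
    }


root = ET.fromstring(SAMPLE)
items = [translate_product(product) for product in root.findall("Product")]
```

A production translator would also need to handle short-tag ONIX, multiple title and price types, and territory data, none of which this sketch attempts.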

Shared Monitoring & Error Handling

Throughout both workflows, Medusa provides:

  • Real-time execution status tracking
  • Detailed logging for debugging and analytics
  • Email and Slack notifications for success/failure events
  • Integration with Sentry for error monitoring
  • Tenant-specific configuration and credential management

Development & Operations

Development Tools

  • CI/CD: GitLab CI pipeline for automated deployments
  • Code Quality: PHPStan static analysis and Laravel Pint formatting
  • Testing: Pest testing framework for comprehensive test coverage
  • Documentation: In-code documentation and configuration management

Monitoring & Maintenance

  • Comprehensive Logging: Detailed execution logs for debugging and analysis
  • Error Tracking: Sentry integration for real-time error monitoring
  • Performance Monitoring: Execution metrics and system performance tracking
  • Notifications: Real-time alerts via email and Slack for critical events

Production Features

  • Enterprise-Grade: Production-ready automation platform with advanced Laravel patterns
  • Serverless Architecture: Cost-effective and scalable deployment strategy
  • Multi-Tenant Security: Isolated tenant environments with secure credential management
  • Publishing Industry Expertise: Built specifically for digital publishing workflows
