Crawler Detection

Search and recommendation features automatically block crawlers to prevent bots from triggering expensive database queries.

Why Block Crawlers?

Storefront search and recommendations require complex queries:

  • Full-text search across issues, taxonomies, and metadata
  • Recommendation algorithms based on user behavior and content relationships

When search engine crawlers (Googlebot, Bingbot, etc.) index a storefront, they follow every link—including search forms and recommendation widgets. Without protection, a single crawl session could trigger thousands of expensive queries.

Affected Features

Feature | Why Blocked
storefront_search | Prevents bots from executing full-text search queries
storefront_search_terms | Blocks taxonomy term searches
storefront_recommended_issues | Avoids recommendation algorithm execution

How It Works

The system uses LaravelCrawlerDetect to identify bot User-Agents during feature resolution:

Request → Feature Check → Crawler Detection → Allow/Block
  1. Request comes in for a protected feature
  2. Feature resolver checks platform-level status first (global kill switch)
  3. If the feature is platform-enabled, the resolver checks for a crawler via LaravelCrawlerDetect
  4. If a crawler is detected → the feature resolves as disabled
  5. Otherwise → normal feature resolution continues

This check runs in the feature access trait (app/Traits/CanAccessFeature.php), after the platform-level check but before tenant-specific resolution.
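The resolution order above can be modeled in a few lines. This is a minimal sketch in Python, not the actual PHP in CanAccessFeature.php. Several pieces are assumptions made for illustration: the dict-based platform and tenant lookups, the resolve_feature() signature, and the substring-based is_crawler(). That last function is a crude, hypothetical stand-in for LaravelCrawlerDetect, which matches against a much larger, maintained pattern list.

```python
# Hypothetical stand-in for LaravelCrawlerDetect: a crude substring match.
# The real package checks against a large, maintained list of bot patterns.
CRAWLER_MARKERS = ("googlebot", "bingbot", "crawler", "spider")

# The protected features listed in the Affected Features table.
PROTECTED_FEATURES = frozenset({
    "storefront_search",
    "storefront_search_terms",
    "storefront_recommended_issues",
})


def is_crawler(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def resolve_feature(
    feature: str,
    user_agent: str,
    platform_enabled: dict[str, bool],  # assumed shape: global kill switch per feature
    tenant_enabled: dict[str, bool],    # assumed shape: tenant-specific settings
) -> bool:
    # Step 2: platform-level status first (global kill switch).
    if not platform_enabled.get(feature, False):
        return False
    # Steps 3-4: crawler check, applied only to protected features.
    if feature in PROTECTED_FEATURES and is_crawler(user_agent):
        return False
    # Step 5: normal tenant-specific resolution continues.
    return tenant_enabled.get(feature, False)
```

In this sketch a globally disabled feature returns before any User-Agent matching happens, and a detected crawler never reaches the tenant lookup. This mirrors the ordering described in the steps.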

Bypass Mechanism

Some legitimate tools need to access these features while appearing as crawlers:

Header | Purpose
x-farfalla-bypass-crawler-detection | Monitoring tools (PageSpeed Insights, Oh Dear)
X-CustomFenice-Tenant-Id | Fenice desktop app (uses custom User-Agent)

These headers are checked in RequestMacrosProvider, which adds the isCrawler() macro to the Request class.
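The bypass logic can be sketched in Python as a rough model of what an isCrawler() check with bypass headers might look like. It is not the code in RequestMacrosProvider. It assumes that the mere presence of either header disables crawler detection. Whether the real macro also validates header values is not stated here. The marker list is a hypothetical stand-in for LaravelCrawlerDetect.

```python
# Header names taken from the bypass table. Treating mere presence as a
# bypass is an assumption; the real macro may also validate values.
BYPASS_HEADERS = (
    "x-farfalla-bypass-crawler-detection",
    "x-customfenice-tenant-id",
)

# Hypothetical stand-in for LaravelCrawlerDetect's pattern list.
CRAWLER_MARKERS = ("googlebot", "bingbot", "crawler", "spider")


def is_crawler(user_agent: str, headers: dict[str, str]) -> bool:
    # HTTP header field names are case-insensitive, so normalize them first.
    present = {name.lower() for name in headers}
    if any(header in present for header in BYPASS_HEADERS):
        # A bypass header is present: never treat the request as a crawler.
        return False
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)
```

Lowercasing the header names lets the all-lowercase x-farfalla-bypass-crawler-detection and the mixed-case X-CustomFenice-Tenant-Id be matched the same way, because HTTP field names are case-insensitive.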

SEO Impact

None. Crawlers don't need search functionality to index content:

  • Content pages are directly accessible via URLs and internal links
  • Search results are dynamic and not meant for indexing anyway
  • Recommendations are personalized and would vary per request

Google and other search engines discover content by following links, not by using site search. The actual content (issues, collections, etc.) remains fully indexable.

Key Files

File | Purpose
TenantResolverUnifiedQueryService | Resolves feature availability when tenant context is established
app/Traits/CanAccessFeature.php | Feature resolution with crawler check
app/Providers/RequestMacrosProvider.php | isCrawler() macro and bypass logic