← Back to Blog

Building an ETL Pipeline for 15,000+ Luxury Items

PythonETLAI/MLAutomation

The Problem

When you're processing thousands of luxury items for eBay, manual categorization becomes the bottleneck. Each item needs:

  • Correct category classification (suits, sport coats, outerwear, etc.)
  • Measurement extraction from unstructured text
  • Brand profiling for marketing copy
  • Templated HTML generation
  • The Solution

    I built an ETL pipeline with three core components:

    1. Category Classification

    Using sentence-transformer embeddings, items are classified into 60+ categories. When embeddings don't match with high confidence, RapidFuzz provides fuzzy matching fallback.

    2. Measurement Parsing

    20+ regex parsers extract measurements from free-text descriptions, normalizing them into structured JSON.

    3. GPT Integration

    For brand profiling, I integrated OpenAI's API with JSON caching to minimize costs while generating dynamic marketing copy.

    Results

  • 15,000+ items processed
  • 5× throughput improvement
  • $250K+ annual revenue enabled
  • The key insight: AI doesn't replace human judgment—it handles the 95% of routine cases so humans can focus on edge cases.