Building an ETL Pipeline for 15,000+ Luxury Items
Python · ETL · AI/ML · Automation
15,000+ items processed · 5× throughput improvement · $250K+ annual revenue enabled
The Problem
When you're processing thousands of luxury items for eBay, manual categorization becomes the bottleneck. Each item needs:
- Correct category classification (suits, sport coats, outerwear, etc.)
- Measurement extraction from unstructured text
- Brand profiling for marketing copy
- Templated HTML generation
The Solution
I built an ETL pipeline with three core components:
1. Category Classification
Items are classified into 60+ categories using sentence-transformer embeddings. When no embedding match clears the confidence threshold, RapidFuzz provides a fuzzy-matching fallback.
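To make the two-stage idea concrete, here is a minimal sketch of "embedding match first, fuzzy fallback second". It is not the production code. To stay dependency-free it swaps in a toy bag-of-words vector for the sentence-transformer embedding and the standard library's difflib for RapidFuzz. The category subset, the 0.5 threshold, and all function names are assumptions made for illustration.

```python
import difflib
import math
from collections import Counter

# Hypothetical subset of the 60+ categories.
CATEGORIES = ["Suits", "Sport Coats", "Outerwear", "Dress Shirts"]


def embed(text):
    """Toy bag-of-words vector; a stand-in for a sentence-transformer embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy_match(title, choices):
    """Character-level similarity; a stand-in for a RapidFuzz scorer."""
    def score(choice):
        return difflib.SequenceMatcher(None, title.lower(), choice.lower()).ratio()

    best = max(choices, key=score)
    return best, score(best)


def classify(title, threshold=0.5):
    """Return (category, method): embedding match if confident, else fuzzy fallback."""
    query = embed(title)
    best_score, best_category = max((cosine(query, embed(c)), c) for c in CATEGORIES)
    if best_score >= threshold:
        return best_category, "embedding"
    category, _ = fuzzy_match(title, CATEGORIES)
    return category, "fuzzy"
```

With these stand-ins, a title that shares a word with a category name resolves through the embedding path, while a misspelled title such as "Outerwaer parka" falls through to the fuzzy path. Returning the method alongside the category makes it easy to route low-confidence items to human review.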
2. Measurement Parsing
More than 20 regex parsers extract measurements from free-text descriptions and normalize them into structured JSON.
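As an illustration of what one family of these parsers might look like, the sketch below pulls three measurements out of a free-text description and emits JSON. The field names, the patterns, and the assumption that every value is in inches are hypothetical. The real pipeline uses many more parsers than shown here.

```python
import json
import re
from fractions import Fraction

# A number such as "42", "25.5", or "30 1/2" (whole part plus optional fraction).
NUMBER = r"(\d+(?:\.\d+)?)(?:\s+(\d+)/(\d+))?"

# Hypothetical field names; each label may be followed by up to five
# non-digit characters (":", " - ", etc.) before the number.
FIELDS = {
    "chest_in": re.compile(r"chest\D{0,5}" + NUMBER, re.IGNORECASE),
    "sleeve_in": re.compile(r"sleeves?\D{0,5}" + NUMBER, re.IGNORECASE),
    "length_in": re.compile(r"length\D{0,5}" + NUMBER, re.IGNORECASE),
}


def parse_measurements(text):
    """Return a dict of field name to float for every pattern found in text."""
    result = {}
    for field, pattern in FIELDS.items():
        match = pattern.search(text)
        if match is None:
            continue  # measurement not mentioned; leave the key out
        whole, numerator, denominator = match.groups()
        value = float(whole)
        if numerator and denominator and int(denominator) != 0:
            value += float(Fraction(int(numerator), int(denominator)))
        result[field] = value
    return result


description = 'Chest: 42", Sleeve 25.5 in, Length - 30 1/2"'
print(json.dumps(parse_measurements(description)))
```

Keeping one compiled pattern per field means a new measurement type can be added as a single dictionary entry, and a missing measurement simply produces no key rather than a guessed value.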
3. GPT Integration
For brand profiling, I integrated OpenAI's API with JSON caching to minimize costs while generating dynamic marketing copy.
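The caching idea can be sketched as a JSON file keyed by normalized brand name, so the model is only called the first time a brand is seen. This is a rough sketch under stated assumptions: `generate` is a hypothetical callable that would wrap the OpenAI request in the real pipeline, and the cache file name and key normalization are illustrative choices, not the original implementation.

```python
import json
from pathlib import Path


def load_cache(cache_path):
    """Read the JSON cache from disk, or start empty if it does not exist yet."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def brand_profile(brand, generate, cache_path="brand_cache.json"):
    """Return the profile for brand, calling generate(brand) only on a cache miss.

    generate is a hypothetical callable returning JSON-serializable data;
    in practice it would wrap the API call.
    """
    cache = load_cache(cache_path)
    key = brand.strip().lower()  # normalize so "Acme " and "acme" share an entry
    if key not in cache:
        cache[key] = generate(brand)
        Path(cache_path).write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return cache[key]
```

Since many listings share the same brand, a per-brand cache like this turns most lookups into a file read instead of a paid API call. Passing `generate` in as a parameter also lets the function be tested with a stub instead of a live request.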
Results
The key insight: AI doesn't replace human judgment—it handles the 95% of routine cases so humans can focus on edge cases.