The Illusion of Personalization: Solving the LLM Cost Explosion in Fashion AI
How we cut API calls from O(N) to O(K) with a Hybrid Architecture in MiroFish V3.2
Virtual Try-On and AI styling present a unique infrastructure challenge: The Hyper-Personalization Trilemma. You want to generate 3 daily "Outfit of the Day" (OOTD) recommendations for 10,000 active users, but routing 10,000 unique stylistic prompts to an LLM like Qwen3.5-Flash every single day results in immediate API bankruptcy. The time complexity of generating these items scales directly at O(N), where N is your user base.
As traffic grows, relying purely on raw LLM generation is financial suicide. In our recent MiroFish V3.2 backend overhaul, the engineering team set a strict mandate: maintain 1:1 bespoke curation quality while drastically reducing the time complexity of our API calls from O(N) to O(K), where K represents a tightly bound, finite set of stylistic clusters.
Here's how we architected the "Illusion of Personalization".
Total Architectural Separation (Phase A vs. Phase B)
Before V3.2, every time a user opened the app, we queried Supabase for their style preferences, packed it into an LLM prompt, and waited for Qwen to return a JSON array of outfits. This was slow and expensive.
We resolved this by physically separating the generation engine from the serving layer.
The N+1 Query Defense
We begin batch operations by fetching the user metadata in a single, bulk query. We extract the user's Style DNA, their JSONB Rules (likes/dislikes), and their Timezone at the start of our nightly batch job. This prevents the classic N+1 database trap that plagues early-stage AI wrappers.
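The bulk fetch can be sketched like this (table and column names here are assumptions for illustration, not our exact schema):

```python
def build_bulk_profile_query(user_ids):
    """One bulk SELECT for the entire nightly batch instead of one
    query per user -- the N+1 defense."""
    placeholders = ", ".join(["%s"] * len(user_ids))
    return (
        "SELECT id, style_dna, style_rules, timezone "
        f"FROM users WHERE id IN ({placeholders})"
    )

# Driver code at batch start (e.g. with psycopg):
#   cur.execute(build_bulk_profile_query(user_ids), user_ids)
#   profiles = {row[0]: row[1:] for row in cur.fetchall()}
```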
Phase A: Deterministic Clustering (O(K))
Instead of LLM calls, we group users into deterministic visual buckets. A user's profile is hashed and mapped to a Cluster ID, for example, 20260420_Minimal_Sunny.
For 10,000 users, our algorithm typically yields around 30 clusters, so the LLM is called only ~30 times per day. Each API call generates a dense "cache pool" of 20 high-quality outfit configurations (10 Daytime, 10 NightOut) mapped to that cluster. These pools are stored in a fast in-memory Redis layer.
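The deterministic bucketing can be sketched as follows (the style buckets and profile fields are illustrative assumptions, inferred from the example cluster ID above):

```python
import hashlib

STYLES = ["Minimal", "Street", "Classic"]  # illustrative style buckets

def cluster_id(user_profile, date_str, weather):
    """Map a user deterministically to a visual bucket.
    Same profile -> same cluster every night, zero LLM calls."""
    digest = hashlib.sha256(user_profile["style_dna_key"].encode()).digest()
    style = STYLES[digest[0] % len(STYLES)]
    return f"{date_str}_{style}_{weather}"
```

Because the mapping is a pure hash, re-running the batch job never reshuffles users between clusters.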
Phase B: The Serving Layer
Inside the actual user request loopโthe moment they open the appโthere are ZERO LLM calls. Everything they see is pulled directly from the 20-item cache pool. The magic relies entirely on what happens between retrieving the pool and painting the UI.
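In outline, the request path looks like this (a plain dict stands in for the Redis layer, and the pipeline body is stubbed):

```python
def last_mile_pipeline(user, pool):
    # Placeholder for steps 1-5 (time filter, veto, re-rank, jitter, UI text)
    return pool

def serve(user, cache, n=3):
    """Phase B request path: zero LLM calls, pure cache reads."""
    pool = cache.get(user["cluster_id"], [])  # 20-item pool built in Phase A
    return last_mile_pipeline(user, pool)[:n]
```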
The Zero-Cost "Last-Mile" Personalization Pipeline
If 300 users share the identical Minimal_Sunny cluster, how do we prevent them from seeing the exact same app? We create the "Illusion of 1:1 Personalization" using a lightning-fast 5-step Python/Node pipeline without ever hitting the LLM again.
1. Anticipatory Time Filter
We sync the app to the user's device IANA timezone. The pipeline automatically shifts priority to "NightOut" items at exactly 15:00 (3 PM)โthe psychological moment when office workers begin planning their evening activities and are most susceptible to conversion.
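A minimal sketch of the time switch, using the standard-library `zoneinfo` module (the 15:00 cutoff comes straight from the text; everything else is illustrative):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

NIGHT_SWITCH_HOUR = 15  # 3 PM local: shift priority to NightOut items

def preferred_slot(user_tz, now_utc):
    """Decide Daytime vs NightOut from the user's IANA timezone."""
    local = now_utc.astimezone(ZoneInfo(user_tz))
    return "NightOut" if local.hour >= NIGHT_SWITCH_HOUR else "Daytime"
```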
2. Hard Filter (JSONB Veto)
We reject cluster items immediately based on the user's explicit style_rules.disliked_categories. For instance, if the user vetoes skirts, they are dropped.
To prevent UI crashing, we implement a Veto Cascade: if a strict veto ruleset accidentally reduces the cache pool to 0 items, the cascade gracefully rolls back the strictest filters to guarantee the UI always paints.
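A sketch of the hard filter with its rollback path (the rollback order shown here, relaxing the most recently added veto first, is an illustrative assumption):

```python
def hard_filter(pool, disliked_categories):
    """Apply JSONB veto rules; roll vetoes back if the pool would empty,
    because the UI must always have something to paint."""
    surviving = [i for i in pool if i["category"] not in disliked_categories]
    if surviving:
        return surviving
    # Veto Cascade: relax vetoes one at a time until items survive
    relaxed = list(disliked_categories)
    while relaxed:
        relaxed.pop()  # roll back the most recently added veto
        surviving = [i for i in pool if i["category"] not in relaxed]
        if surviving:
            return surviving
    return pool  # every veto rolled back: serve the raw pool, never crash
```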
3. Weighted Scoring & Soft Re-Rank
Items receive deterministic nudges: multipliers for preferred_colors and for liked_categories. To handle stylistic cohesion, we use an ultra-lightweight 3D Cosine Proxy. Rather than loading heavy multi-gigabyte models or numpy tensors on edge nodes, we boil embeddings down to just three core dimensions (Vibe, Trend, Dare) for millisecond re-ranking.
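The soft re-rank can be sketched like this (the boost constants and item fields are illustrative assumptions, not our production values):

```python
import math

def cosine(a, b):
    """Plain-Python cosine similarity -- no numpy needed for 3 dimensions."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rerank(pool, user_vec, color_boost=1.2, category_boost=1.1):
    """Multiplier nudges plus a cosine nudge over (Vibe, Trend, Dare)."""
    def score(item):
        s = item["base_score"]
        if item.get("color_match"):
            s *= color_boost
        if item.get("category_match"):
            s *= category_boost
        return s + cosine(item["vec"], user_vec)  # cohesion nudge
    return sorted(pool, key=score, reverse=True)
```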
4. 1e-9 Jitter (Tie-breaker)
When applying soft rules to a restricted cache, many outfits end up with identical scores (e.g., exactly 4.5). Database sorts on ties are non-deterministic and can cause items to visually "flicker" on re-renders, breaking the illusion. We inject an infinitesimal jitter to ensure absolute hierarchy without disrupting rank logic:
```python
import random

def score_outfit(item, user_prefs):
    base_score = apply_soft_rules(item, user_prefs)
    # The 1e-9 jitter ensures perfect, stable shuffles for ties
    jitter = random.random() * 1e-9
    return base_score + jitter
```
5. Hyper-Local UI Injection
Finally, we dynamically synthesize the user's location into the UI text. A generic "Here are your outfits" becomes: "A breezy minimalist look for your afternoon in Torrance." We call this the fortune-cookie effect. It requires zero AI compute, yet it bridges the psychological gap, making the generic cluster feel bespoke.
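Mechanically this is nothing more than string templating; a minimal sketch (function and field names are illustrative):

```python
def fortune_cookie(vibe, slot, city):
    """Inject hyper-local context into generic copy -- zero AI compute."""
    return f"A {vibe} look for your {slot.lower()} in {city}"
```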
Implicit Feedback (Why DB Triggers beat Edge Functions)
The system is only as good as its implicit feedback loop. We needed a frictionless way to extract preferences from user "Likes" to continually refine the JSONB Rules, sidestepping the "Cold Start" problem.
Trade-off Analysis:
Initially, we routed Like events out to Serverless Edge Functions that updated the Postgres instance. We quickly rejected this. Edge functions suffer from 50-300ms HTTP overhead and cold starts. More critically, putting mutation logic on edge functions meant writing complex manual retry logic if the DB transaction failed.
The Winner: We moved this directly to the database layer using Native PostgreSQL Triggers and PL/pgSQL. It executes with absolute atomicity within ~2ms.
When an outfit is liked, the database instantly calculates category affinities and upserts the Style DNA JSONB column natively:
```sql
CREATE OR REPLACE FUNCTION update_user_style_dna()
RETURNS TRIGGER AS $$
BEGIN
  -- Partial deep merge into JSONB using the || operator
  UPDATE users
  SET style_dna = COALESCE(style_dna, '{}'::jsonb) || jsonb_build_object(
    'preferred_colors', array_to_json(ARRAY(
      SELECT DISTINCT elements FROM (
        SELECT jsonb_array_elements_text(
          COALESCE(style_dna -> 'preferred_colors', '[]'::jsonb)
        ) AS elements
        UNION
        SELECT NEW.item_color AS elements
      ) AS unique_colors
    ))
  )
  WHERE id = NEW.user_id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach the function to the likes table (table name assumed here)
CREATE TRIGGER trg_update_style_dna
AFTER INSERT ON outfit_likes
FOR EACH ROW EXECUTE FUNCTION update_user_style_dna();
```
Conclusion: The SmartWorkLab Engineering Philosophy
True engineering isn't just throwing brute-force LLM compute at a problem until your AWS bill catches fire. It's about data pipelines, strategic clustering, and exploiting psychological UX timing.
By dropping per-request API dependencies and mastering the Last-Mile Pipeline, we didn't just solve an API "Cost Explosion"; we architected a framework that scales hyper-personalization without scaling LLM spend.
Updated 4/20/2026