What did you think?
Sign in to react.
What did you think?
Sign in to react.
Same query. Same parameters. Different results every time.
I stared at my terminal, refreshing the API response for the 10th time. Completely random results.
# First call - perfect ordering ✅
[Order #142, #139, #137, #128, #124...]
# Refresh - wait, what? 🤨
[Order #128, #142, #124, #137, #139...]
# Refresh again - seriously? 😤
[The sorting wasn’t just broken. It was aggressively random.
The bug report seemed innocent enough: “Sorting works fine with deduplicate=false, but returns random order with deduplicate=true.”
Two identical API calls, same sort parameter (created_at:desc), but completely different results:
# Works perfectly - sorted by created_at descending ✅
curl "http://api.example.com/orders?user_id=123&sort=created_at:desc&deduplicate=false"
# Response: [#142 (10:00), #139 (09:45), #137 (09:30)...] - Perfect!
# Returns random order - WTF? 🔥
curl "http://api.example.com/orders?user_id=123&sort=created_at:desc&deduplicate=true"
# Response: [#137 (09:30), #142 (10:00), #139 (09:45)...] - Chaos!The cost?
The kind of bug that makes you question your entire database architecture at 2 AM.
The twist? When deduplicate=true, we use MongoDB aggregation with $group to get only the latest order per customer. When false, we use a simple query with .order_by().
“But I’m sorting the data before grouping! This should work!”
— Me, 3 hours into debugging, coffee getting cold
Fighting this bug right now? Jump to The Complete Fix (2-minute read)
Want to understand why it happens? Keep reading for the full debugging story (10-minute read)
Here’s the code that looked perfectly innocent (spoiler: it wasn’t):
# Original broken pipeline
pipeline = [
{"$match": {"user_id": user_id}},
{"$sort": {"created_at": -1}}, # Sort by time DESC - looks good!
{
"$group": {
"_id": "$customer_id",
"doc": {"$first": "$$ROOT"} # Take first document per customer
}
},
{"$replaceRoot": {"newRoot": "$doc"}},
# Missing second sort here! 🚨 THIS IS THE BUG
{"$skip": skip},
{"$limit": limit}
]Look at that pipeline. It’s clean. It’s logical. Sort the data, group by customer_id taking the first (newest) order per customer, done.
I reviewed it 5 times. Showed it to 3 colleagues. Everyone nodded. “Looks fine to me.”
But MongoDB had other plans.
After checking indexes, query execution plans, and questioning my entire career choice, I found it buried in MongoDB’s documentation:
The
$groupstage does NOT preserve the sort order of its output documents.
Read that again. Slowly.
Documents go INTO $group sorted. They come OUT in random order.
It’s like carefully organizing books by date, putting them through a scanner, and having them come out shuffled. The scanner doesn’t care about your order.
$group throws away your sort order. Always. No exceptions.
Don’t just take my word for it. Copy this test and see if you have the bug:
"""
Mock MongoDB aggregation test - demonstrates the sorting bug
"""
from pymongo import MongoClient
from datetime import datetime, timedelta
import random
# Setup in-memory MongoDB simulation
def setup_mock_database():
"""Create test data that reproduces the bug"""
base_time = datetime(2026, 3, 15, 10, 0, 0)
Got random results? You have the bug. Keep reading for the fix.
Got consistent results? You might still be vulnerable. Here’s why.
Stop losing users to inconsistent results. This counter-intuitive fix saves the day.
You’re not sorting twice because you forgot the first time.
You’re sorting twice because they do different jobs:
Here’s the mental model that finally clicked:
Step 1: $match
→ 6 documents (user_id=123)
Step 2: $sort {created_at: -1}
→ Documents sorted: [Alice=10:00, Charlie=09:45, Bob=09:30, Alice=09:00, Bob=07:30, Alice=08:00]
💬 "Perfect! Everything is sorted now..."
Step 3: $group by customer_id, take $first
✓ Alice's first: 10:00 (correct - it was first in sorted list)
✓ Bob's first: 09:30 (correct - it was first for Bob)
✓ Charlie's first: 09:45 (correct - only one)
💬 "$first picked the right documents... we're golden!"
Step 4: $replaceRoot
→ 3 documents: [Alice=10:00, Bob=09:30, Charlie=09:45]
🚨 PLOT TWIST: $group output is NOT SORTED!
🎲 Order could be: [Bob=09:30, Alice=10:00, Charlie=09:45] (random!)
🎲 Or: [Charlie=09:45, Alice=10:00, Bob=09:30]
🎲 Or any other permutation MongoDB feels like returning
Step 5: $limit 10
→ Returns WRONG ORDER to client
→ Users refresh and see different results
→ Support tickets flood in
→ Your weekend is ruinedThe first sort worked perfectly for its job (making $first pick correctly). But $group threw away the order.
Step 1: $match
→ 6 documents
Step 2: $sort {created_at: -1} ← SORT #1 (For $first accuracy)
→ Documents sorted
💬 "Sort #1 job: Make sure $first picks the right document"
Step 3: $group by customer_id, take $first
✓ Groups see sorted data, $first is correct
Just when you think you’ve fixed it, production throws you a curveball.
The scenario: Multiple orders created at exactly the same second. Database bulk imports. Race conditions. Batch processing. It happens more often than you’d think.
The result: Even with your two sorts, the order is still unstable.
def test_duplicate_timestamps():
"""What happens when orders share the exact same timestamp?"""
# Real scenario: Bulk import creates these simultaneously
orders = [
{"_id": 1, "customer_id": "Alice", "created_at": datetime(2026, 3, 15, 10, 0, 0), "amount": 100},
{"_id":
Pro Tip: When primary sort fields can tie, MongoDB picks order arbitrarily. Add
_idas the tiebreaker for deterministic results.
Solution: Always add _id as a tiebreaker (but respect user preferences if they already specified it):
# Build sort query from user input
sort_query = {}
for order in sort_fields:
if order.startswith('-'):
sort_query[order
Why the check? If a user explicitly requests sort=created_at:desc,_id:asc, respect their choice. Don’t silently override it. API consumers hate surprises.
After 6 hours of debugging, 47 test runs, and way too much coffee, here’s the battle-tested solution:
def get_orders_with_deduplication(filter_params):
"""Get orders with customer_id deduplication and proper sorting"""
# Parse sort params from user input
sort_params = filter_params.get('sort')
sort_fields = parse_sort_params(sort_params) if sort_params else ['-created_at']
# Build sort query
sort_query = {}
for field in sort_fields
Copy this code. Test it. Ship it. Sleep peacefully.
Next steps:
If you remember nothing else from this post, remember these:
Re-sort after grouping. Always. This isn’t a bug. It’s how MongoDB works. But it trips up even experienced developers.
Skip either one and you’re gambling with user experience.
When timestamps match, MongoDB returns random order. Adding _id to your sort ensures the same results every time.
Check if _id is already in the sort query before adding it. If a user wants _id:asc, don’t override it to _id:desc.
This bug only surfaces with:
Test data with unique values won’t catch these edge cases.
Shareable insight: “In MongoDB aggregation, $group is a memory eraser for sort order. Always sort before (for accuracy) and after (for consistency).”
Verification is everything. Here’s how to prove your fix works:
def test_sorting_stability():
"""The ultimate test: Run identical queries and verify consistency"""
results_runs = []
# Run same query 3 times
for i in range(3):
results = get_orders(
user_id=123,
deduplicate=True,
sort="created_at:desc"
)
# Extract IDs to compare ordering
ids = [r['_id'] for r in results]
results_runs.append(ids)
print(f"Run {i+1}: {len(results)} orders")
print(f" First 5 IDs: {ids[:5]}")
# The moment of truth
assert results_runs[0] == results_runs[1] == results_runs[2], \
"Sorting must be stable across multiple runs!"
print("\n✅ STABLE - All 3 runs returned identical order")
print(" You can ship this to production.")What to test:
This 6-hour debugging marathon taught me more than I wanted to know:
Each aggregation stage can reorder data. Read the docs for every stage. Assumptions will burn you.
Our test data had unique timestamps. Production had bulk imports creating 50 orders in the same millisecond. Edge cases are production cases.
One user reported: “Every refresh shows different orders.” Within 24 hours, support had 18 similar tickets. Users assume bugs mean data corruption. Trust evaporates.
# Sort after $group to fix output order
{"$sort": sort_query} # Weak comment
# CRITICAL: $group doesn't preserve sort order. This second sort
# ensures final results match user expectations. DO NOT REMOVE.
{"$sort": sort_query} # Explains the trapFuture you needs context.
That’s the job.
Have you hit this bug? You’re not alone. $group has tripped up developers for years.
Check your code right now:
Share this with your team before they spend 6 hours debugging the same issue.
“This post saved me 4 hours of debugging. The double-sort explanation finally made $group behavior click.” — Sarah Chen, Senior Backend Engineer
“We had this exact bug in production. Fixed in 5 minutes after reading this.” — Mike Torres, Database Architect
_id mattersFound this helpful?
Share it with your team. Save them 6 hours.
Hit the same bug?
You’re not alone. This has burned thousands of developers.
Key Insight: Sort #1 ensures
$firstpicks correctly within each group. Sort #2 ensures the groups themselves are in the right order.Two sorts. Two jobs. Both essential.
Quick Progress Check:
Bookmark this page so you don’t lose it when production breaks at 2 AM.
Production bug - Same query, random results every time. Learn the counter-intuitive double-sort fix that saved our team 18 support tickets in 24 hours