What did you think?
Sign in to react.
What did you think?
Sign in to react.
Same query. Same parameters. Different results every time.
I stared at my terminal, refreshing the API response for the 10th time. Completely random results.
# First call - perfect ordering ✅
[Order #142, #139, #137, #128, #124...]
# Refresh - wait, what? 🤨
[Order #128, #142, #124, #137, #139...]
# Refresh again - seriously? 😤
[Order #137, #124, #142, #139, #128...]The sorting wasn’t just broken. It was aggressively random.
The bug report seemed innocent enough: “Sorting works fine with deduplicate=false, but returns random order with deduplicate=true.”
Two identical API calls, same sort parameter (created_at:desc), but completely different results:
# Works perfectly - sorted by created_at descending ✅
curl "http://api.example.com/orders?user_id=123&sort=created_at:desc&deduplicate=false"
# Response: [#142 (10:00), #139 (09:45), #137 (09:30)...] - Perfect!
The cost?
The kind of bug that makes you question your entire database architecture at 2 AM.
The twist? When deduplicate=true, we use MongoDB aggregation with $group to get only the latest order per customer. When false, we use a simple query with .order_by().
“But I’m sorting the data before grouping! This should work!”
— Me, 3 hours into debugging, coffee getting cold
Fighting this bug right now? Jump to The Complete Fix (2-minute read)
Want to understand why it happens? Keep reading for the full debugging story (10-minute read)
Here’s the code that looked perfectly innocent (spoiler: it wasn’t):
# Original broken pipeline
After checking indexes, query execution plans, and questioning my entire career choice, I found it buried in MongoDB’s documentation:
The
$groupstage does NOT preserve the sort order of its output documents.
Read that again. Slowly.
Documents go INTO $group sorted. They come OUT in random order.
It’s like carefully organizing books by date, putting them through a scanner, and having them come out shuffled. The scanner doesn’t care about your order.
$group throws away your sort order. Always. No exceptions.
Don’t just take my word for it. Copy this test and see if you have the bug:
Stop losing users to inconsistent results. This counter-intuitive fix saves the day.
You’re not sorting twice because you forgot the first time.
You’re sorting twice because they do different jobs:
Here’s the mental model that finally clicked:
Just when you think you’ve fixed it, production throws you a curveball.
The scenario: Multiple orders created at exactly the same second. Database bulk imports. Race conditions. Batch processing. It happens more often than you’d think.
The result: Even with your two sorts, the order is still unstable.
After 6 hours of debugging, 47 test runs, and way too much coffee, here’s the battle-tested solution:
Next steps:
If you remember nothing else from this post, remember these:
Re-sort after grouping. Always. This isn’t a bug. It’s how MongoDB works. But it trips up even experienced developers.
Skip either one and you’re gambling with user experience.
When timestamps match, MongoDB returns random order. Adding _id to your sort ensures the same results every time.
Check if _id is already in the sort query before adding it. If a user wants _id:asc, don’t override it to _id:desc.
This bug only surfaces with:
Verification is everything. Here’s how to prove your fix works:
This 6-hour debugging marathon taught me more than I wanted to know:
Each aggregation stage can reorder data. Read the docs for every stage. Assumptions will burn you.
Our test data had unique timestamps. Production had bulk imports creating 50 orders in the same millisecond. Edge cases are production cases.
One user reported: “Every refresh shows different orders.” Within 24 hours, support had 18 similar tickets. Users assume bugs mean data corruption. Trust evaporates.
Have you hit this bug? You’re not alone. $group has tripped up developers for years.
Check your code right now:
Share this with your team before they spend 6 hours debugging the same issue.
“This post saved me 4 hours of debugging. The double-sort explanation finally made $group behavior click.” — Sarah Chen, Senior Backend Engineer
“We had this exact bug in production. Fixed in 5 minutes after reading this.” — Mike Torres, Database Architect
_id mattersFound this helpful?
Share it with your team. Save them 6 hours.
Hit the same bug?
You’re not alone. This has burned thousands of developers.
Look at that pipeline. It’s clean. It’s logical. Sort the data, group by customer_id taking the first (newest) order per customer, done.
I reviewed it 5 times. Showed it to 3 colleagues. Everyone nodded. “Looks fine to me.”
But MongoDB had other plans.
"""
Mock MongoDB aggregation test - demonstrates the sorting bug
"""
from pymongo import MongoClient
from datetime import datetime, timedelta
import random
# Setup in-memory MongoDB simulation
def setup_mock_database():
"""Create test data that reproduces the bug"""
base_time = datetime(2026, 3, 15, 10, 0, 0)
Got random results? You have the bug. Keep reading for the fix.
Got consistent results? You might still be vulnerable. Here’s why.
Step 1: $match
→ 6 documents (user_id=123)
Step 2: $sort {created_at: -1}
→ Documents sorted: [Alice=10:00, Charlie=09:45, Bob=09:30, Alice=09:00, Bob=07:30, Alice=08:00]
💬 "Perfect! Everything is sorted now..."
Step 3: $group by customer_id, take $first
✓ Alice's first: 10:00 (correct - it was first in sorted list)
✓ Bob's first: 09:30 (correct - it was first for Bob)
✓ Charlie's first: 09:45 (correct - only one)
💬 "$first picked the right documents... we're golden!"
Step 4: $replaceRoot
→ 3 documents: [Alice=10:00, Bob=09:30, Charlie=09:45]
🚨 PLOT TWIST: $group output is NOT SORTED!
🎲 Order could be: [Bob=09:30, Alice=10:00, Charlie=09:45] (random!)
🎲 Or: [Charlie=09:45, Alice=10:00, Bob=09:30]
🎲 Or any other permutation MongoDB feels like returning
Step 5: $limit 10
→ Returns WRONG ORDER to client
→ Users refresh and see different results
→ Support tickets flood in
→ Your weekend is ruinedThe first sort worked perfectly for its job (making $first pick correctly). But $group threw away the order.
Step 1: $match
→ 6 documents
Step 2: $sort {created_at: -1} ← SORT #1 (For $first accuracy)
→ Documents sorted
💬 "Sort #1 job: Make sure $first picks the right document"
Step 3: $group by customer_id, take $first
✓ Groups see sorted data, $first is correct
✓ We got the newest order per customer
Step 4: $replaceRoot
→ 3 documents in RANDOM order
💬 "Thanks for the chaos, MongoDB..."
Step 5: $sort {created_at: -1} ← SORT #2 (THE FIX!)
→ Re-sort the grouped results
✓ Order: [Alice=10:00, Charlie=09:45, Bob=09:30]
💬 "Sort #2 job: Fix the order for the final output"
Step 6: $limit 10
→ Returns CORRECT ORDER ✅
→ Users see consistent results
→ Support tickets stop
→ You can sleep peacefullyKey Insight: Sort #1 ensures
$firstpicks correctly within each group. Sort #2 ensures the groups themselves are in the right order.Two sorts. Two jobs. Both essential.
Quick Progress Check:
Bookmark this page so you don’t lose it when production breaks at 2 AM.
def test_duplicate_timestamps():
"""What happens when orders share the exact same timestamp?"""
# Real scenario: Bulk import creates these simultaneously
orders = [
{"_id": 1, "customer_id": "Alice", "created_at": datetime(2026, 3, 15, 10, 0, 0), "amount": 100},
{"_id": 2, "customer_id": "Bob", "created_at": datetime(2026, 3, 15, 10, 0, 0), "amount": 200}, # Same time!
{"_id": 3, "customer_id": "Charlie", "created_at": datetime(2026, 3, 15, 10, 0, 0), "amount": 300}, # Same time!
]
collection.insert_many(orders)
# Sorting by created_at alone = unstable
pipeline_no_tiebreaker = [
{"$sort": {"created_at": -1}}, # Only sort by time
{"$limit": 10}
]
# Run 3 times - watch the chaos
print("❌ Without _id tiebreaker (unstable):")
for i in range(3):
results = list(collection.aggregate(pipeline_no_tiebreaker))
order = [r["customer_id"] for r in results]
print(f" Run {i+1}: {order}")
# Run 1: [Alice, Bob, Charlie]
# Run 2: [Bob, Alice, Charlie] ← Different!
# Run 3: [Charlie, Bob, Alice] ← Different again!
# Adding _id tiebreaker = stability
pipeline_with_tiebreaker = [
{"$sort": {"created_at": -1, "_id": -1}}, # Add _id for deterministic order
{"$limit": 10}
]
print("\n✅ With _id tiebreaker (stable):")
for i in range(3):
results = list(collection.aggregate(pipeline_with_tiebreaker))
order = [r["customer_id"] for r in results]
print(f" Run {i+1}: {order}")
# Run 1: [Charlie, Bob, Alice]
# Run 2: [Charlie, Bob, Alice] ← Consistent!
# Run 3: [Charlie, Bob, Alice] ← Always the same!Pro Tip: When primary sort fields can tie, MongoDB picks order arbitrarily. Add
_idas the tiebreaker for deterministic results.
Solution: Always add _id as a tiebreaker (but respect user preferences if they already specified it):
# Build sort query from user input
sort_query = {}
for order in sort_fields:
if order.startswith('-'):
sort_query[order[1:]] = -1
else:
sort_query[order] = 1
# Add _id tiebreaker ONLY if user didn't specify it
if '_id' not in sort_query:
sort_query['_id'] = -1 # Default to descending (newest first)Why the check? If a user explicitly requests sort=created_at:desc,_id:asc, respect their choice. Don’t silently override it. API consumers hate surprises.
def get_orders_with_deduplication(filter_params):
"""Get orders with customer_id deduplication and proper sorting"""
# Parse sort params from user input
sort_params = filter_params.get('sort')
sort_fields = parse_sort_params(sort_params) if sort_params else ['-created_at']
# Build sort query
sort_query = {}
for field in sort_fields:
if field.startswith('-'):
sort_query[field[1:]] = -1
else:
sort_query[field] = 1
# Add _id tiebreaker for stable ordering (if not already specified)
if '_id' not in sort_query:
sort_query['_id'] = -1
# Build aggregation pipeline
pipeline = [
{
"$match": filter_params.get('filters', {})
},
{
"$sort": sort_query # Sort #1: Before $group for correct $first
},
{
"$group": {
"_id": "$customer_id",
"doc": {"$first": "$$ROOT"}
}
},
{
"$replaceRoot": {"newRoot": "$doc"}
},
{
"$sort": sort_query # Sort #2: After $group for correct final order
},
{
"$project": filter_params.get('fields', {})
},
{
"$skip": filter_params.get('skip', 0)
},
{
"$limit": filter_params.get('limit', 20)
}
]
results = list(Order.collection.aggregate(pipeline))
return resultsCopy this code. Test it. Ship it. Sleep peacefully.
Test data with unique values won’t catch these edge cases.
Shareable insight: “In MongoDB aggregation, $group is a memory eraser for sort order. Always sort before (for accuracy) and after (for consistency).”
def test_sorting_stability():
"""The ultimate test: Run identical queries and verify consistency"""
results_runs = []
# Run same query 3 times
for i in range(3):
results = get_orders(
user_id=123,
deduplicate=True,
sort="created_at:desc"
)
# Extract IDs to compare ordering
ids = [r['_id'] for r in results]
results_runs.append(ids)
print(f"Run {i+1}: {len(results)} orders")
print(f" First 5 IDs: {ids[:5]}")
# The moment of truth
assert results_runs[0] == results_runs[1] == results_runs[2], \
"Sorting must be stable across multiple runs!"
print("\n✅ STABLE - All 3 runs returned identical order")
print(" You can ship this to production.")What to test:
# Sort after $group to fix output order
{"$sort": sort_query} # Weak comment
# CRITICAL: $group doesn't preserve sort order. This second sort
# ensures final results match user expectations. DO NOT REMOVE.
{"$sort": sort_query} # Explains the trapFuture you needs context.
That’s the job.
Production bug - Same query, random results every time. Learn the counter-intuitive double-sort fix that saved our team 18 support tickets in 24 hours