Why Your Kotlin API Needs Indexes: The Theory Behind findById and Lazy Loading

The Real Problem: Slow API in Production

Imagine the following scenario: your e-commerce API works perfectly in the development environment with 50 products. After deploying to production with 1 million items, the listing endpoint that took 200ms now takes 15 seconds or, worse, results in a Timeout.

Often, the blame isn’t on the language or the server, but on the lack of understanding about how data is being retrieved. In the real world, a poorly planned findAll() is the digital equivalent of trying to find a needle in a haystack by removing each straw manually.

The Diagnosis: What sets a senior developer apart is the ability to understand how the code will behave when data grows. This is called thinking in Big O - a way to measure algorithm efficiency.

The Core Concept: Indexes and Big O

To optimize APIs, we need to apply the fundamental principles of data structures to our persistence models.

findById and O(1) Search

In chapter 5 of “Understanding Algorithms”, Hash Tables are presented as the “holy grail” of speed: instant access. When we index a primary key in the database, we create a structure (usually a B-Tree or Hash Index) that allows the database engine to jump directly to the record.

Without Index: Linear Scan — O(n). The database needs to check each record until it finds what you’re looking for.

With Index: Constant/Logarithmic Search — O(1) or O(log n). The database goes directly to the record or performs a very fast search.

The N+1 Problem: The Silent Enemy

Lazy Loading is a technique that saves memory, but if used inside loops, it can cause a serious problem called N+1.

What happens: You make 1 query to fetch a list of orders. But when you try to access each order’s customer, JPA makes a new query for each order. If you have 100 orders, this becomes 101 queries (1 for the list + 100 for customers). This is extremely slow!

Where and How to Apply

Best Practices

Use Indexes (@Index): On columns frequently used in WHERE clauses, not just the primary key. If you always search by email or status, create an index!

Use Lazy Loading by default: To avoid loading huge object graphs that you won’t use.

Use Join Fetch whenever you know: In advance, that you’ll need relationship data for a list.

Anti-patterns

Doing findAll().filter { ... } in Kotlin: You’re bringing O(n) data into memory to do work the database would do in O(log n). Do the filtering in the database!

Leaving OneToMany relationships as Eager by default: This causes a “cascade effect” of memory loading, bringing data you may not need.

Practical Implementation: Kotlin/JPA

See the difference between an implementation that “chokes” the API and the high-performance solution.

The N+1 Trap (Inefficient)

// Standard Repository
interface OrderRepository : JpaRepository<Order, Long>

// Inefficient Service
@Transactional(readOnly = true)
fun listOrdersForDTO(): List<OrderDTO> {
    val orders = orderRepository.findAll() // 1 Query: SELECT * FROM order
    
    return orders.map { order ->
        // Here's the disaster: for each order, a new database query is fired
        // Total queries: 1 (list) + N (customers) = N+1 queries!
        OrderDTO(order.id, order.customer.name) 
    }
}

The problem: If you have 100 orders, this generates 101 database queries (1 for the list + 100 for each customer). This is extremely slow!

The Optimized Solution

interface OrderRepository : JpaRepository<Order, Long> {
    
    // We transform N+1 queries into just 1 using JOIN FETCH
    @Query("SELECT o FROM Order o JOIN FETCH o.customer WHERE o.status = :status")
    fun findAllWithCustomer(status: OrderStatus): List<Order>
}

// Usage in Service
fun listOrdersOptimized(status: OrderStatus): List<OrderDTO> {
    return orderRepository.findAllWithCustomer(status).map { order ->
        OrderDTO(order.id, order.customer.name) // Customer is already in memory!
    }
}

What the code does: The JOIN FETCH instructs Hibernate to perform an INNER JOIN (or LEFT JOIN) in SQL, bringing Customer data in the same database round-trip. We go from linear network behavior to constant connection time.

The Projection Solution (Even Better)

When you only need specific fields, you can avoid loading the entire entity and return the DTO directly from the query. This offers better performance and less risk of N+1 on other fields:

interface OrderRepository : JpaRepository<Order, Long> {
    
    // Projection: returns only necessary fields, without loading complete entities
    @Query("""
        SELECT new com.yourpackage.OrderDTO(o.id, c.name, o.status)
        FROM Order o
        JOIN o.customer c
        WHERE o.status = :status
    """)
    fun findDTOByStatus(status: OrderStatus): List<OrderDTO>
}

// Usage in Service - even simpler!
fun listOrdersWithProjection(status: OrderStatus): List<OrderDTO> {
    return orderRepository.findDTOByStatus(status) // Already returns the DTO ready!
}

Projection Advantages:

Less memory: Loads only the fields you need
Faster: Less data traveling from database to application
No N+1 risk: No entities to accidentally trigger lazy loading
Cleaner code: The DTO comes ready from the query

Complete Example with Entities

@Entity
@Table(name = "orders", indexes = [
    Index(name = "idx_order_status", columnList = "status"),
    Index(name = "idx_order_date", columnList = "created_at")
])
data class Order(
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    val id: Long = 0,
    
    val status: OrderStatus,
    
    @ManyToOne(fetch = FetchType.LAZY) // Lazy by default
    @JoinColumn(name = "customer_id")
    val customer: Customer,
    
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    val items: List<OrderItem> = emptyList()
)

@Entity
@Table(name = "customers", indexes = [
    Index(name = "idx_customer_email", columnList = "email", unique = true)
])
data class Customer(
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    val id: Long = 0,
    
    val name: String,
    
    @Column(unique = true)
    val email: String
)

// DTO for response
data class OrderDTO(
    val id: Long,
    val customerName: String,
    val status: String
)

Tradeoff Analysis

Strategy Comparison

Strategy	I/O Performance	Memory Consumption	Complexity
Lazy Loading	Bad (Many trips)	Low (Loads on demand)	O(N+1) in loops
Eager/Join Fetch	Great (One trip)	High (Loads extra objects)	O(1) round-trip
Projection	Excellent (One trip)	Very Low (Only necessary fields)	O(1) round-trip
Indexes	Excellent	Medium (Occupies disk/RAM)	O(log n) search

Critical Analysis

Memory vs. Processing: Indexes greatly speed up reads, but slow down writes a bit (because it needs to update the index) and take up space. Use indexes on columns you search frequently and that have many different values.

Think Like a Graph: Your entities are like connected points. The best path is to avoid fetching unnecessary data. If you only need the customer’s name, use Projection instead of loading the complete entity.

When to Use Each Approach:

Lazy Loading: When you don’t know if you’ll need the relationships.
Join Fetch: When you know you’ll need complete relationships for a list.
Projection: When you only need specific fields (best option for listings and DTOs).

Key Takeaways

The database is a search expert: Delegate filters and searches to it via indexes. Don’t bring everything into memory and filter in code.
N+1 is the silent enemy: Always monitor SQL logs in development to ensure your list isn’t firing hundreds of queries. Use spring.jpa.show-sql=true to see all queries.
Understand the Graph: Your entities are nodes in a graph. Load only the branches needed for the current operation.
Indexes are investments: They take up space and make writes slower, but drastically speed up reads. Use wisely.
Final Verdict: Frameworks like Spring and Hibernate make life much easier, but the laws of computing still apply. Optimizing your API is applying in practice what we learn about efficient algorithms.

This article is part of the “Understanding Algorithms” series, based on the book by Aditya Y. Bhargava.

Série: Understanding Algorithms