
Listing products with a filter or sort condition applied is noticeably slow.

If heavy traffic arrives in that state, head-of-line blocking can leave the queued requests unprocessed.

The problem

There are currently no indexes. The table structure is as follows.

The data distribution is as follows.

Table	Rows
product	1,000,000
brand	200
product_like	about 5 per product, evenly distributed

The per-column distribution of the product table is as follows.

price	5,000~500,000
stock	0~999
state	AVAILABLE 80% UNAVAILABLE 20%

The per-column distribution of the brand table is as follows.

state	CLOSED 70% OPENED 30%

This is the current application code.

    @Transactional(readOnly = true)
    fun search(userId: UserId, query: ProductQuery): Result {
        val productWithSignal = repository.search(query)
        val brands = brandRepository.findByIdIn(productWithSignal.map { it.product.brandId })
        val likes = likeRepository.findByUserIdAndProductIn(
            userId,
            productWithSignal.map { it.product },
        )
        return Result(products = factory.generate(productWithSignals = productWithSignal, brands = brands, likes = likes))
    }

These are the execution results for each query it issues.

loopers> explain analyze
         select *
         from product
                  right join product_signal
                             on product.id = product_signal.product_id
         where brand_id = 2001
           and state = 'AVAILABLE'
         order by product.price, product.displayed_at desc, product_signal.like_count
         limit 20 offset 40
[2025-08-15 20:47:24] 1 row retrieved starting from 1 in 2 s 14 ms (execution: 1 s 698 ms, fetching: 316 ms)

loopers> explain analyze
         select *
         from brand
         where id = 2001
[2025-08-15 20:47:24] 1 row retrieved starting from 1 in 343 ms (execution: 3 ms, fetching: 340 ms)

loopers> explain analyze
         select *
         from product_like
         where user_id = 1
           and product_id in (10000103, 10000104, 10000105, 10000106, 10000107, 10000108, 10000109)
[2025-08-15 20:47:26] 1 row retrieved starting from 1 in 1 s 569 ms (execution: 1 s 227 ms, fetching: 342 ms)

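In total, the three queries take 3.926 seconds (2.014 s + 0.343 s + 1.569 s).
I then ran a k6 test with 1,000 vusers sending random requests.

The minimum response time was about 3 seconds, in line with the timings measured above.

The success rate was capped at the thread pool limit, heh.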

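Applying indexes

I started by adding indexes.

For product and product_signal, I built composite indexes with columns ordered from highest cardinality to lowest, and with the filter columns placed before the sort columns.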

Total products: 1,000,000

----

Total brands: 200 -> about 5,000 products per brand, 0.5% of the total

state splits 80% / 20%, so each state holds 200,000~800,000 products, 20%~80% of the total

So brand_id goes first.
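
The comparison above can be sketched as a small calculation. This is only a rough estimate that assumes rows are spread evenly across values; ColumnStats, estimatedRows and orderBySelectivity are names made up for this illustration, not part of the project.

```kotlin
// Share of rows that one equality filter on this column is expected to match.
// Assumes an even spread across values, which real data rarely has.
data class ColumnStats(val name: String, val matchingFraction: Double)

// Expected number of rows left after filtering on a single column.
fun estimatedRows(totalRows: Long, column: ColumnStats): Long =
    Math.round(totalRows * column.matchingFraction)

// The column that leaves the fewest rows goes first in the composite index.
fun orderBySelectivity(columns: List<ColumnStats>): List<String> =
    columns.sortedBy { it.matchingFraction }.map { it.name }

fun main() {
    val totalProducts = 1_000_000L
    val brandId = ColumnStats("brand_id", 1.0 / 200) // 200 brands
    val state = ColumnStats("state", 0.8)            // AVAILABLE share of product rows

    println("brand_id -> ${estimatedRows(totalProducts, brandId)} rows")
    println("state    -> ${estimatedRows(totalProducts, state)} rows")
    println("index column order: ${orderBySelectivity(listOf(state, brandId))}")
}
```

Filtering on brand_id alone already narrows a million rows to a few thousand, while state alone still leaves most of the table, which is why brand_id leads the index.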

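For the sort conditions,

displayed_at and price have higher cardinality than like_count,

and in the real world two products are more likely to share a price than a display start date,

so I apply displayed_at desc first, then price, then like_count.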

CREATE INDEX idx_product_brand_state_disp_price
    ON product (brand_id, state, displayed_at DESC, price, id);
CREATE INDEX idx_ps_product_like ON product_signal (product_id, like_count);
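
To make the chosen sort order concrete, here is a minimal in-memory sketch of the same ordering and paging that the listing query asks the database for. Row, listingOrder and page are hypothetical names for this illustration only; the real sorting happens in SQL.

```kotlin
import java.time.LocalDateTime

// Hypothetical row shape holding only the columns involved in the sort.
data class Row(val id: Long, val displayedAt: LocalDateTime, val price: Int, val likeCount: Int)

// Same precedence as: order by displayed_at desc, price, like_count
val listingOrder: Comparator<Row> =
    compareByDescending<Row> { it.displayedAt }
        .thenBy { it.price }
        .thenBy { it.likeCount }

// Equivalent of "limit ? offset ?" applied after sorting.
fun page(rows: List<Row>, limit: Int, offset: Int): List<Row> =
    rows.sortedWith(listingOrder).drop(offset).take(limit)

fun main() {
    val rows = listOf(
        Row(1, LocalDateTime.of(2025, 8, 1, 0, 0), 1000, 5),
        Row(2, LocalDateTime.of(2025, 8, 2, 0, 0), 2000, 1),
        Row(3, LocalDateTime.of(2025, 8, 2, 0, 0), 1000, 9),
        Row(4, LocalDateTime.of(2025, 8, 2, 0, 0), 1000, 3),
    )
    println(page(rows, limit = 2, offset = 1).map { it.id })
}
```

Ties on displayed_at fall through to price, and ties on price fall through to like_count, so the earlier a column appears, the more of the ordering it decides on its own.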

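For product_like, comparing the cardinalities gives:

Total likes: 5,000,000

-----

Total products: 1,000,000 -> about 5 likes per product

Total users: 50,000 -> about 100 likes per user

so product has higher cardinality than user. The index therefore puts product_id first.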
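
As a mental model, a composite index on (product_id, user_id) behaves like a sorted set of key pairs. The sketch below imitates that with a TreeSet; LikeKey and ProductLikeIndex are hypothetical names, and an actual database index works on pages and ranges rather than a Java collection.

```kotlin
import java.util.TreeSet

// One index entry: the two columns in index order.
data class LikeKey(val productId: Long, val userId: Long)

// Entries are ordered by product_id first, then user_id, like the composite index.
val likeIndexOrder: Comparator<LikeKey> =
    compareBy<LikeKey> { it.productId }.thenBy { it.userId }

class ProductLikeIndex(likes: Collection<LikeKey>) {
    private val entries = TreeSet<LikeKey>(likeIndexOrder).apply { addAll(likes) }

    // where user_id = ? and product_id in (...): one point lookup per product id.
    fun likedProductIds(userId: Long, productIds: List<Long>): List<Long> =
        productIds.filter { LikeKey(it, userId) in entries }
}

fun main() {
    val index = ProductLikeIndex(
        listOf(LikeKey(10000103, 1), LikeKey(10000103, 2), LikeKey(10000107, 1)),
    )
    println(index.likedProductIds(userId = 1, productIds = listOf(10000103, 10000104, 10000107)))
}
```

Because both columns are matched by equality in this query, each (product_id, user_id) pair resolves to a single position in the sorted structure.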

Execution plan after applying indexes

loopers> explain analyze
         select *
         from product
                  right join product_signal
                             on product.id = product_signal.product_id
         where brand_id = 2001
           and state = 'AVAILABLE'
         order by product.displayed_at desc, product.price, product_signal.like_count
         limit 20 offset 40
[2025-08-15 21:24:05] 1 row retrieved starting from 1 in 99 ms (execution: 34 ms, fetching: 65 ms)
loopers> explain analyze
         select *
         from brand
         where id = 2001
[2025-08-15 21:24:05] 1 row retrieved starting from 1 in 303 ms (execution: 5 ms, fetching: 298 ms)
loopers> explain analyze
         select *
         from product_like
         where user_id = 1
           and product_id in (10000103, 10000104, 10000105, 10000106, 10000107, 10000108, 10000109)
[2025-08-15 21:24:05] 1 row retrieved starting from 1 in 316 ms (execution: 5 ms, fetching: 311 ms)

The total dropped sharply to 0.718 seconds.

-> Limit/Offset: 20/40 row(s)  (actual time=19.2..19.2 rows=20 loops=1)
    -> Sort: product.displayed_at DESC, product.price, product_signal.like_count, limit input to 60 row(s) per chunk  (actual time=19.2..19.2 rows=60 loops=1)
        -> Stream results  (cost=15931 rows=7256) (actual time=0.444..17.8 rows=3964 loops=1)
            -> Nested loop inner join  (cost=15931 rows=7256) (actual time=0.432..14.3 rows=3964 loops=1)
                -> Index lookup on product using idx_product_brand_state_disp_price (brand_id=2001, state='AVAILABLE'), with index condition: ((product.state = 'AVAILABLE') and (product.id is not null))  (cost=7979 rows=7256) (actual time=0.404..6.39 rows=3964 loops=1)
                -> Single-row index lookup on product_signal using uq_ps_product (product_id=product.id)  (cost=0.996 rows=1) (actual time=0.00186..0.00188 rows=1 loops=3964)

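The plan confirms that the indexes are used step by step: an index lookup on idx_product_brand_state_disp_price for the brand_id and state filter, then a single-row lookup on uq_ps_product for each matching product. The Sort step still runs over the 3,964 matching rows.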

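Load test after applying indexes

Previously the failure rate was high because requests could not acquire a connection from the pool. This time the minimum was 8.32 ms, the maximum 1.06 seconds,

and with a p95 of 183 ms the load was handled comfortably.

With 300 concurrent users there is some delay, but requests still succeed without trouble.

Even so, the slowest requests still take about 1 second.

With 1,000 concurrent users, failures appear and the response time reaches 6 seconds, so further improvement is needed.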

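Let's improve read performance with a cache!

The indexes got as much out of the database as they could, but the database connection pool became the bottleneck and requests were still not handled smoothly.

To improve on this a little further, I'll apply a cache.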
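
Why the connection pool caps throughput can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope model, not a measurement: the pool size of 10 is a hypothetical value (the real setting is not shown in this post), and it assumes each request holds one connection for its whole duration.

```kotlin
// Upper bound on requests per second when every request holds exactly one
// connection for holdMillis: poolSize / holdSeconds.
fun maxThroughputPerSecond(poolSize: Int, holdMillis: Double): Double =
    poolSize / (holdMillis / 1000.0)

// Little's law: connections busy on average = arrival rate x time each is held.
fun requiredConnections(requestsPerSecond: Double, holdMillis: Double): Double =
    requestsPerSecond * holdMillis / 1000.0

fun main() {
    val hypotheticalPoolSize = 10 // assumption for illustration only
    val holdMillis = 718.0        // the three-query total measured above, IDE fetch time included

    println("max req/s with this pool: ${maxThroughputPerSecond(hypotheticalPoolSize, holdMillis)}")
    println("connections needed for 100 req/s: ${requiredConnections(100.0, holdMillis)}")
}
```

Under this model there are only two levers: hold each connection for less time, or avoid taking one at all. A cache hit does the latter.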

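1. Caching per product (by id)

- The id list matching the paging query still has to be fetched, so the database load is higher than with per-query caching.

- On the other hand, the cache hit rate is higher than when the key is built from the page and search parameters.

- The same entries can also serve single-product lookups, which makes them efficient.

Later I'll use cache warm-up to secure the performance of option 1.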

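2. Caching per query

- No database round trip is needed, so a cache hit gives good performance. (Effective for the landing page, pages 1, 2, 3, and so on.)

- Rarely requested data has to be cached as well, so the hit rate is low.

- When a new product arrives or product information changes, many entries have to be updated, or everything may have to be evicted.

Given these trade-offs, I'll test option 1 first, then try option 2 and compare.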
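
For comparison, option 2 could look roughly like the sketch below. It is a minimal in-memory illustration, not the project's code: ListingQueryKey and QueryResultCache are hypothetical names, the key fields are assumed from the listing query, and a real setup would more likely use an external cache with a TTL.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical cache key: every parameter that changes the result is part of it.
data class ListingQueryKey(
    val brandId: Long?,
    val state: String?,
    val sort: String,
    val page: Int,
    val size: Int,
)

class QueryResultCache<V : Any>(private val loader: (ListingQueryKey) -> V) {
    private val store = ConcurrentHashMap<ListingQueryKey, V>()

    // A hit skips the database entirely; a miss runs the loader once and stores the result.
    fun get(key: ListingQueryKey): V = store.computeIfAbsent(key) { loader(it) }

    // A product change can affect any cached page, so the blunt option is to drop everything.
    fun evictAll() = store.clear()

    fun size(): Int = store.size
}

fun main() {
    var databaseCalls = 0
    val cache = QueryResultCache<List<Long>> { databaseCalls++; listOf(10000103L, 10000104L) }
    val key = ListingQueryKey(brandId = 2001, state = "AVAILABLE", sort = "displayed_at_desc", page = 2, size = 20)

    cache.get(key)
    cache.get(key)
    println("database calls after two reads: $databaseCalls")
}
```

Every distinct combination of filter, sort and page becomes its own entry, which is where both the low hit rate and the eviction cost in the list above come from.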

1. Caching per product

    @Transactional(readOnly = true)
    fun search(userId: UserId, query: ProductQuery): Result {
        val ids = repository.searchForIds(query)
            .map { ProductKey.GetProduct(it) }
        val found = productCacheTemplate.findAll(ids)
        if (found.size == ids.size) return Result(products = found)
        
        val productWithSignal = repository.search(query)
        val brands = brandRepository.findByIdIn(productWithSignal.map { it.product.brandId })
        val likes = likeRepository.findByUserIdAndProductIn(
            userId,
            productWithSignal.map { it.product },
        )
        val resultProducts = productCacheTemplate.saveAll(
            factory.generate(productWithSignals = productWithSignal, brands = brands, likes = likes),
        )
        return Result(products = resultProducts)
    }

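This is the updated application-layer code.

The id query drops unneeded parts such as the fetch join and returns only ids.

If every matching product is already in the cache, the cached entries are returned as is.

If any is missing, the original logic runs, and the combined result is cached and then returned.

The test queries are currently generated from random values, so the hit rate is very low.

As a result the code keeps writing to the cache without benefiting from it, and performance actually gets worse.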
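
To put a number on the low hit rate, the lookups could be counted. The sketch below is a hypothetical helper (HitRateCounter is not part of the project) that would be fed the size of the requested id list and the number of entries the cache returned.

```kotlin
// Accumulates cache lookups so the overall hit rate can be reported.
class HitRateCounter {
    private var hits = 0L
    private var misses = 0L

    // requested: ids asked for in one search; found: how many the cache returned.
    fun record(requested: Int, found: Int) {
        hits += found
        misses += requested - found
    }

    // Fraction of requested ids served from the cache; 0.0 before any lookup.
    fun hitRate(): Double =
        if (hits + misses == 0L) 0.0 else hits.toDouble() / (hits + misses)
}

fun main() {
    val counter = HitRateCounter()
    counter.record(requested = 20, found = 5)
    counter.record(requested = 20, found = 15)
    println("hit rate: ${counter.hitRate()}")
}
```

In the search code above, this would be called right after the cache lookup with ids.size and found.size, so the rate can be compared between random and realistic request patterns.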