Merged
22 commits
37b8aa3
[HSC-406] feat: Add welcome coupon issuance on sign-up
bon0512 Mar 22, 2026
ddbf4d1
[HSC-406] feat: Extract coupon issuance into a separate service
bon0512 Mar 22, 2026
a58336a
[HSC-406] feat: Add sign-up coupon service
bon0512 Mar 22, 2026
05c8008
[HSC-406] fix: Fix test code
bon0512 Mar 22, 2026
955413a
[HSC-406] feat: Add shared coupon issuance logic
bon0512 Mar 22, 2026
822162b
[HSC-406] feat: Add coupon package
bon0512 Mar 22, 2026
8521c43
[HSC-404] remove: Bulk-delete gitkeep files
tkv00 Mar 22, 2026
2330767
[HSC-406] fix: Add Transactional
bon0512 Mar 22, 2026
b28a236
[HSC-406] feat: Add concurrency test code
bon0512 Mar 22, 2026
1280d6c
[HSC-404] refactor: Introduce domain exception structure
tkv00 Mar 23, 2026
25664bb
[HSC-404] feat: Centralize authentication exception handling
tkv00 Mar 23, 2026
6040a9b
[HSC-404] feat: Improve request validation and DB conflict responses
tkv00 Mar 23, 2026
d8037e4
[HSC-404] test: Bulk-update test code for the exception handling changes
tkv00 Mar 23, 2026
2888960
[HSC-404] feat: Add metrics for monitoring the recommendation service
tkv00 Mar 23, 2026
1476271
Merge pull request #263 from one-year-gap/feat/HSC-406
rettooo Mar 23, 2026
c50327e
[HSC-404] fix: Fix import scope in test code
tkv00 Mar 23, 2026
6a55bb6
Merge branch 'dev' into refactor/HSC-404
tkv00 Mar 23, 2026
5b2449f
Merge branch 'dev' of https://github.com/one-year-gap/api-server into…
tkv00 Mar 23, 2026
c3cc0a8
Merge remote-tracking branch 'origin/refactor/HSC-404' into refactor/…
tkv00 Mar 23, 2026
6377e23
[HSC-404] remove: Delete Python-related files
tkv00 Mar 23, 2026
02c21e3
Merge pull request #264 from one-year-gap/refactor/HSC-404
tkv00 Mar 23, 2026
18ff97a
Merge branch 'origin/main' into release/HSC-413
tkv00 Mar 23, 2026
6 changes: 6 additions & 0 deletions README.md
@@ -25,3 +25,9 @@
<br/><br/>
<!-- divider -->
<img src="https://capsule-render.vercel.app/api?type=rect&color=gradient&customColorList=12,16,20&height=3&section=header" width="100%"/>

## Monitoring

- Prometheus endpoint: `/actuator/prometheus`
- Grafana dashboard template: `monitoring/grafana/holliverse-customer-observability-dashboard.json`
- Monitoring notes: `monitoring/grafana/README.md`
55 changes: 55 additions & 0 deletions monitoring/grafana/README.md
@@ -0,0 +1,55 @@
# Grafana Monitoring

This directory contains the Prometheus metrics exposed by the app and a draft Grafana dashboard.

## Included dashboards

- `holliverse-customer-observability-dashboard.json`
  - Recommendation API end-to-end time
  - Recommendation wait time
  - Number of pending recommendation futures
  - FastAPI trigger results
  - user-log publish success/failure
  - user-log batch size
  - admin internal log-feature call latency
  - Async executor queue / active threads
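
The executor panels (pool size, active threads, queue depth, remaining queue capacity) correspond to values that a JDK `ThreadPoolExecutor` already reports. The sketch below shows where such numbers can come from, assuming the application's async executors are backed by `ThreadPoolExecutor`. The `ExecutorSnapshot` class, its field names, and the pool sizing are hypothetical and are not taken from this repository.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Point-in-time view of a thread pool (hypothetical helper, for illustration only). */
final class ExecutorSnapshot {
    final int poolSize;       // threads currently in the pool
    final int activeCount;    // threads currently running a task (approximate)
    final int queueSize;      // tasks waiting in the work queue
    final int queueRemaining; // free slots left in a bounded queue

    private ExecutorSnapshot(int poolSize, int activeCount, int queueSize, int queueRemaining) {
        this.poolSize = poolSize;
        this.activeCount = activeCount;
        this.queueSize = queueSize;
        this.queueRemaining = queueRemaining;
    }

    /** Reads all four values from the JDK executor; they are not captured atomically. */
    static ExecutorSnapshot of(ThreadPoolExecutor executor) {
        return new ExecutorSnapshot(
                executor.getPoolSize(),
                executor.getActiveCount(),
                executor.getQueue().size(),
                executor.getQueue().remainingCapacity());
    }

    @Override
    public String toString() {
        return "pool=" + poolSize + " active=" + activeCount
                + " queued=" + queueSize + " remaining=" + queueRemaining;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed sizing for the demo: one worker thread, bounded queue of four.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(4));
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Occupy the single worker so that later tasks have to queue.
        executor.execute(() -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        started.await();
        executor.execute(() -> { });
        executor.execute(() -> { });

        System.out.println(ExecutorSnapshot.of(executor));

        release.countDown();
        executor.shutdown();
    }
}
```

A growing `queueSize` together with a shrinking `queueRemaining` is the signal the queue panels are meant to surface: the pool is saturated and new work is waiting.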

## Newly added metrics

- `holliverse.recommendation.requests{outcome=*}`
- `holliverse.recommendation.duration{outcome=*,source=*}`
- `holliverse.recommendation.wait.duration{outcome=*}`
- `holliverse.recommendation.fastapi.trigger{status=*}`
- `holliverse.recommendation.kafka.consume.duration{outcome=*}`
- `holliverse.recommendation.pending.size`
- `holliverse.executor.pool.size{executor=*}`
- `holliverse.executor.active.count{executor=*}`
- `holliverse.executor.queue.size{executor=*}`
- `holliverse.executor.queue.remaining{executor=*}`
- `holliverse.userlog.publish{event_name=*,result=*}`
- `holliverse.userlog.batch.size`
- `holliverse.userlog.admin_log_feature.duration{result=*}`
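
The names above use dotted notation, while Prometheus queries use underscores (for example `holliverse_executor_queue_size`). The sketch below illustrates that mapping and how a label selector might be assembled for a Grafana panel. `PrometheusNames` is a hypothetical helper, not part of this repository. It covers only the dot-to-underscore step: Micrometer's Prometheus registry typically also appends type suffixes such as `_total` for counters and `_seconds_count` / `_seconds_sum` for timers, and this README does not state the meter types, so those suffixes are left out.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper that turns dotted meter names into PromQL-style selectors. */
final class PrometheusNames {
    private PrometheusNames() { }

    /** Converts a dotted meter name to the underscore form used in PromQL. */
    static String toPrometheusName(String meterName) {
        return meterName.replace('.', '_');
    }

    /**
     * Builds a selector such as name{label="value"}.
     * Labels are sorted by key for stable output. Assumes label values
     * contain no quotes or backslashes, so no escaping is applied.
     */
    static String selector(String meterName, Map<String, String> labels) {
        String name = toPrometheusName(meterName);
        if (labels.isEmpty()) {
            return name;
        }
        String body = new TreeMap<>(labels).entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return name + "{" + body + "}";
    }

    public static void main(String[] args) {
        System.out.println(toPrometheusName("holliverse.recommendation.pending.size"));
        System.out.println(selector("holliverse.executor.queue.size",
                Map.of("executor", "user-log")));
    }
}
```

Gauges such as the executor and pending-size metrics usually carry no extra suffix, so the plain underscore form is the name to query. For counters and timers, check the actual series names at `/actuator/prometheus` before building panels.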

## MSK Lag

MSK lag is not a metric the application exposes directly; it is an AWS-side metric. In Grafana it therefore has to be added as a separate panel, either through an AWS CloudWatch datasource or through an exporter scraped by Prometheus.

Metrics to check first:

- `SumOffsetLag`
- `MaxOffsetLag`
- `EstimatedTimeLag`
- `EstimatedMaxTimeLag`

Recommended panels:

- `SumOffsetLag` per consumer group
- `MaxOffsetLag` per consumer group
- partition top N lag
- Place `holliverse_executor_queue_size`, `holliverse_recommendation_pending_size`, DB pool, JVM, and HTTP p95 panels on the same screen as lag

## Operating guidelines

- For the recommendation API, watch `timeout`, `pending.size`, and `executor.queue.size{executor="recommendation-trigger"}` together.
- For user-log, watch `publish{result!="success"}` and `executor.queue.size{executor="user-log"}` together.
- Lag-based autoscaling is performed not by Grafana but by CloudWatch Alarm + the ECS/EKS autoscaler.