
feat: Load test automation setup #31

Open
lsy1307 wants to merge 11 commits into main from
19-feat-loadtest-rds-parameter-store

Conversation

@lsy1307
Contributor

@lsy1307 lsy1307 commented May 5, 2026

Related issue

Changes

  • Configured Terraform to create a dedicated loadtest RDS instance when a load test starts.
  • Added RDS security group rules so the prod/stage EC2 security groups can reach the loadtest RDS on port 3306.
  • Configured the loadtest RDS endpoint/username/password to be written to Parameter Store under the /solid-connection/loadtest/ path.
  • Added manually triggered Load Test Start and Load Test Stop GitHub Actions workflows.
  • Switched stage server switchover/rollback from SSH to SSM RunCommand.
  • The prod RDS data copy is also performed via SSM RunCommand targeting the prod EC2 instance.
  • Included the monitor repo's k6 files in the infra repo and configured cloud-init to place them when a new stage EC2 instance is created.
  • Updated the parent repo's submodule pointer to the secret submodule's loadtest tfvars commit.
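The Parameter Store values written under /solid-connection/loadtest/ can be read back with the AWS CLI. A minimal hedged sketch: only the /solid-connection/loadtest/ prefix and the spring.datasource.* key names come from this PR; the helper function itself is illustrative, not workflow code.

```shell
# Compose the full Parameter Store name for a loadtest key.
# The /solid-connection/loadtest/ prefix comes from this PR;
# the wrapper is illustrative, not part of the workflow.
loadtest_param_path() {
  printf '/solid-connection/loadtest/%s\n' "$1"
}

# Illustrative read (requires AWS credentials, so it is commented out):
# aws ssm get-parameter --with-decryption \
#   --name "$(loadtest_param_path spring.datasource.username)" \
#   --query 'Parameter.Value' --output text
```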

Execution process

Start

  1. In GitHub, open Actions > Load Test Start.
  2. Click Run workflow.
  3. Run it with the defaults.

Inputs:

  • switch_stage_to_loadtest: restarts the stage app with the dev,loadtest profiles. Default: true
  • copy_prod_data: copies prod RDS data to the loadtest RDS. Default: true

What the Start workflow does:

  • Assumes AWS_ROLE_ARN via OIDC.
  • Checks out the secret submodule with GH_PAT.
  • Runs Terraform init/apply in environment/load_test.
  • Creates the loadtest RDS, its security group, and the Parameter Store datasource values.
  • Sends an SSM RunCommand to the stage EC2 to restart the solid-connection-dev container with the dev,loadtest profiles.
  • Sends an SSM RunCommand to the prod EC2 to dump the prod RDS and restore it into the loadtest RDS.
  • Deletes the temporary migration Parameter Store values.
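Several of these steps have to wait on an SSM RunCommand result. Below is a minimal sketch of a bounded wait loop, assuming a fetch_status helper that returns the invocation status (a real workflow would call aws ssm get-command-invocation for this); all names and the polling structure here are illustrative, not the actual workflow code.

```shell
# Poll an SSM command's status at most max_attempts times, then give up.
# fetch_status is assumed to print one of the SSM invocation statuses
# (Success, Failed, Cancelled, TimedOut, InProgress, ...).
wait_for_ssm() {
  local max_attempts=$1 attempt=0 status
  while [ "$attempt" -lt "$max_attempts" ]; do
    status=$(fetch_status)
    case "$status" in
      Success) return 0 ;;
      Failed|Cancelled|TimedOut)
        echo "SSM command ended with status: $status" >&2
        return 1 ;;
      *)
        attempt=$((attempt + 1))
        sleep "${POLL_INTERVAL:-1}" ;;
    esac
  done
  echo "SSM command still not finished after $max_attempts polls" >&2
  return 1
}
```

Bounding the loop this way avoids the unbounded `while true` polling that the review comments below flag in start.sh and stop.sh.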

Stop

  1. In GitHub, open Actions > Load Test Stop.
  2. Click Run workflow.
  3. Run it with the defaults.

Inputs:

  • restore_stage_dev: reverts the stage app to the original dev compose setup. Default: true
  • destroy_rds: destroys the loadtest Terraform stack. Default: true

What the Stop workflow does:

  • Assumes AWS_ROLE_ARN via OIDC.
  • Checks out the secret submodule with GH_PAT.
  • Sends an SSM RunCommand to the stage EC2 to remove docker-compose.loadtest.override.yml and restart with the original dev compose setup.
  • Runs Terraform destroy in environment/load_test to remove the loadtest RDS resources.
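The rollback step above could be expressed as a small command list sent through SSM RunCommand. A hedged sketch: only the override file name comes from this PR; the compose directory and the exact compose invocation are assumptions.

```shell
# Build the remote command list for rolling stage back to the dev compose
# setup. /home/ubuntu/solid-connection is an assumed compose directory;
# only docker-compose.loadtest.override.yml is named in the PR.
build_stage_restore_commands() {
  cat <<'EOF'
cd /home/ubuntu/solid-connection
rm -f docker-compose.loadtest.override.yml
docker compose up -d
EOF
}

# The resulting lines would go into the "commands" parameter of
# `aws ssm send-command --document-name AWS-RunShellScript ...`.
```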

k6 files

  • Included the monitor repo's k6 files under config/load-test/k6.
  • When a new stage EC2 instance is created, Terraform cloud-init places these files in /home/ubuntu/solid-connection-load-test/k6.
  • The existing stage EC2 is not recreated, so this cloud-init change does not take effect immediately.

Roles of the included files:

  • whole-user-flow.js: runs the full user flow against https://api.stage.solid-connection.com, from login through university lookup, post/comment creation, update/deletion, application creation, and competitor lookup.
  • createPost.json: JSON body attached to the multipart request of the post-creation API.
  • updatePost.json: JSON body attached to the multipart request of the post-update API.
  • set_up_xk6.sh: installs Go and xk6 and builds a k6 binary that includes the Prometheus remote-write output.
  • script/set-load-test.sh: script for the previous local-DB-based load test setup. Not part of the default execution path of this RDS-based workflow.

Running k6:

  1. Run the Load Test Start workflow so the stage app points at the loadtest RDS.
  2. On the stage server, change into the k6 directory.
cd /home/ubuntu/solid-connection-load-test/k6
  3. Build the xk6-based k6 binary (first time only).
./set_up_xk6.sh
source ~/.bashrc
  4. Run the full user-flow load test.
./k6 run whole-user-flow.js
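The time=<KST> result tag listed in the defaults below is produced by the script itself. Purely as an illustration, the same tag value could be computed and passed explicitly on the command line; the wrapper below is a sketch, not part of the PR.

```shell
# Produce a KST timestamp suitable for a k6 `time` result tag.
kst_time_tag() {
  TZ=Asia/Seoul date '+%Y-%m-%dT%H:%M:%S'
}

# Illustrative explicit-tag invocation (the script already sets these
# tags by default, so this is equivalent, not required):
# ./k6 run --tag testid=whole-user-flow --tag "time=$(kst_time_tag)" whole-user-flow.js
```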

Current whole-user-flow.js defaults:

  • target: https://api.stage.solid-connection.com
  • scenario: per-vu-iterations
  • VUs: 10
  • iterations: 10 per VU
  • maxDuration: 15m
  • login user: user${__VU}@example.com / password
  • result tags: testid=whole-user-flow, time=<run start time in KST>

Notes

  • terraform apply and terraform destroy run inside the Load Test Start and Load Test Stop workflows, respectively.
  • The stage server is restarted to point at the loadtest RDS during a load test, so coordinate to avoid overlapping with regular stage verification in that window.
  • The loadtest RDS allows no public access; MySQL access is allowed only from the prod/stage EC2 security groups.
  • The SSM RunCommand/Parameter Store permissions required by AWS_ROLE_ARN have been added to the policy.

Review requests (optional)

  • Please check whether the loadtest RDS create/destroy scope fits our cost and operational requirements.
  • Please check whether running the prod data dump/restore on the prod EC2 matches our operations policy.
  • Please check whether switching the stage server to the dev,loadtest profiles is consistent with the server's configuration precedence.
  • Please check whether placing the k6 files via cloud-init on newly created stage instances is appropriate.
  • Please weigh in on whether to keep running k6 manually on the stage server or extend it to a separate GitHub Actions/SSM execution.

Summary by CodeRabbit

Release notes

  • Chores

    • Completed load test automation infrastructure
    • Added a test-environment database and automation scripts
  • Documentation

    • Added a load test automation guide

lsy1307 added 3 commits May 6, 2026 01:07
- Details: Reflected the secret submodule commit needed for load test execution in the parent infra repository
- Details: Defined the load-test RDS, security group, and SSM datasource parameters in Terraform

- Details: Allowed access to loadtest RDS port 3306 from the prod/stage EC2 security groups
- Details: Automated RDS creation, stage switchover, and prod data copy in start.sh

- Details: Provided the stage rollback and loadtest RDS destroy flow in stop.sh

- Details: Documented bash-based procedures for Windows and macOS/Linux execution environments
@coderabbitai

coderabbitai Bot commented May 5, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 961ce255-a93f-4783-a0b5-a5de6b4375ad



lsy1307 added 2 commits May 6, 2026 01:47
- Details: Added workflows so load test start and stop can be triggered manually via workflow_dispatch

- Details: Switched stage switchover and rollback from SSH to SSM RunCommand

- Details: Documented the OIDC-based AWS role and GH_PAT submodule checkout flow that removes the need for SSH key input
- Details: Included the monitor repo's k6 files in the infra repo so stage EC2 cloud-init places them

- Details: Added a k6 file placement option to the app_stack module, enabled only in the stage environment

- Details: Converted the load test README to Korean and organized the GitHub Actions execution flow
@github-actions

github-actions Bot commented May 5, 2026

Terraform Plan: stage

No changes. Your infrastructure matches the configuration.

The full plan output is omitted from this comment for security reasons. Check the workflow run artifacts.

@github-actions

github-actions Bot commented May 5, 2026

Terraform Plan: prod

No changes. Your infrastructure matches the configuration.

The full plan output is omitted from this comment for security reasons. Check the workflow run artifacts.

@github-actions

github-actions Bot commented May 5, 2026

@coderabbitai review

Contributor

@Hexeong Hexeong left a comment


Nice work! A few questions:

  1. As I understand it, one GitHub Action creates the load test environment and another runs the load test. As far as I know, the GitHub runner generating the load also needs decent specs for the VU settings to be applied properly — is that runner's capacity sufficient?

  2. Second, it currently looks like inputs such as updatePost.json are passed in as files. Is there a way for a developer to supply parameters when triggering the GitHub Action? That would allow more flexible load test runs!


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
environment/load_test/main.tf (1)

1-145: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The environment/load_test environment is missing from the Terraform auto-validation pipeline

The detect-changes job in terraform-plan.yml (lines 17-57) defines no filter for the load_test environment, and there is no corresponding plan-load_test job. As a result, this PR changed environment/load_test/*.tf but no automatic plan validation ran, so unexpected resource destruction/replacement cannot be verified.

Required actions:

  • Add a load_test filter and a plan-load_test job to .github/workflows/terraform-plan.yml so changes under environment/load_test/** are detected.
  • Per the coding guideline for **/*.tf ("always check each environment's 'Terraform Plan' result posted as a PR comment"), the load_test plan result must appear in the PR comments.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@environment/load_test/main.tf` around lines 1 - 145, The detect-changes
workflow is missing the load_test environment so changes under
environment/load_test/** are not caught; update the detect-changes job (the job
named detect-changes) to include a path filter for environment/load_test/** and
add a corresponding plan-load_test job (modeled after existing plan-* jobs) that
runs the Terraform init/plan for the load_test workspace and posts plan output
to the PR; ensure the job name is plan-load_test and it references the same
steps/variables (workspace, backend config, ssm/kms variables) used by other
plan jobs so the new job is executed when files in environment/load_test/**
change and its plan gets commented on the PR.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config/load-test/k6/set_up_xk6.sh`:
- Around line 15-25: The Prometheus remote write URL and trend-stats export are
inconsistent between the current shell and the lines being appended to the login
shell; update set_up_xk6.sh so the echoed ~/.bashrc lines use the same
K6_PROMETHEUS_RW_SERVER_URL value as the current shell (use the existing
K6_PROMETHEUS_RW_SERVER_URL variable rather than hardcoding a different IP) and
add the missing export for K6_PROMETHEUS_RW_TREND_STATS (export
K6_PROMETHEUS_RW_TREND_STATS="p(90),p(95),p(99),avg,min,max") so child processes
like k6 receive the setting.

In `@config/load-test/k6/whole-user-flow.js`:
- Around line 333-366: searchUniversities(), getLanguageTests(), and getGPAs()
may return empty or unexpected responses and the code immediately dereferences
ids (e.g., uniList[Math...].id, langList[0].id, gpaList[0].id), causing runtime
TypeError; modify the flow in whole-user-flow.js to validate the HTTP response
and parsed body before indexing: call .json() then check that the arrays
(uniList, languageTestScoreStatusResponseList, gpaScoreStatusResponseList) exist
and have length > 0, and if not call k6.fail() (or otherwise record a clear
failure) with a descriptive message including the function name and any
status/error info; update the calls around searchUniversities, getLanguageTests,
and getGPAs to perform these checks and only proceed to extract .id when
present.

In `@environment/load_test/main.tf`:
- Around line 1-10: The data sources data.aws_vpc.default and
data.aws_subnets.default must not rely on default = true; instead select the VPC
and its subnets by the same criteria as your stage/prod instances (e.g., filter
by the environment tag, or derive vpc_id from a representative
EC2/data.aws_instance used by stage/prod) so the load-test DB ends up in the
same VPC; change data "aws_vpc" "default" to a filtered lookup (remove
default=true and add filters like tag:Environment or id =
data.aws_instance.<name>.vpc_id) and update data "aws_subnets" "default" to use
values = [data.aws_vpc.selected.id]; apply the same pattern for the other
occurrences noted.
- Around line 12-35: The Terraform changes under environment/load_test are not
included in the PR auto-validate because terraform-plan.yml's detect-changes
filter omits that path; update terraform-plan.yml to include
"environment/load_test/**" in the detect-changes paths so changes to data
"aws_instance" "prod_api" and data "aws_instance" "stage_api" trigger plan runs,
or alternatively modify the load_test Terraform to avoid ambiguous name-based
lookups by accepting instance IDs as variables and replacing the tag-based data
sources with direct aws_instance lookups by ID to prevent apply-time failures
when multiple instances share the same Name tag.

In `@environment/stage/main.tf`:
- Line 45: Setting only enable_k6_files = true does not propagate the cloud-init
user data change to the existing stage EC2; locate modules/app_stack/ec2.tf and the aws_instance.api_server
resource which currently has user_data_replace_on_change = false and
lifecycle.ignore_changes that includes user_data, and either (A) set
user_data_replace_on_change = true and remove user_data from
lifecycle.ignore_changes so the instance will be recreated/updated with the k6
files, or (B) keep instance untouched and add an explicit file/SSM sync step to
copy files into /home/ubuntu/solid-connection-load-test/k6 (or document that
instance recreation is required) depending on whether you want automatic
redeploy or an out-of-band deployment.

In `@scripts/load_test/README.md`:
- Around line 33-36: The wording reads as if the apply is run locally; make the
"terraform apply in environment/load_test" sentence state explicitly that GitHub
Actions runs it: change the README item (currently "1. Run `terraform apply` in
`environment/load_test`.") to "GitHub Actions runs `terraform apply` in
`environment/load_test`.", and if needed add a one-line rule that environment
*.tf files are never applied locally, only via GitHub Actions, to make that
policy explicit.
- Around line 52-63: Update the deployment docs and automation so stage EC2
always has the k6 assets: either modify the Start workflow to run an SSM step
that syncs the repo k6 directory into /home/ubuntu/solid-connection-load-test/k6
(copy the files listed: createPost.json, updatePost.json, whole-user-flow.js,
set_up_xk6.sh, script/set-load-test.sh), or change the Actions/SSM job to
perform a repository checkout on the target and run k6 from that checked-out
path; update README.md to document which of these two approaches is implemented
and reference the cloud-init path `/home/ubuntu/solid-connection-load-test/k6`
and the setup scripts so reviewers can locate the change.
- Line 42: The current README step (running `mysqldump` on the prod EC2 via SSM
RunCommand and restoring into the loadtest RDS) replicates the entire production
DB as-is, which carries a significant personal-data exposure risk; instead of a
full dump, generate the dump via a data masking/anonymization script or
table/column filtering (dump only the required subset of tables), add an
automated pre-restore verification that sensitive fields (user identifiers,
emails, phone numbers, etc.) have been removed, and document the dump file's
retention period and automatic deletion (e.g., an S3 lifecycle rule or an
auto-delete script on the EC2 instance) in the README procedure and the SSM
RunCommand spec so ownership is fixed and verification logs are kept.

In `@scripts/load_test/start.sh`:
- Around line 228-232: The dump file DUMP_FILE can be left on /tmp if a later
command fails; after creating DUMP_FILE in the remote shell session (right after
the mysqldump command that sets DUMP_FILE), register a shell EXIT trap such as
trap 'rm -f "$DUMP_FILE"' EXIT so the temporary gzip file is removed on any exit
(success or failure); ensure the trap is set inside the same remote shell
context that creates and consumes DUMP_FILE and that the final explicit rm -f
"$DUMP_FILE" remains (the trap will be a safety net for error paths).
- Around line 6-11: The script currently hardcodes
DATABASE_NAME="solid_connection" (and similar hardcoded username/password param
names) which can drift from Terraform; update start.sh to fetch the DB name and
related parameters from Terraform outputs instead of hardcoding: call terraform
output (or read the exported load_test_db_name output) to set DATABASE_NAME and
use the corresponding Terraform outputs for LOADTEST_DB_USERNAME_PARAMETER and
LOADTEST_DB_PASSWORD_PARAMETER (and the prod equivalents) so the variables used
in the dump/restore logic (referencing DATABASE_NAME,
LOADTEST_DB_USERNAME_PARAMETER, LOADTEST_DB_PASSWORD_PARAMETER,
PROD_DB_USERNAME_PARAMETER, PROD_DB_PASSWORD_PARAMETER) always reflect the
current tf outputs.
- Around line 98-119: The SSM polling loop using status, command_id, and
instance_id has no overall timeout and can hang indefinitely; modify the loop to
enforce a maximum wait by adding either a max_attempts counter or
start_time/timeout check, incrementing attempts (or checking elapsed seconds)
each iteration, and if exceeded print the final get-command-invocation JSON for
command_id/instance_id and exit non‑zero; ensure the existing case branches
remain but replace the infinite while true with a bounded loop or a timeout
condition so Pending|InProgress|Delayed eventually aborts and returns the last
invocation result.
- Around line 142-166: The stage-switch block guarded by
SWITCH_STAGE_TO_LOADTEST currently runs before the SKIP_DATA_COPY block, causing
the stage app to restart to dev,loadtest and hit an incomplete/empty DB during
prod dump/restore; move the entire SWITCH_STAGE_TO_LOADTEST conditional (the
commands building stage_commands_json and the call to send_ssm_command that runs
docker compose up -d solid-connection-dev) to after the SKIP_DATA_COPY/data-copy
and restore logic (or alternatively ensure the stage remains down until restore
completes by issuing a docker compose down in that block and only bringing it up
after restore completion); update references to SWITCH_STAGE_TO_LOADTEST,
send_ssm_command, and the docker compose up/down commands accordingly so stage
is only started once data copy/restore finishes.

In `@scripts/load_test/stop.sh`:
- Around line 68-89: The polling loop in send_ssm_command() (the while true that
checks status for command_id and instance_id) lacks a timeout and can hang
indefinitely; add a configurable max wait (e.g., MAX_WAIT_SECONDS or
MAX_ITERATIONS) and track elapsed time or loop counts inside the loop, break and
treat as failure when exceeded, and on timeout call aws ssm
get-command-invocation for diagnostics and exit 1 with a clear message including
the timeout, command_id and instance_id.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e653fd43-eafa-4c3d-ad1e-050891eed30e

📥 Commits

Reviewing files that changed from the base of the PR and between f85e08f and 9ef254a.

📒 Files selected for processing (18)
  • .github/workflows/load-test-start.yml
  • .github/workflows/load-test-stop.yml
  • config/load-test/k6/createPost.json
  • config/load-test/k6/script/set-load-test.sh
  • config/load-test/k6/set_up_xk6.sh
  • config/load-test/k6/updatePost.json
  • config/load-test/k6/whole-user-flow.js
  • config/secrets
  • environment/load_test/main.tf
  • environment/load_test/output.tf
  • environment/load_test/provider.tf
  • environment/load_test/variables.tf
  • environment/stage/main.tf
  • modules/app_stack/ec2.tf
  • modules/app_stack/variables.tf
  • scripts/load_test/README.md
  • scripts/load_test/start.sh
  • scripts/load_test/stop.sh

Comment thread config/load-test/k6/set_up_xk6.sh Outdated
Comment on lines +15 to +25
export K6_PROMETHEUS_RW_SERVER_URL=http://132.145.83.182:9090/api/v1/write
export K6_PROMETHEUS_RW_TREND_STATS="p(90),p(95),p(99),avg,min,max"
{
echo "export BASE_DIR=${BASE_DIR}"
echo "export GOROOT=${GOROOT}"
echo "export GOPATH=${GOPATH}"
echo "export PATH=\$PATH:\$GOROOT/bin:\$GOPATH/bin"
echo "export XK6_BIN=${GOPATH}/bin/xk6"
echo "export K6_OUT=xk6-prometheus-rw"
echo "export K6_PROMETHEUS_RW_SERVER_URL=http://146.56.46.8:9090/api/v1/write"
echo "K6_PROMETHEUS_RW_TREND_STATS=\"p(90),p(95),p(99),avg,min,max\""

⚠️ Potential issue | 🟠 Major | ⚡ Quick win


The Prometheus remote-write settings differ between the current shell and the login shell.

There are two problems:

  1. IP address mismatch: the current shell uses 132.145.83.182 (line 15), but the value appended to ~/.bashrc is hardcoded to 146.56.46.8 (line 24), so the target changes after re-login.

  2. Missing export: K6_PROMETHEUS_RW_TREND_STATS on line 25 lacks the export keyword, so it is not passed to child processes such as k6.

🔧 Suggested fix
 export K6_PROMETHEUS_RW_SERVER_URL=http://132.145.83.182:9090/api/v1/write
 export K6_PROMETHEUS_RW_TREND_STATS="p(90),p(95),p(99),avg,min,max"
 {
   echo "export BASE_DIR=${BASE_DIR}"
   echo "export GOROOT=${GOROOT}"
   echo "export GOPATH=${GOPATH}"
   echo "export PATH=\$PATH:\$GOROOT/bin:\$GOPATH/bin"
   echo "export XK6_BIN=${GOPATH}/bin/xk6"
   echo "export K6_OUT=xk6-prometheus-rw"
-  echo "export K6_PROMETHEUS_RW_SERVER_URL=http://146.56.46.8:9090/api/v1/write"
-  echo "K6_PROMETHEUS_RW_TREND_STATS=\"p(90),p(95),p(99),avg,min,max\""
+  echo "export K6_PROMETHEUS_RW_SERVER_URL=${K6_PROMETHEUS_RW_SERVER_URL}"
+  echo "export K6_PROMETHEUS_RW_TREND_STATS=\"${K6_PROMETHEUS_RW_TREND_STATS}\""
 } >> ~/.bashrc

Comment thread config/load-test/k6/whole-user-flow.js Outdated
Comment on lines +333 to +366
const uniSearchRes = searchUniversities(''); // 이번학기 열린 대학 중 랜덤하게 id 가져오기
const uniList = uniSearchRes.json();
const universityId = uniList[Math.floor(Math.random() * uniList.length)].id;

likeUniversity(universityId, auth);
isLikedUniversity(universityId, auth);
getLikedUniversities(auth);
cancelLikeUniversity(universityId, auth);
getDetailedUniversityInfo(universityId);

getMyInfo(auth);

getBoards(auth);
getPostsByBoard('FREE', auth);

const postId = createPost(token);
updatePost(postId, token);
getPostDetail(postId, auth);
likePost(postId, auth);
cancelLikePost(postId, auth);

const commentId = createComment(postId, auth);
updateComment(commentId, auth);
deleteComment(commentId, auth);

deletePost(postId, auth);

const langRes = getLanguageTests(auth);
const langList = langRes.json().languageTestScoreStatusResponseList;
const languageTestScoreId = langList[0].id;

const gpaRes = getGPAs(auth);
const gpaList = gpaRes.json().gpaScoreStatusResponseList;
const gpaScoreId = gpaList[0].id;

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The code dereferences id directly on possibly empty responses.

If searchUniversities(), getLanguageTests(), or getGPAs() return an empty array or an unexpected shape, the script dies right here with a TypeError. For an automated load test it is safer to fail() explicitly so the cause is recorded, rather than hitting a runtime exception.

Suggested fix
     const uniSearchRes = searchUniversities(''); // 이번학기 열린 대학 중 랜덤하게 id 가져오기
     const uniList = uniSearchRes.json();
+    if (!Array.isArray(uniList) || uniList.length === 0) {
+        fail('searchUniversities returned no universities');
+    }
     const universityId = uniList[Math.floor(Math.random() * uniList.length)].id;
@@
     const langRes = getLanguageTests(auth);
     const langList = langRes.json().languageTestScoreStatusResponseList;
+    if (!Array.isArray(langList) || langList.length === 0) {
+        fail('getLanguageTests returned no scores');
+    }
     const languageTestScoreId = langList[0].id;
@@
     const gpaRes = getGPAs(auth);
     const gpaList = gpaRes.json().gpaScoreStatusResponseList;
+    if (!Array.isArray(gpaList) || gpaList.length === 0) {
+        fail('getGPAs returned no scores');
+    }
     const gpaScoreId = gpaList[0].id;

Comment thread environment/load_test/main.tf Outdated
Comment on lines +1 to +10
data "aws_vpc" "default" {
default = true
}

data "aws_subnets" "default" {
filter {
name = "vpc-id"
values = [data.aws_vpc.default.id]
}
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pinning to the default VPC can cause deploy failures or lost connectivity when environments diverge

With default = true fixing the VPC/subnets, the source_security_group_id-based ingress rules and the DB access path can break if the prod/stage EC2 instances live in a different VPC. The load-test DB's VPC should at minimum be selected by the same criteria as the stage/prod instances' VPC.

Example fix
-data "aws_vpc" "default" {
-  default = true
-}
-
-data "aws_subnets" "default" {
+data "aws_subnets" "target" {
   filter {
     name   = "vpc-id"
-    values = [data.aws_vpc.default.id]
+    values = [data.aws_instance.stage_api.vpc_id]
   }
 }
...
 resource "aws_security_group" "load_test_db" {
   name        = "sc-load-test-db-sg"
   description = "Security group for load test RDS"
-  vpc_id      = data.aws_vpc.default.id
+  vpc_id      = data.aws_instance.stage_api.vpc_id
 }
...
 resource "aws_db_subnet_group" "load_test" {
   name       = "sc-load-test-db-subnet-group"
-  subnet_ids = data.aws_subnets.default.ids
+  subnet_ids = data.aws_subnets.target.ids
 }

Also applies to: 59-63, 88-90


Comment thread environment/load_test/main.tf
Comment thread environment/stage/main.tf Outdated
redis_exporter_version = var.redis_exporter_version
alloy_version = var.alloy_version

enable_k6_files = true

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Turning on this flag alone will not deploy the k6 files to the current stage EC2.

enable_k6_files = true only changes the cloud-init user data, but aws_instance.api_server in modules/app_stack/ec2.tf has user_data_replace_on_change = false and its lifecycle.ignore_changes includes user_data, so the change never reaches the existing stage instance. If the stage EC2 is not recreated, as this PR's description states, /home/ubuntu/solid-connection-load-test/k6 will likely be empty. If the current stage server is meant to be the k6 execution host, a separate SSM/file-sync step or an intentional instance recreation is required.


Comment thread scripts/load_test/start.sh Outdated
Comment on lines +6 to +11
DATABASE_NAME="solid_connection"
MIGRATION_PARAMETER_PREFIX="/solid-connection/loadtest/migration"
PROD_DB_USERNAME_PARAMETER="/solid-connection/prod/spring.datasource.username"
PROD_DB_PASSWORD_PARAMETER="/solid-connection/prod/spring.datasource.password"
LOADTEST_DB_USERNAME_PARAMETER="/solid-connection/loadtest/spring.datasource.username"
LOADTEST_DB_PASSWORD_PARAMETER="/solid-connection/loadtest/spring.datasource.password"

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Hardcoding the database name will easily drift from the Terraform configuration.

This script always dumps/restores only solid_connection, but the same PR already exports a separate load_test_db_name output. If db_name changes in tfvars, the script may drop/create the wrong schema or fail the restore, so reading the Terraform output here is the right move instead of hardcoding.

Suggested fix
-DATABASE_NAME="solid_connection"
+DATABASE_NAME=""
@@
 prod_endpoint="$(tf_output prod_rds_endpoint)"
 prod_port="$(tf_output prod_rds_port)"
 loadtest_endpoint="$(tf_output load_test_rds_endpoint)"
 loadtest_port="$(tf_output load_test_rds_port)"
+if [[ -z "$DATABASE_NAME" ]]; then
+  DATABASE_NAME="$(tf_output load_test_db_name)"
+fi

Also applies to: 137-140, 212-231

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/load_test/start.sh` around lines 6 - 11, The script currently
hardcodes DATABASE_NAME="solid_connection" (and similar hardcoded
username/password param names) which can drift from Terraform; update start.sh
to fetch the DB name and related parameters from Terraform outputs instead of
hardcoding: call terraform output (or read the exported load_test_db_name
output) to set DATABASE_NAME and use the corresponding Terraform outputs for
LOADTEST_DB_USERNAME_PARAMETER and LOADTEST_DB_PASSWORD_PARAMETER (and the prod
equivalents) so the variables used in the dump/restore logic (referencing
DATABASE_NAME, LOADTEST_DB_USERNAME_PARAMETER, LOADTEST_DB_PASSWORD_PARAMETER,
PROD_DB_USERNAME_PARAMETER, PROD_DB_PASSWORD_PARAMETER) always reflect the
current tf outputs.

Comment on lines +98 to +119
  local status
  while true; do
    sleep 5
    status="$(aws ssm get-command-invocation \
      --command-id "$command_id" \
      --instance-id "$instance_id" \
      --query "Status" \
      --output text 2>/dev/null || true)"

    case "$status" in
      Pending|InProgress|Delayed|"") continue ;;
      Success) break ;;
      *)
        aws ssm get-command-invocation \
          --command-id "$command_id" \
          --instance-id "$instance_id" \
          --output json || true
        echo "SSM command failed with status $status: $comment" >&2
        exit 1
        ;;
    esac
  done

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The SSM polling has no termination condition.

If the status keeps reporting Pending|InProgress|Delayed, this loop never exits. A single stuck RunCommand ties up the entire workflow until the GitHub job timeout, so it is safer to cap the wait with a maximum duration or retry count and print the last invocation result on timeout.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/load_test/start.sh` around lines 98 - 119, The SSM polling loop using
status, command_id, and instance_id has no overall timeout and can hang
indefinitely; modify the loop to enforce a maximum wait by adding either a
max_attempts counter or start_time/timeout check, incrementing attempts (or
checking elapsed seconds) each iteration, and if exceeded print the final
get-command-invocation JSON for command_id/instance_id and exit non‑zero; ensure
the existing case branches remain but replace the infinite while true with a
bounded loop or a timeout condition so Pending|InProgress|Delayed eventually
aborts and returns the last invocation result.
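A bounded version of the polling loop could look like the sketch below. The status source is parameterized (a hypothetical `check_cmd`) so the loop itself can be exercised without AWS; in the real script it would be the `aws ssm get-command-invocation --query Status` call.

```shell
# Sketch of a bounded SSM status poll. check_cmd is any command that prints
# the current status string; timeout_seconds and interval bound the wait.
wait_for_status() {
  local check_cmd="$1" timeout_seconds="${2:-900}" interval="${3:-5}"
  local started_at status
  started_at="$(date +%s)"
  while true; do
    status="$("$check_cmd" 2>/dev/null || true)"
    case "$status" in
      Success) return 0 ;;                # command finished cleanly
      Pending|InProgress|Delayed|"") ;;   # still running: keep waiting
      *) echo "SSM command failed with status $status" >&2; return 1 ;;
    esac
    if (( $(date +%s) - started_at >= timeout_seconds )); then
      echo "SSM command timed out after ${timeout_seconds}s" >&2
      return 2
    fi
    sleep "$interval"
  done
}
```

On timeout, the real script should additionally dump the last `get-command-invocation` JSON for diagnostics, as the review suggests.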

Comment thread scripts/load_test/start.sh Outdated
Comment on lines +142 to +166
if [[ "$SWITCH_STAGE_TO_LOADTEST" == "true" ]]; then
  stage_commands_json="$(jq -cn \
    --arg app_dir "$STAGE_APP_DIR" \
    --arg compose_file "$STAGE_COMPOSE_FILE" \
    '{
      commands: [
        "set -euo pipefail",
        "cd \($app_dir)",
        "CURRENT_IMAGE=$(docker inspect -f '\''{{.Config.Image}}'\'' solid-connection-dev 2>/dev/null || true)",
        "if [ -z \"$CURRENT_IMAGE\" ]; then echo \"solid-connection-dev container is not running; cannot infer image tag\" >&2; exit 1; fi",
        "OWNER_LOWERCASE=$(echo \"$CURRENT_IMAGE\" | sed -E '\''s#^ghcr.io/([^/]+)/.*#\\1#'\'')",
        "IMAGE_TAG=$(echo \"$CURRENT_IMAGE\" | sed -E '\''s#.*:([^:]+)$#\\1#'\'')",
        "cat > docker-compose.loadtest.override.yml <<'\''YAML'\''\nservices:\n solid-connection-dev:\n environment:\n - SPRING_PROFILES_ACTIVE=dev,loadtest\n - AWS_REGION=ap-northeast-2\n - SPRING_DATA_REDIS_HOST=127.0.0.1\n - SPRING_DATA_REDIS_PORT=6379\nYAML",
        "docker compose -f \($compose_file) -f docker-compose.loadtest.override.yml down || true",
        "OWNER_LOWERCASE=\"$OWNER_LOWERCASE\" IMAGE_TAG=\"$IMAGE_TAG\" docker compose -f \($compose_file) -f docker-compose.loadtest.override.yml up -d solid-connection-dev"
      ]
    }')"

  send_ssm_command "$stage_instance_id" "Switch stage app to load test datasource" "$stage_commands_json"
fi

if [[ "$SKIP_DATA_COPY" != "true" ]]; then
  trap delete_temp_parameters EXIT

  prod_db_username="$(aws ssm get-parameter \

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The stage switch precedes the data copy, so stage points at a half-migrated database.

With the current order, the stage app is restarted on dev,loadtest first, and only then does the prod dump/restore begin. In the meantime stage sees an empty or mid-restore database, which can cause health-check failures or unstable responses. Either finish the data copy before switching, or at minimum keep stage down until the restore completes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/load_test/start.sh` around lines 142 - 166, The stage-switch block
guarded by SWITCH_STAGE_TO_LOADTEST currently runs before the SKIP_DATA_COPY
block, causing the stage app to restart to dev,loadtest and hit an
incomplete/empty DB during prod dump/restore; move the entire
SWITCH_STAGE_TO_LOADTEST conditional (the commands building stage_commands_json
and the call to send_ssm_command that runs docker compose up -d
solid-connection-dev) to after the SKIP_DATA_COPY/data-copy and restore logic
(or alternatively ensure the stage remains down until restore completes by
issuing a docker compose down in that block and only bringing it up after
restore completion); update references to SWITCH_STAGE_TO_LOADTEST,
send_ssm_command, and the docker compose up/down commands accordingly so stage
is only started once data copy/restore finishes.
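The reordering could be sketched as below; `copy_prod_data` and `switch_stage` are hypothetical stand-ins for the existing SSM command blocks, not functions from this script.

```shell
# Sketch: run the prod -> loadtest data copy to completion before restarting
# stage on the loadtest profile, so stage never sees a half-restored database.
run_load_test_start() {
  if [[ "${SKIP_DATA_COPY:-false}" != "true" ]]; then
    copy_prod_data      # dump prod RDS and restore into the loadtest RDS
  fi
  if [[ "${SWITCH_STAGE_TO_LOADTEST:-true}" == "true" ]]; then
    switch_stage        # SSM RunCommand: restart stage with dev,loadtest
  fi
}
```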

Comment on lines +228 to +232
"DUMP_FILE=/tmp/solid-connection-loadtest-$(date +%Y%m%d%H%M%S).sql.gz",
"MYSQL_PWD=\"$PROD_PASSWORD\" mysqldump --single-transaction --set-gtid-purged=OFF --column-statistics=0 -h \($prod_endpoint) -P \($prod_port) -u \"$PROD_USER\" \($database) | gzip > \"$DUMP_FILE\"",
"MYSQL_PWD=\"$LOAD_PASSWORD\" mysql -h \($loadtest_endpoint) -P \($loadtest_port) -u \"$LOAD_USER\" -e \"DROP DATABASE IF EXISTS \\\`\($database)\\\`; CREATE DATABASE \\\`\($database)\\\` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\"",
"gunzip -c \"$DUMP_FILE\" | MYSQL_PWD=\"$LOAD_PASSWORD\" mysql -h \($loadtest_endpoint) -P \($loadtest_port) -u \"$LOAD_USER\" \($database)",
"rm -f \"$DUMP_FILE\""

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The prod dump file is left in /tmp on failure.

The final rm -f here only runs on the success path. If the DROP/CREATE DATABASE or the import step fails, the gzipped prod snapshot remains on the prod EC2 disk. It is safer to register trap 'rm -f "$DUMP_FILE"' EXIT inside the remote shell right after DUMP_FILE is created.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/load_test/start.sh` around lines 228 - 232, The dump file DUMP_FILE
can be left on /tmp if a later command fails; after creating DUMP_FILE in the
remote shell session (right after the mysqldump command that sets DUMP_FILE),
register a shell EXIT trap such as trap 'rm -f "$DUMP_FILE"' EXIT so the
temporary gzip file is removed on any exit (success or failure); ensure the trap
is set inside the same remote shell context that creates and consumes DUMP_FILE
and that the final explicit rm -f "$DUMP_FILE" remains (the trap will be a
safety net for error paths).
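The cleanup trap can be exercised like this; `demo_dump_cleanup` is a hypothetical stand-in for the remote shell session, with `false` simulating a failed restore step.

```shell
# Sketch: register the EXIT trap immediately after choosing DUMP_FILE so the
# prod snapshot is deleted even when a later step fails. The subshell body
# (parens) stands in for the remote shell that SSM RunCommand would execute.
demo_dump_cleanup() (
  set -euo pipefail
  DUMP_FILE="/tmp/loadtest-demo-$$.sql.gz"
  trap 'rm -f "$DUMP_FILE"' EXIT
  : > "$DUMP_FILE"    # stand-in for: mysqldump ... | gzip > "$DUMP_FILE"
  echo "$DUMP_FILE"   # report the path so callers can verify cleanup
  false               # simulate a failing DROP/CREATE or import step
)
```

Because the trap runs on any exit of the subshell, the file is removed on both the success and failure paths.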

Comment thread scripts/load_test/stop.sh
Comment on lines +68 to +89
  local status
  while true; do
    sleep 5
    status="$(aws ssm get-command-invocation \
      --command-id "$command_id" \
      --instance-id "$instance_id" \
      --query "Status" \
      --output text 2>/dev/null || true)"

    case "$status" in
      Pending|InProgress|Delayed|"") continue ;;
      Success) break ;;
      *)
        aws ssm get-command-invocation \
          --command-id "$command_id" \
          --instance-id "$instance_id" \
          --output json || true
        echo "SSM command failed with status $status: $comment" >&2
        exit 1
        ;;
    esac
  done

⚠️ Potential issue | 🟠 Major | ⚡ Quick win


The SSM polling loop has no timeout, so the workflow can hang indefinitely.

The while true loop in the send_ssm_command() function polls the SSM command status without any timeout. If an unresponsive instance or a command error leaves the status stuck at Pending, InProgress, or Delayed, the script waits forever and the workflow hangs.

Set a maximum wait time and fail explicitly once it is exceeded.

🔧 Suggested fix
   local status
+  local started_at
+  local timeout_seconds=900
+  started_at="$(date +%s)"
   while true; do
     sleep 5
+    if (( $(date +%s) - started_at >= timeout_seconds )); then
+      echo "SSM command timed out after ${timeout_seconds}s: $comment" >&2
+      exit 1
+    fi
+
     status="$(aws ssm get-command-invocation \
       --command-id "$command_id" \
       --instance-id "$instance_id" \
       --query "Status" \
       --output text 2>/dev/null || true)"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/load_test/stop.sh` around lines 68 - 89, The polling loop in
send_ssm_command() (the while true that checks status for command_id and
instance_id) lacks a timeout and can hang indefinitely; add a configurable max
wait (e.g., MAX_WAIT_SECONDS or MAX_ITERATIONS) and track elapsed time or loop
counts inside the loop, break and treat as failure when exceeded, and on timeout
call aws ssm get-command-invocation for diagnostics and exit 1 with a clear
message including the timeout, command_id and instance_id.

- Details: added a load_test Terraform plan workflow.

- Details: changed the loadtest RDS network to be created in the stage EC2's VPC.

- Details: applied the SSM command timeout, dump cleanup, k6 file sync, and the stage switch after data copy.

- Details: fixed the k6 configuration and response validation errors.
@github-actions

github-actions Bot commented May 6, 2026

Terraform Plan: load_test

Plan: 8 to add, 0 to change, 0 to destroy.

Full plan output is kept in the workflow artifact for security. Check workflow run artifact.

lsy1307 added 5 commits May 6, 2026 17:04
- Details: excluded the temporarily created load_test environment from PR Terraform plan targets.

- Details: restricted load_test apply and destroy to the manual GitHub Actions workflows.
- Details: added a dedicated k6 EC2 and security group to the load_test Terraform.

- Details: removed the app_stack cloud-init configuration so k6 files are no longer placed on the stage EC2.

- Details: moved k6 defaults out of secrets into Terraform defaults and outputs.
- Details: added a Load Test Run workflow that generates load from the dedicated k6 EC2.

- Details: split the loadtest workflows onto a dedicated AWS_LOAD_TEST_ROLE_ARN variable.

- Details: removed the stage k6 sync from the start script and made it print the created load-generator EC2 info.
- Details: exported the Prometheus remote-write settings consistently via environment variables.

- Details: made k6 VUs, iterations, duration, and target URL injectable at run time.

- Details: added validation that fails clearly when university, language test score, or GPA responses are empty.
- Details: documented the load test execution flow around the Start, Run, and Stop workflows.

- Details: noted that no new secrets are needed and non-sensitive values are managed as workflow inputs and defaults.

- Details: described the architecture where load is generated from a dedicated k6 EC2 rather than the stage EC2.

Successfully merging this pull request may close these issues.

feat: create a Terraform environment for load testing

2 participants