Add python script to launch vtk benchmark#1
Conversation
pysco68
left a comment
There was a problem hiding this comment.
Good start, but I don't this we're tracking all relevant data in the most useful way yet;
We need to time:
- full rebuild (clean)
- Initial full build
- Configure
For "baselined cmake+ninja and for cmake-re" AFAICT
| tc_name = Path(toolchain).stem | ||
| output_file = f"cmake-run_{tc_name}.txt" |
There was a problem hiding this comment.
You shouldn't be dealing with manually collecting that data and writing it to files especially since you're not using structured data anyway;
import logging
from datetime import datetime
log_filename = f"kitwarebench_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}.log"
# Configure the logging framework
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler(log_filename), # Writes to the file
logging.StreamHandler() # Also prints to your console
]
)
# Use it!
logging.info("Application started successfully.")
logging.warning("This is a warning message.")
logging.error("An error occurred.")There was a problem hiding this comment.
That being said, you might want to go structured data and serialize to CSV or JSON both of which can be done using python built-ins
There was a problem hiding this comment.
It wasn't a completely finished script yet. But I always wanted to use JSON to store that information in the end.
There was a problem hiding this comment.
so should be done with json
| def start_docker(image, source_dir, container_name=None): | ||
| """Start a detached docker container with source_dir mounted. Returns the container name.""" | ||
| if container_name is None: | ||
| container_name = str(uuid.uuid4()) | ||
| uid = os.getuid() | ||
| gid = os.getgid() | ||
| username = getpass.getuser() | ||
| home = os.environ["HOME"] | ||
|
|
||
| print(f"Starting container {container_name}...") | ||
| subprocess.run([ | ||
| "docker", "run", | ||
| "--platform", "linux/amd64", | ||
| "--rm", "--init", | ||
| "--name", container_name, | ||
| f"-u{uid}:{gid}", "--group-add", "tipi", | ||
| "--ulimit", "nofile=65535:65535", # | ||
| "-e", "TIPI_DISABLE_AR_RANLIB_DRIVER=ON", | ||
| "-e", "TIPI_CACHE_CONSUME_ONLY=ON", | ||
| "-e", "TIPI_CACHE_FORCE_ENABLE=OFF", | ||
| "-e", "HOME", | ||
| "-e", "RBE_service=kernite.cluster.engflow.com:443", | ||
| "-e", f"RBE_tls_client_auth_key={home}/engflow-mTLS/engflow.key", | ||
| "-e", f"RBE_tls_client_auth_cert={home}/engflow-mTLS/engflow.crt", | ||
| "-v", f"{home}:{home}:rw", | ||
| "-v", f"{source_dir}:{source_dir}:rw", | ||
| "-w", str(source_dir), | ||
| "-d", | ||
| image, | ||
| "sleep", "infinity", | ||
| ], check=True) | ||
|
|
||
| # Create the user inside the container | ||
| subprocess.run([ | ||
| "docker", "exec", "-u", "0", container_name, | ||
| "useradd", "-d", home, "-u", str(uid), username, | ||
| ], check=False) | ||
|
|
||
| print(f"Container running: {container_name}") | ||
| return container_name | ||
|
|
||
|
|
||
| def stop_docker(container_name): | ||
| """Stop a running docker container.""" | ||
| print(f"Stopping container {container_name[:8]}...") | ||
| subprocess.run(["docker", "stop", "-t0", container_name], check=True) | ||
| print("Container stopped.") | ||
|
|
||
|
|
||
| def docker_exec(container_name, cmd): | ||
| """Run a command inside a running docker container and return elapsed time.""" | ||
| print(f" [{container_name[:8]}] Running: {cmd}") | ||
| start = time.perf_counter() | ||
| subprocess.run(["docker", "exec", container_name, "bash", "-c", cmd], check=True) | ||
| elapsed = time.perf_counter() - start | ||
| print(f" Done in {elapsed:.2f}s") | ||
| return elapsed |
There was a problem hiding this comment.
Nitpick: This should be in a Python RAII wrapper that allows managing the lifecycle of the container IMHO (so a class with __exit__(self, exc_type, exc_val, exc_tb) and __enter__(self) defined which then allows to do with ... as x: ... blocks)
| """Run a command inside a running docker container and return elapsed time.""" | ||
| print(f" [{container_name[:8]}] Running: {cmd}") | ||
| start = time.perf_counter() | ||
| subprocess.run(["docker", "exec", container_name, "bash", "-c", cmd], check=True) |
There was a problem hiding this comment.
You should capture the output and log it somewhere because we will certainly want to know about the error messages / details / RBE invocation ID for profiles ...
|
|
||
| # Second run: with warm RBE cache | ||
| print(" [with-cache] cmake-re configure + build...") | ||
| docker_exec(container, f"cmake-re -GNinja -S . -B ./build -DCMAKE_TOOLCHAIN_FILE={toolchain} --host --distributed") |
There was a problem hiding this comment.
track the configure time too as this was part of the tracked data in Bill's initial benchmark
| benchmark_vtk_project_cmake(source_dir, image, args.iterations, toolchains) | ||
| benchmark_vtk_project_cmake_re(source_dir, image, args.iterations, toolchains) |
There was a problem hiding this comment.
nitpick; feels like the loop doing the toolchain and iterations could be defined here to not repeat yourself so often.
|
Just a general comment about the Python code: you should now use type annotations on your functions whenever possible. It's been supported by Python 3 for a while now. On the benchmark side, it might be interesting to check the cache hit rate on rebuild by extracting it from the reproxy log files. I don't think we have much cache invalidation in this project, but if we were to reuse some of this, we'd definitely need it. |
I'm currently working on a way to retrieve all the logs related to the Docker container at the end of the process, which will allow us to review them. |
ce105fc to
781b68b
Compare
…ainer on every run
… the cluster less opportunity to scale back
…/Kitware-benchmark into feature/add-benchmark
be2552f to
4249615
Compare
No description provided.