@jimmychiuuuu
Collaborator

Summary

This PR implements support for Intel TDX and AMD SEV-SNP confidential computing devices, refactoring the plugin into a robust, multi-resource architecture. It also corrects the resource allocation logic for vTPM to ensure proper sharing.

Key Changes

  1. Multi-Device Architecture:
  • Refactored main.go to support multiple plugin instances running concurrently.
  • Added allDeviceSpecs to manage configuration for intel.com/tdx, amd.com/sev-snp, and google.com/cc.
  2. Resource Limits Correction:
  • Fixed vTPM Limit: Corrected google.com/cc device limit from 1 to 256 to correctly model it as a shared resource.
  • Set TDX/SEV-SNP limits to 1 (exclusive resource).
  3. Reliability & Hygiene:
  • Implemented robust socket cleanup using defer os.Remove(socketPath) to prevent EADDRINUSE errors during restarts.
  • Refactored hardcoded paths into constants for better maintainability.
  • Standardized allocation logging to level.Debug to reduce noise.
  4. Testing:
  • Added comprehensive unit tests covering multi-path discovery, ID generation, and multi-instance allocation.
  • Initialized Prometheus metrics in test helpers to prevent runtime panics.
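
For readers skimming the description, the per-resource configuration and limits above could look roughly like the sketch below. This is an illustration only, not the code in the PR: the DeviceSpec struct, its field names, the deviceIDs helper, and the ID format are hypothetical, and the /dev/tpmrm0 path for google.com/cc is an assumption. The resource names, the limits (1, 1, 256), and the /dev/tdx_guest and /dev/sev-guest paths come from this description.

```go
package main

import "fmt"

// DeviceSpec is a hypothetical shape for one entry of allDeviceSpecs;
// the real struct in the PR may differ.
type DeviceSpec struct {
	Resource    string   // extended resource name advertised to the kubelet
	DevicePaths []string // candidate host device nodes for this resource
	Limit       int      // number of device IDs advertised per node
}

// allDeviceSpecs mirrors the three resources named in the description.
var allDeviceSpecs = []DeviceSpec{
	{Resource: "intel.com/tdx", DevicePaths: []string{"/dev/tdx_guest"}, Limit: 1},
	{Resource: "amd.com/sev-snp", DevicePaths: []string{"/dev/sev-guest"}, Limit: 1},
	// Assumption: the vTPM device path is not stated in the description.
	{Resource: "google.com/cc", DevicePaths: []string{"/dev/tpmrm0"}, Limit: 256},
}

// deviceIDs returns Limit distinct IDs for a spec. A limit of 1 models an
// exclusive device; a limit of 256 lets up to 256 pods share one device.
// The "<resource>-<index>" format is illustrative.
func deviceIDs(spec DeviceSpec) []string {
	ids := make([]string, 0, spec.Limit)
	for i := 0; i < spec.Limit; i++ {
		ids = append(ids, fmt.Sprintf("%s-%d", spec.Resource, i))
	}
	return ids
}

func main() {
	for _, spec := range allDeviceSpecs {
		fmt.Printf("%s advertises %d device ID(s)\n", spec.Resource, len(deviceIDs(spec)))
	}
}
```

In a layout like this, each spec would back one plugin instance, so adding a resource would mean adding a table entry rather than a new code path.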

Verification
Unit Tests: All passed (go test -v ./deviceplugin/...).

E2E Validation (on GKE Confidential Nodes):

Discovery: Validated kubectl get nodes reports intel.com/tdx: 1, amd.com/sev-snp: 1, and google.com/cc: 256.

Allocation: Deployed test pods (pod-tdx.yaml, pod-snp.yaml) and verified successful injection of /dev/tdx_guest and /dev/sev-guest.

Resilience: Manually deleted the device plugin pod; confirmed that running workloads remained healthy (0 restarts).

Collaborator

@kongoshuu kongoshuu left a comment

Thanks Jimmy! This is a great PR! I took a first pass, will need to take another deep look, but just sending out for now to make sure I understand the code.

kongoshuu previously approved these changes Jan 6, 2026
@jimmychiuuuu
Collaborator Author

Hi @kongoshuu, thank you for the approval!

I've added a final commit to refine the CI workflow and address some existing linting issues (e.g., errcheck, gofmt). I also expanded the test scope to ./... to ensure full coverage.

Once the maintainer approves the workflow run, the checks should pass successfully. Thank you so much!

@jimmychiuuuu jimmychiuuuu merged commit 6ae5122 into google:main Jan 15, 2026
4 checks passed

2 participants