Skip to content

Conversation

@JSv4
Copy link
Collaborator

@JSv4 JSv4 commented Jan 11, 2026

This commit introduces a comprehensive, configuration-driven forking system that addresses gaps in the original implementation:

New Features:

  • Fork all relevant models: documents, annotations, relationships, extracts, fieldsets, columns, datacells, conversations, notes, and folder structure
  • Enumerable ForkableModelType enum for easy addition/removal of forkable models
  • ForkModelConfig dataclass for configuring fork behavior per model type
  • Proper permission handling per consolidated_permissioning_guide.md
  • Optional flags for fork_extracts, fork_conversations, fork_notes

Architecture:

  • opencontractserver/utils/fork_utils.py: Config and context classes
  • opencontractserver/utils/fork_handlers.py: Individual handlers per model
  • opencontractserver/tasks/fork_tasks_v2.py: Orchestrating Celery task
  • Updated StartCorpusFork mutation to use fork_corpus_v2

What is NOT forked (by design):

  • Analyses and analyzer-created annotations
  • Discussion threads (only chat conversations are forked)
  • Moderation actions, user reputation, moderator assignments
  • CorpusActions (automation configurations)

Permission Model Alignment:

  • Documents: Direct permissions (set via set_permissions_for_obj_to_user)
  • Annotations/Relationships: Inherited from doc+corpus (no explicit perms)
  • Extracts/Fieldsets: Direct permissions required
  • Conversations: Direct permissions required

Includes comprehensive test coverage in test_fork_v2.py.

This commit introduces a comprehensive, configuration-driven forking system
that addresses gaps in the original implementation:

New Features:
- Fork all relevant models: documents, annotations, relationships, extracts,
  fieldsets, columns, datacells, conversations, notes, and folder structure
- Enumerable ForkableModelType enum for easy addition/removal of forkable models
- ForkModelConfig dataclass for configuring fork behavior per model type
- Proper permission handling per consolidated_permissioning_guide.md
- Optional flags for fork_extracts, fork_conversations, fork_notes

Architecture:
- opencontractserver/utils/fork_utils.py: Config and context classes
- opencontractserver/utils/fork_handlers.py: Individual handlers per model
- opencontractserver/tasks/fork_tasks_v2.py: Orchestrating Celery task
- Updated StartCorpusFork mutation to use fork_corpus_v2

What is NOT forked (by design):
- Analyses and analyzer-created annotations
- Discussion threads (only chat conversations are forked)
- Moderation actions, user reputation, moderator assignments
- CorpusActions (automation configurations)

Permission Model Alignment:
- Documents: Direct permissions (set via set_permissions_for_obj_to_user)
- Annotations/Relationships: Inherited from doc+corpus (no explicit perms)
- Extracts/Fieldsets: Direct permissions required
- Conversations: Direct permissions required

Includes comprehensive test coverage in test_fork_v2.py.
@claude
Copy link

claude bot commented Jan 11, 2026

Pull Request Review: Backend Forking Functionality v2

This PR introduces a comprehensive, configuration-driven corpus forking system. Overall: APPROVED with Minor Suggestions.

Strengths

  • Excellent modular design with clear separation between fork_utils.py, fork_handlers.py, and fork_tasks_v2.py
  • Configuration-driven approach via ForkableModelType enum
  • Correctly follows consolidated_permissioning_guide.md for permission handling
  • IDOR protection with consistent error messages
  • Comprehensive test coverage (995 lines)
  • Detailed logging and error handling

Critical Issues (Must Fix Before Merge)

1. Permission Helper Bug - BLOCKS MERGE

Location: opencontractserver/utils/fork_handlers.py:482 and 592-594

The code calls set_permissions_for_obj_to_user with wrong argument type:

set_permissions_for_obj_to_user(ctx.user_id, new_fieldset, [PermissionTypes.CRUD])

First argument should be User object, not user_id (int). This will fail at runtime.

Fix:

user = User.objects.get(pk=ctx.user_id)
set_permissions_for_obj_to_user(user, new_fieldset, [PermissionTypes.CRUD])

2. Large Transaction Performance Risk

Location: opencontractserver/tasks/fork_tasks_v2.py:121

All forking happens in one transaction. For large corpuses (1000+ documents), this could cause timeouts, hold locks, and consume significant memory.

Suggestion: Consider per-phase transactions or savepoints for partial rollback.

Medium Priority Issues

3. Inefficient Batch Processing

Location: fork_handlers.py:299-361

Code prepares annotations for bulk_create but saves individually, defeating the purpose. Comment says "Can't use bulk_create because we need IDs" but Django 4.x returns IDs from bulk_create.

4. File Copying Unclear

Location: fork_handlers.py:211-275

ForkModelConfig lists has_file_fields for documents, but fork_documents() relies on Corpus.add_document(). Need to verify/document whether file copying is handled.

5. Missing Database Indexes

Ensure indexes exist on:

  • annotations.corpus_id + analysis_id
  • relationships.corpus_id + analysis_id
  • conversations.corpus_id + conversation_type

Low Priority Suggestions

  • Unused ForkModelConfig fields (batch_size, use_bulk_create, filter_kwargs not used by handlers)
  • Consider splitting 995-line test file into multiple files
  • Define "[FORK]" prefix as constant instead of magic string
  • Add user object to ForkContext to avoid repeated lookups

Testing Recommendations

  1. Run: docker compose -f test.yml run django pytest opencontractserver/tests/test_fork_v2.py -n 4 --dist loadscope
  2. Test with large corpus (1000+ documents) to verify no timeout
  3. Test permission edge cases (read-only access, partial document access, no label set, empty corpus)
  4. Verify file copying for documents

Documentation

Consider updating:

  • CHANGELOG.md with new forking system
  • docs/architecture/ with corpus_forking.md
  • Add flag examples to mutation docstring

Conclusion

Excellent architecture and implementation! Fix the permission helper bug (#1) and address transaction concerns (#2), then this is ready to merge. The modular design will make future enhancements easy.

Recommendation: Fix critical issues, then merge.

@codecov
Copy link

codecov bot commented Jan 11, 2026

Codecov Report

❌ Patch coverage is 79.84410% with 181 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
opencontractserver/utils/fork_handlers.py 77.42% 86 Missing ⚠️
opencontractserver/tasks/fork_tasks_v2.py 17.24% 72 Missing ⚠️
opencontractserver/utils/fork_utils.py 75.00% 23 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants