feat(comments): Add runner for comments migration separately#380
feat(comments): Add runner for comments migration separately#380sakshamarora1 wants to merge 5 commits intoCERNDocumentServer:masterfrom
Conversation
| self.all_record_versions = { | ||
| str(hit["versions"]["index"]): hit for hit in search_result | ||
| } | ||
| oldest_version = min( |
There was a problem hiding this comment.
wouldn't it be faster via record._record.versions[-1]? I mean instead of scan_versions etc.
There was a problem hiding this comment.
That returns an instance of VersionsManager and it doesn't have other versions stored in it. We will have to do scan_versions to find all the versions and select the minimum un-deleted version available
| elif comment_status == "dm": | ||
| comment_payload["payload"].update( | ||
| { | ||
| "content": "comment was deleted by the moderator.", |
There was a problem hiding this comment.
in RDM we do not have the "moderator" - it would be good to align it with what we display when we delete a comment in RDM (I don't remember the exact text). ping @zzacharo for more opinions
There was a problem hiding this comment.
We not display this content for the deleted comments, we do this in the frontend
| {}, request=request.model, request_id=str(request.id), type=event_type | ||
| ) | ||
|
|
||
| if data.get("file_relation"): |
There was a problem hiding this comment.
can you add a small comment on why we are doing this?
| ) | ||
| return self.all_record_versions[str(oldest_version)] | ||
|
|
||
| def create_event(self, request, data, community, record, parent_comment_id=None): |
There was a problem hiding this comment.
is there any way we can optimise this function to be more readable? there are a lot of conditional statements, some with repeated conditions, also it would be good if we avoid nesting
There was a problem hiding this comment.
I did some more optimisations
| {"user": str(user.id)}, raise_=True | ||
| ) | ||
| else: | ||
| print("User not found for email: ", data.get("created_by")) |
There was a problem hiding this comment.
the print is redundant if you raise.
what will happen if you raise? will the whole script halt? and need to be re-run?
There was a problem hiding this comment.
No, it gets caught and logged in _load() and now that I have put it under the UnitOfWork context as you suggested, it will rollback when this is raised.
There was a problem hiding this comment.
will it rollback the whole record or the whole migration?
If this happens, how will you resume so that missing (failed) comments are covered in the second/third and subsequent attempts?
| event.model.version_id = 0 | ||
|
|
||
| event.commit() | ||
| db.session.commit() |
There was a problem hiding this comment.
would it be better if we do the uow instead? otherwise you will need to re-index all requests
plus, from records migration experience I can tell you uow is faster
| created_at = datetime.fromisoformat(record["created"]) | ||
| request.model.created = created_at | ||
|
|
||
| request.commit() |
There was a problem hiding this comment.
this part would also benefic from uow
| environment, collection | ||
| ) | ||
| """ | ||
| collection_name/ |
There was a problem hiding this comment.
nice, this docstring very helpful, thank you!
|
|
||
|
|
||
| # Function to flatten arbitrarily nested comment replies into a 1-level replies list | ||
| def flatten_replies(comments_list): |
There was a problem hiding this comment.
let's do the rubber duck excersise on this one :)
| identity=system_identity, | ||
| request_id=request["id"], | ||
| ) | ||
| assert comments.total == 2 # 1 comment and 1 reply |
There was a problem hiding this comment.
I would add maybe a small check for the content of the comment and reply, just to make sure
kpsherva
left a comment
There was a problem hiding this comment.
can we store migrated comments ids on the request level to have a retry strategy if a migration run fails? In order to understand which comments to skip on the second/third etc run
f420de2 to
7f6c6ed
Compare
closes: #286
Steps
Output file:
comments_metadata.jsonThe above script also runs this script partly to generate the
missing_users.jsonfilepoeple.csv,missing_users.jsonin/eos/media/cds/cds-rdm/<env>/migration/users/directory andcomments_metadata.jsonfile in/eos/media/cds/cds-rdm/<env>/migration/<collection>/comments/IMPORTANT: We are not running the
./scripts/copy_comments_attached_files.pyyet (or we can, for thesis and IT there are none anyways) [See child issue in the main issue attached]/eos/media/cds/cds-rdm/<env>/migration/users/):(Remove --dry-run)
(Remove --dry-run)