Skip to content

Conversation

@nucleogenesis
Copy link
Member

Summary

Adds a create_channel_versions command that uses the published_data dict on Channels to create new ChannelVersion instances.

I worked out the initial approach and some simple tests myself and asked Claude to help extrapolate the tests I made to include checks for things like the Audited license models and such.

References

Closes #5593

Reviewer guidance

Have I missed anything?

@nucleogenesis nucleogenesis force-pushed the esocc--channel-version-command branch from 4cf92bc to d895d69 Compare January 22, 2026 23:10
@nucleogenesis nucleogenesis force-pushed the esocc--channel-version-command branch from d895d69 to a70ad56 Compare January 23, 2026 00:45
Comment on lines 330 to 334
def test_channel_with_existing_special_permissions_in_published_data(self):
"""
If published_data already contains special_permissions_included,
the command should not recompute it.
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels right even though I don't think this would run into the problem unless we fail and rerun it - I'm not sure what the plan for running the command would look like so guidance there would be greatly appreciated!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is right just as it is now! This could be replicated in unstable since we already have some AuditedSpecialPermissionsLicense created there, but the behavior is correct!

Copy link
Member

@AlexVelezLl AlexVelezLl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @nucleogenesis! Just noted a couple of things that I didn't note while writing the issue 😅.

Comment on lines 330 to 334
def test_channel_with_existing_special_permissions_in_published_data(self):
"""
If published_data already contains special_permissions_included,
the command should not recompute it.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is right just as it is now! This could be replicated in unstable since we already have some AuditedSpecialPermissionsLicense created there, but the behavior is correct!

# to the most recent ChannelVersion instance
for channel in channels.iterator():
logging.info(f"Processing channel {channel.id}")
last_created_ch_ver = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it seems that this method is working fine, I am not sure if we can rely on the channel.published_data.values()returning always the most recent version at the end. I think a safer approach could be to check if pub_data.get("version") === channel.version.

Comment on lines 148 to 157
last_created_ch_ver = ChannelVersion.objects.create(
channel=channel,
version=pub_data.get("version"),
included_categories=pub_data.get("included_categories", []),
included_licenses=pub_data.get("included_licenses"),
included_languages=pub_data.get("included_languages"),
non_distributable_licenses_included=pub_data.get(
"non_distributable_licenses_included"
),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, apologies, I didn't specify this in the issue 😅, the ChannelVersion model has more fields besides the ones we need to compute here. Here, you can see all values we set to the channelVersion object on the publish flow, so we would need to set all of them here too.

Additionally, just to play safe, I think we can use the create_or_update method so that in case the command is re-run or in case we have some channel versions already set up (in unstable, for example), it doesn't fail because channel and version are unique together.

logging.info(
f"Validating published data for channel {channel.id} version {pub_data['version']}"
)
special_permissions = validate_published_data(pub_data, channel)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that validate_published_data is updating the published_data object by computing properties using the current published_nodes, which means that these computed properties only belong to the most recent version. Therefore, annotating these properties in previous versions would be an error. Since we don't need previous versions to have these computed properties, we don't need to care about computing them, and we can just call this validate_published_data if pub_data corresponds to the current channel version.

@AlexVelezLl AlexVelezLl self-assigned this Jan 26, 2026
nucleogenesis and others added 4 commits January 28, 2026 12:35
- Add TestValidatePublishedData class with direct unit tests for the
  validate function covering: return type, included_licenses,
  included_languages, included_categories, kind_counts,
  non_distributable_licenses, and special_permissions
- Remove test_command_skips_channels_with_existing_version_info
  (didn't actually test idempotency)
- Remove test_channel_with_existing_special_permissions_in_published_data
  (wasn't testing meaningful behavior)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Member

@AlexVelezLl AlexVelezLl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nucleogenesis! Running the command found a couple of minor mismatches, nothing big 😄.

# Validate published_data
for pub_data in channel.published_data.values():
logging.info(
f"Validating published data for channel {channel.id} version {pub_data['version']}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, it seems that we are not storing the version field inside each pub_data here, so this is returning an error because of that. It seems we need to get the version from the key of the object.

Comment on lines 155 to 157
valid_data, special_permissions = validate_published_data(
pub_data, channel
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that we should not validate/nor do any updates to the pub_data of versions that are not the last one, because we are computing this data with the current published nodes, not with the published nodes of any previous versions.

Comment on lines 171 to 172
"size": int(channel.published_size),
"resource_count": channel.total_resource_count,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These size and resource_count should also be obtained from the valid_data object, because we should rely on the data stored there; channel.published_size only matches with the most recent version.

Besides that, comparing with what we put into our published_data, it seems we are missing the date_published and version_notes.

Comment on lines 86 to 107
if not data.get("included_categories"):
included_categories_dicts = published_nodes.exclude(
categories=None
).values_list("categories", flat=True)
data["included_categories"] = sorted(
set(
chain.from_iterable(
(
node_categories_dict.keys()
for node_categories_dict in included_categories_dicts
)
)
)
)
if not data.get("included_languages"):
node_languages = published_nodes.exclude(language=None).values_list(
"language", flat=True
)
file_languages = published_nodes.exclude(files__language=None).values_list(
"files__language", flat=True
)
data["included_languages"] = list(set(chain(node_languages, file_languages)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, for included_languages and included_categories that are already present in data we should do an additional validation: we should ensure that no present item is null. Previously with languages (not sure if with categories too), because of an error in the query, we were including null values in the included_languages list, but this was causing problems, and now the ChannelVersion doesn't support it.

Comment on lines 57 to 59
return AuditedSpecialPermissionsLicense.objects.bulk_create(
new_licenses, ignore_conflicts=True
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that bulk_create returns the same list of new_licenses even if they were not actually created, and when we want to set this to the ChannelVersion node, it fails because some objects don't exist in the db. It seems it's safer to return just this after the bulk_create:

return AuditedSpecialPermissionsLicense.objects.filter(
    description__in=special_perms_descriptions
)

Comment on lines 128 to 129
if not data.get("kind_counts"):
data["kind_counts"] = list(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that the key here should be called kind_count (in singular)

Comment on lines 151 to 186
for pub_data in channel.published_data.values():
logging.info(
f"Validating published data for channel {channel.id} version {pub_data['version']}"
)
valid_data, special_permissions = validate_published_data(
pub_data, channel
)

# Create or update channel version
channel_version, _ = ChannelVersion.objects.update_or_create(
channel=channel,
version=valid_data.get("version"),
defaults={
"included_categories": valid_data.get("included_categories"),
"included_licenses": valid_data.get("included_licenses"),
"included_languages": valid_data.get("included_languages"),
"non_distributable_licenses_included": valid_data.get(
"non_distributable_licenses_included"
),
"kind_count": valid_data.get("kind_count"),
"size": int(channel.published_size),
"resource_count": channel.total_resource_count,
},
)

if channel.version == pub_data.get("version"):
channel.version_info = channel_version

# Set the M2M relation for special permissions
if special_permissions:
channel_version.special_permissions_included.set(
special_permissions
)
logging.info(
f"Created channel version {channel_version.id} for channel {channel.id}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While running the command, I saw that if an error happened, the command exited. I think it'd be great if we wrapped the inner for body in a try-except block and saved the channel, version, and exception for those that failed to an array. At the end of the command, we can log that array. Just in case something happens, we have the info there. How does that sound?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESoCC: Create migration command to create ChannelVersion models for old channels

2 participants