Skip to content

Conversation

@mradian1
Copy link

Counts log entries in order to help with logs cleanup

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a log entry counting feature to help with logs cleanup operations. It adds a new multiversx_logs_counter package with infrastructure for parsing, counting, and reporting on log entries by level and message type.

Key changes:

  • New log counter package with parser, checker, and archive handler components
  • Enhanced entry parser with pattern matching for specific log message types (epochs, rounds, subrounds, blocks, headers)
  • Refactored directory creation logic in node logs checker to work consistently across different output paths

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
multiversx_logs_parser_tools/node_logs_checker.py Refactored directory creation to apply to both default and custom output paths
multiversx_logs_parser_tools/entry_parser.py Added pattern matching for specific log types and test code in main block
multiversx_logs_counter/counter_structures.py Defines LogLevel enum and CounterData class for tracking log message counts
multiversx_logs_counter/counter_parser.py Implements parser extending AhoCorasickParser to count log entries by level
multiversx_logs_counter/counter_checker.py Implements checker to process and output counted log data as JSON reports
multiversx_logs_counter/counter_archive_handler.py Provides archive handler with main entry point for processing log archives
Comments suppressed due to low confidence (2)

multiversx_logs_parser_tools/entry_parser.py:123

  • Implicit string concatenation. Maybe missing a comma?
                    'DEBUG[2025-11-12 12:01:22.747][..Start/shardchain][0/8/14409/(END_ROUND)]  # EPOCH 9 BEGINS IN ROUND (14409) ##################################'
                    'DEBUG[2025-11-12 09:27:35.419] [process/block] [0/1/299/(END_ROUND)] header hash: 209622a0c04a556829507a7a2176aaee2a58a524766b805dc9a423e2fe023072',

multiversx_logs_parser_tools/node_logs_checker.py:25

        self.initialize_checker(args)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

from .counter_checker import CounterChecker, CounterParser


class CounterArchiveHandler(ArchiveHandler):
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docstring for the CounterArchiveHandler class. Public classes should have docstrings explaining their purpose and usage in the archive processing workflow.

Suggested change
class CounterArchiveHandler(ArchiveHandler):
class CounterArchiveHandler(ArchiveHandler):
"""Archive handler that runs a CounterChecker over log archives.
This class coordinates archive traversal (inherited from
:class:`ArchiveHandler`) with a :class:`CounterChecker` instance,
which parses and validates log files using the provided
:class:`CounterParser`. It is typically instantiated with a
configured ``CounterChecker`` and the root ``logs_path`` of the
archive to be processed, and then used via ``handle_logs()`` to
perform the full archive processing workflow.
"""

Copilot uses AI. Check for mistakes.
'DEBUG[2025-11-12 13:36:33.202][..ensus/chronology][0/13/23927/(START_ROUND)] 2025-11-12 13:36:33.202271507 ...................................... SUBROUND(BLOCK) BEGINS ...................................... ',
'DEBUG[2025-11-12 13:36:33.228][..ensus/chronology][0/13/23927/(BLOCK)] 2025-11-12 13:36:33.227644328 .................................... SUBROUND(SIGNATURE) BEGINS .................................... ',
'DEBUG[2025-11-12 13:36:33.234][..ensus/chronology][0/13/23927/(SIGNATURE)] 2025-11-12 13:36:33.234186524 .................................... SUBROUND(END_ROUND) BEGINS .................................... ',
'DEBUG[2025-11-12 12:01:22.747][..Start/shardchain][0/8/14409/(END_ROUND)] # EPOCH 9 BEGINS IN ROUND (14409) ##################################'
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline between the two test log entries. Line 122 and 123 should be separated with a comma and newline for better readability and to match the pattern of the list.

Suggested change
'DEBUG[2025-11-12 12:01:22.747][..Start/shardchain][0/8/14409/(END_ROUND)] # EPOCH 9 BEGINS IN ROUND (14409) ##################################'
'DEBUG[2025-11-12 12:01:22.747][..Start/shardchain][0/8/14409/(END_ROUND)] # EPOCH 9 BEGINS IN ROUND (14409) ##################################',

Copilot uses AI. Check for mistakes.
log_level.name: {} for log_level in LogLevel
}

def add_message(self, log_level: str, message: str, count=1) -> None:
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing input validation for log_level parameter. The method should validate that the log_level exists in the counter_dictionary to prevent KeyError when an unexpected log level is passed.

Suggested change
def add_message(self, log_level: str, message: str, count=1) -> None:
def add_message(self, log_level: str, message: str, count=1) -> None:
if log_level not in self.counter_dictionary:
raise ValueError(f"Unknown log level: {log_level}")

Copilot uses AI. Check for mistakes.
self.counters.add_message(level, message)

def should_parse_line(self, pattern: Pattern[str]) -> bool:
return True
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The should_parse_line method always returns True, making the pattern parameter unused. Consider removing the parameter if it's not needed, or implement conditional logic if different patterns should be handled differently.

Suggested change
return True
"""
Decide whether a line associated with the given pattern should be parsed.
This implementation returns True for the log level patterns that this parser
is configured to handle. For the patterns currently returned by get_patterns(),
this method behaves the same as the previous unconditional True.
"""
allowed_levels = {"INFO", "TRACE", "DEBUG", "WARN", "ERROR"}
# Pattern may be a compiled regex or another string-like object.
pattern_str = getattr(pattern, "pattern", str(pattern))
return any(level in pattern_str for level in allowed_levels)

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +16
self.parsed = CounterData()
return super().initialize_checker(args)
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initialization order may be problematic. The initialize_checker method creates a CounterData instance for self.parsed, then calls super().initialize_checker(args), which may overwrite or interfere with the initialization. Consider calling super() first, then initializing self.parsed.

Suggested change
self.parsed = CounterData()
return super().initialize_checker(args)
result = super().initialize_checker(args)
self.parsed = CounterData()
return result

Copilot uses AI. Check for mistakes.
Comment on lines +14 to +22
def get_patterns(self) -> list[tuple[Pattern[str], int]]:
patterns = []
patterns.append(('INFO', 0))
patterns.append(('TRACE', 1))
patterns.append(('DEBUG', 2))
patterns.append(('WARN', 3))
patterns.append(('ERROR', 4))

return patterns
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type annotation mismatch in return type. The method returns a list of tuples containing strings and integers, but the type hint indicates Pattern objects. The return type should be list[tuple[str, int]] instead of list[tuple[Pattern[str], int]].

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +18
# self.shard_data = ShardData()
super().__init__(checker, logs_path)

def process_node_data(self):
"""Process the parsed data for a single node."""
# node_data = HeaderData()
# node_data.header_dictionary = self.checker.parsed
# self.shard_data.add_node(node_data)
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented-out code should be removed. If this functionality is planned for future implementation, it should be tracked in an issue or TODO comment instead of leaving non-functional commented code.

Suggested change
# self.shard_data = ShardData()
super().__init__(checker, logs_path)
def process_node_data(self):
"""Process the parsed data for a single node."""
# node_data = HeaderData()
# node_data.header_dictionary = self.checker.parsed
# self.shard_data.add_node(node_data)
super().__init__(checker, logs_path)
def process_node_data(self):
"""Process the parsed data for a single node."""
# TODO: implement node data processing logic if needed.

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +30
def process_match(self, line: str, end_index: int, pattern_idx: int, args: dict[str, str]) -> dict[str, Any]:
parsed = super().process_match(line, end_index, pattern_idx, args)
return parsed

Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process_match method calls super().process_match() but doesn't utilize the returned value meaningfully - it just returns it directly without any additional processing. Consider whether this override is necessary or if the implementation adds value.

Suggested change
def process_match(self, line: str, end_index: int, pattern_idx: int, args: dict[str, str]) -> dict[str, Any]:
parsed = super().process_match(line, end_index, pattern_idx, args)
return parsed

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +13
class CounterData:

Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docstring for the CounterData class. Public classes should have docstrings explaining their purpose, usage, and key attributes.

Suggested change
class CounterData:
class CounterData:
"""
Container for counting log messages per log level.
This class maintains a nested dictionary structure where the first key is
the log level name (as defined in ``LogLevel``) and the second key is the
log message string. The corresponding value is the number of times that
message has been observed at that log level.
Attributes
----------
counter_dictionary : dict[str, dict[str, int]]
Maps a log level name to a dictionary of message strings and their
associated occurrence counts.
Usage
-----
- Use :meth:`add_message` to increment the count for a single message.
- Use :meth:`add_counted_messages` to merge counts from another
:class:`CounterData` instance.
- Use :meth:`reset` to clear all stored counts.
"""

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +10
class CounterChecker(NodeLogsChecker):

Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docstring for the CounterChecker class. Public classes should have docstrings explaining their purpose and their role in the log counting workflow.

Suggested change
class CounterChecker(NodeLogsChecker):
class CounterChecker(NodeLogsChecker):
"""
CounterChecker coordinates log parsing and counting for a single node.
It uses a CounterParser to extract and count log messages, aggregates the
results into CounterData, and produces per-node JSON reports as part of
the overall log counting workflow.
"""

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants