I wrote an SQL query to better understand the data quality of the Hunks extension. For this, I sum up the added and removed lines of all hunks per commit and compare the result with the output of the CommitsLOC extension (which parses git log --shortstat).
This is the query:
SELECT cl.commit_id AS commit_id, s.rev AS rev, cl.added AS added, h.added AS calc_added, cl.removed AS removed, h.removed AS calc_removed, s.message
FROM (
    SELECT commit_id, SUM(old_end_line - old_start_line + 1) AS removed, SUM(new_end_line - new_start_line + 1) AS added
    FROM hunks
    GROUP BY commit_id
) AS h
RIGHT JOIN commits_lines cl ON h.commit_id = cl.commit_id
JOIN scmlog s ON s.id = cl.commit_id
WHERE h.added != cl.added OR h.removed != cl.removed
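To make the arithmetic of the query easier to follow, here is a minimal Python sketch of the same comparison on in-memory rows. The column names are taken from the query; the sample rows and the function names sum_hunks and find_mismatches are made up for illustration and do not come from a real database.

```python
from collections import defaultdict


def sum_hunks(hunks):
    """Sum added/removed lines per commit from inclusive hunk line ranges."""
    totals = defaultdict(lambda: {"added": 0, "removed": 0})
    for h in hunks:
        # Same arithmetic as the subquery: end - start + 1 for each side.
        totals[h["commit_id"]]["removed"] += h["old_end_line"] - h["old_start_line"] + 1
        totals[h["commit_id"]]["added"] += h["new_end_line"] - h["new_start_line"] + 1
    return dict(totals)


def find_mismatches(hunks, commits_lines):
    """Return (commit_id, added, calc_added, removed, calc_removed) for commits that differ."""
    calc = sum_hunks(hunks)
    mismatches = []
    for cl in commits_lines:
        c = calc.get(cl["commit_id"])
        if c is None:
            # Commits without hunks compare against NULL in the SQL
            # and are dropped by the WHERE clause, so skip them here too.
            continue
        if c["added"] != cl["added"] or c["removed"] != cl["removed"]:
            mismatches.append(
                (cl["commit_id"], cl["added"], c["added"], cl["removed"], c["removed"])
            )
    return mismatches


# Hypothetical sample rows, only to exercise the functions.
HUNKS = [
    {"commit_id": 1, "old_start_line": 10, "old_end_line": 12, "new_start_line": 10, "new_end_line": 14},
    {"commit_id": 1, "old_start_line": 40, "old_end_line": 40, "new_start_line": 42, "new_end_line": 43},
    {"commit_id": 2, "old_start_line": 5, "old_end_line": 6, "new_start_line": 5, "new_end_line": 9},
]
COMMITS_LINES = [
    {"commit_id": 1, "added": 7, "removed": 4},
    {"commit_id": 2, "added": 3, "removed": 2},
    {"commit_id": 3, "added": 1, "removed": 1},
]

print(find_mismatches(HUNKS, COMMITS_LINES))
```

In this toy data, commit 1 adds up, commit 2 is reported because the hunks yield 5 added lines while commits_lines says 3, and commit 3 is skipped because it has no hunks.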
While investigating why some commits don't add up, I have already published some patches to improve the data quality.
One thing that is really annoying is that CommitsLOC is sometimes off by up to 5 lines. I investigated the issue and found that this is a bug in git itself. I have already sent a bug report to the git mailing list, but so far there has been no answer.
Here is what I observed with the repo https://github.com/voldemort/voldemort.git:
For commit c21ad764 and the file .../readonly/mr/HadoopStoreBuilderReducer.java, the command git log --numstat c21ad764 reports 25 lines added and 22 lines removed.
But the patch for HadoopStoreBuilderReducer.java that I get with git show c21ad764 -- contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/HadoopStoreBuilderReducer.java adds 30 lines and removes 27.
So 5 added and 5 removed lines are missing with git log --shortstat!
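For anyone who wants to repeat this check, the added and removed lines of a patch can be counted with a short script. This is only a sketch: the function name count_patch_lines and the sample patch are invented for illustration, and it assumes plain unified diff output such as git show prints for a text file.

```python
def count_patch_lines(patch_text):
    """Count added and removed lines inside the hunks of a unified diff."""
    added = removed = 0
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("diff --git"):
            # A new file section starts; its "---"/"+++" headers must not be counted.
            in_hunk = False
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk:
            if line.startswith("+"):
                added += 1
            elif line.startswith("-"):
                removed += 1
    return added, removed


# Hypothetical patch text, only to exercise the function.
SAMPLE_PATCH = """\
diff --git a/example.txt b/example.txt
index 1111111..2222222 100644
--- a/example.txt
+++ b/example.txt
@@ -1,3 +1,4 @@
 unchanged
-old line
+new line
+another new line
 unchanged
"""

print(count_patch_lines(SAMPLE_PATCH))
```

Feeding it the text printed by git show for a single file gives the pair of numbers to compare against the --numstat line for that file.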
More commits where I observed this problem on the same repository:
- 7e00fb6d2cf131dfed59c180f2171952808cc336 src/java/voldemort/client/rebalance/MigratePartitions.java
- 78ad6f2a6ea327dbae2110f4530a5bd07e5deaac src/java/voldemort/client/rebalance/MigratePartitions.java (same commit on another branch)
- 7871933f0f0f056e2eeac03a01db1e9cf81f8bda src/java/voldemort/client/protocol/admin/AdminClient.java
- 2d6f68b09c3bdc23dcf3ae1f91c9285fbd668820 src/java/voldemort/store/readonly/ExternalSorter.java
- 6fcacee866307ec34eb32b268e2c2b885a949319 build.xml
Maybe someone has an idea, or the C skills to build a working patch for git.