-
Notifications
You must be signed in to change notification settings - Fork 2
Basic statistics implementation, not yet integrated. #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,35 @@ | ||
| package com.whichclasses.statistics; | ||
|
|
||
| import java.util.ArrayList; | ||
| import java.util.Collection; | ||
| import java.util.Collections; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
|
|
||
| import com.google.common.collect.Maps; | ||
| import com.google.inject.Inject; | ||
| import com.whichclasses.model.Course; | ||
| import com.whichclasses.model.DataSource; | ||
| import com.whichclasses.model.Department; | ||
| import com.whichclasses.model.proto.TceRating.Question; | ||
|
|
||
| public class ConcreteAggregator implements DataAggregator { | ||
|
|
||
| DataSource dataSource; | ||
| Map<String, Map<Question, Double>> courseScoreDataMap = Maps.newHashMap(); | ||
|
|
||
| @Inject | ||
| public ConcreteAggregator(DataSource dataSource) { | ||
| this.dataSource = dataSource; | ||
| } | ||
|
|
||
| @Override | ||
| public List<Course> getRankedCourses(String deptId, Question question) { | ||
| Department dept = dataSource.getDepartmentList().getChildren().get(deptId); | ||
| Collection<Course> courses = dept.getChildren().values(); | ||
| List<Course> courseList = new ArrayList<Course>(courses); | ||
| Collections.sort(courseList, new SimpleCourseComparator(question)); | ||
| return courseList; | ||
| } | ||
|
|
||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| package com.whichclasses.statistics; | ||
|
|
||
| import java.util.List; | ||
|
|
||
| import com.whichclasses.model.Course; | ||
| import com.whichclasses.model.proto.TceRating.Question; | ||
|
|
||
| public interface DataAggregator { | ||
| public List<Course> getRankedCourses(String deptId, Question question); | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| package com.whichclasses.statistics; | ||
|
|
||
| import com.whichclasses.model.proto.TceRating; | ||
|
|
||
| public interface Rateable { | ||
| public int getRatingCount(TceRating.Question question); | ||
| public double getAverageScore(TceRating.Question question); | ||
| public double getWilsonScore(TceRating.Question question); | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does Wilson score make sense on an individual Rateable item, or only on a RateableCollection?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It makes sense in either context, all it requires is a collection of ratings (which can belong to an individual class or a collection of classes). You can rate a course by grabbing all class ratings and then calculating a Wilson score from that set, or you can calculate individual Wilson scores for each class and then aggregate the class results in some other way. |
||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| package com.whichclasses.statistics; | ||
|
|
||
| import java.util.Collection; | ||
| import java.util.Map; | ||
|
|
||
| import com.google.common.collect.Maps; | ||
| import com.whichclasses.model.proto.TceRating; | ||
| import com.whichclasses.model.proto.TceRating.Question; | ||
|
|
||
| public class RateableCollection implements Rateable { | ||
|
|
||
| private Collection<? extends Rateable> children; | ||
| private Map<Question, Double> questionToScore = Maps.newHashMap(); | ||
| private Map<Question, Double> questionToWilsonScore = Maps.newHashMap(); | ||
| private Map<Question, Integer> questionToNumRatings = Maps.newHashMap(); | ||
|
|
||
| public RateableCollection( Collection<? extends Rateable> children ) { | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. style nit: remove spaces after "(" and before ")" |
||
| this.children = children; | ||
| calculateStatistics(); | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can this be done more lazily? (e.g. when the first value is requested) |
||
| } | ||
|
|
||
| private void calculateStatistics() { | ||
| for(Question question : Question.values()) { | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. style nit: space between "for" and "(" tokens (here and below, same for "if") |
||
| double qualityPoints = 0; | ||
| int numRatings = 0; | ||
| for(Rateable rateable : children) { | ||
| qualityPoints += rateable.getAverageScore(question) * rateable.getRatingCount(question); | ||
| numRatings += rateable.getRatingCount(question); | ||
| } | ||
| double averageScore = numRatings != 0 ? qualityPoints / numRatings : 0; | ||
| double wilsonScore = calculateWilsonScore((averageScore - 1) / 4, numRatings) * 4 + 1; | ||
| questionToScore.put(question, averageScore); | ||
| questionToWilsonScore.put(question, wilsonScore); | ||
| questionToNumRatings.put(question, numRatings); | ||
| } | ||
| } | ||
|
|
||
| @Override | ||
| public double getAverageScore(TceRating.Question question) { | ||
| return questionToScore.get(question); | ||
| } | ||
|
|
||
| @Override | ||
| public int getRatingCount(TceRating.Question question) { | ||
| return questionToNumRatings.get(question); | ||
| } | ||
|
|
||
| @Override | ||
| public double getWilsonScore(TceRating.Question question) { | ||
| return questionToWilsonScore.get(question); | ||
| } | ||
|
|
||
| /** | ||
| * Calculates the lower bound of the 95% confidence interval of a normalized data set of data | ||
| * @param p - Normalized average of sample data set | ||
| * @param n - Number of samples | ||
| * @return The normalized Wilson score of a normalized data set | ||
| */ | ||
| private double calculateWilsonScore(double p, int n) { | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. static |
||
| if(n == 0) return 0; | ||
| double z = 1.96; | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. make this value (1.96) a static final constant naming the confidence interval |
||
| return (p + z*z/(2*n) - z*Math.sqrt(p*(1-p)/n + z*z/(4*n*n)))/(1 + z*z/n); | ||
| } | ||
|
|
||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| package com.whichclasses.statistics; | ||
|
|
||
| import java.util.Comparator; | ||
|
|
||
| import com.whichclasses.model.Course; | ||
| import com.whichclasses.model.proto.TceRating.Question; | ||
|
|
||
| public class SimpleCourseComparator implements Comparator<Course> { | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. naming: AverageCourseComparator (to go with using getAverageScore) |
||
|
|
||
| private Question question; | ||
|
|
||
| public SimpleCourseComparator(Question question) { | ||
| this.question = question; | ||
| } | ||
|
|
||
| @Override | ||
| public int compare(Course o1, Course o2) { | ||
| double diff = o2.getAverageScore(question) - o1.getAverageScore(question); | ||
| return (int) (diff < 0 ? Math.floor(diff) : Math.ceil(diff)); | ||
| } | ||
|
|
||
|
|
||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: extra blank line |
||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got a better name than ConcreteAggregator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My personal convention recently has been to name classes 'Standard[interface name]" for interfaces which only have one real implementation (not including test classes). Any better alternatives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For [interface] and single-implementation, I prefer the "[Interface]Impl" convention. But ideally I'd prefer a descriptive name, particularly since it's an "aggregator" and should describe what kind of aggregation it is doing. RankingDataAggregator?
Note that if you don't have a reason to split the interface/implementation out, you can also just combine them into one class. Guice isn't restricted to injecting interfaces, so injecting a concrete class could make sense if you don't expect to have other aggregators going forward.
There are definitely a few cases in the codebase right now of an interface with a single implementation, but most of those have an intention (or at least a plausible possibility) to have multiple future implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also unit tests. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can change DataAggregator so that it just implements a single "aggregate" method. This way we can have sorting aggregators, grouping aggregators, etc. Maybe genericize DataAggregator with the return type of the aggregate method?
public class RankingDataAggregator implements DataAggregator<List>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable, what would the return type be though? Why not always
a Rateable?
On Tue, May 5, 2015 at 12:09 PM, Sean Topping notifications@github.com
wrote:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the intent of the class is to answer queries such as:
"Give me a list of rated courses in CSC."
"Give me a list of rated professors in AME."
"Give me a list of rated semesters for ACCT." (how are the classes in ACCT changing over time?)
Come to think of it, maybe the aggregator should always just return a list of Rateable objects. Then you can do sorting and selection as needed. Basically it's just a way to convert a large collection of individual class ratings into chunked rated collections (e.g. courses, departments, instructors, semesters, etc). Basically a GROUP BY implementation.