Compression codec change#754
Conversation
…ession Conflicts: project/SparkBuild.scala
|
Reynold, thanks very much for this. Sorry, I've been working on deploying our first release on production, haven't got time to handle this. So thanks for the help, appreciate it! |
…nt with the documentation.
|
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/374/ |
|
Hey Reynold, one question: is there a native Snappy library this depends on? If so, where will it look for it? |
|
Yes. See http://xerial.org/snappy-java/ Portable across various operating systems; Snappy-java contains native libraries built for Window/Mac/Linux (32/64-bit). At runtime, snappy-java loads one of these libraries according to your machine environment (It looks system properties, os.name and os.arch). |
|
Yes, the jar published in Maven repository already contains native libraries for Win/Mac/Linux(32/64). While for users of other OS/CPU architecture, they may need to build the native libraries from source code as introduced in http://xerial.org/snappy-java/. |
|
Ah okay, that's great. |
There was a problem hiding this comment.
If you don't mind, rename these to compressedOutputStream and compressedInputStream -- just seems like a better naming convention for streams.
|
Hey Reynold, FYI, there are two other places where we should use the codec instead of LZF:
Basically do a search through the codebase for LZF and make sure we're replacing it. |
…sedOutputStream and compressedInputStream.
|
BTW apart from this the patch looks good to me. |
|
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/378/ |
|
changed according to feedback. ptal. |
|
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/384/ |
There was a problem hiding this comment.
Using singleton objects is kind of ugly for testing purposes; could we just create a new codec on each checkpoint? It doesn't seem like that big a deal.
There was a problem hiding this comment.
But then we introduce the possibility that the writer could be using one codec, and the reader uses another one.
There was a problem hiding this comment.
Hmm, I don't really get it. The reader is going to be a different instance of the program anyway, after a failure. It will have to be configured with the same codec class either way.
There was a problem hiding this comment.
Can we add a header to the output that specifies the codec used? Seems
brittle to require the configuration to be correct for simply reading the
checkpoint. (Not that it's an unreasonable assumption in >90% of cases, but
robustness is nice.)
On Wed, Jul 31, 2013 at 8:52 AM, Matei Zaharia notifications@github.comwrote:
In streaming/src/main/scala/spark/streaming/Checkpoint.scala:
@@ -49,6 +50,11 @@ class Checkpoint(@transient ssc: StreamingContext, val checkpointTime: Time)
}
}+private[streaming]
+object Checkpoint {Hmm, I don't really get it. The reader is going to be a different instance
of the program anyway, after a failure. It will have to be configured with
the same codec class either way.—
Reply to this email directly or view it on GitHubhttps://github.com//pull/754/files#r5507181
.
|
Ok I removed the singleton and squashed the last two commits. Jenkins, test this please. |
|
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/400/ |
…addendum Author: witgo <witgo@qq.com> Closes mesos#754 from witgo/commons-lang and squashes the following commits: 3ebab31 [witgo] merge master f3b8fa2 [witgo] merge master 2083fae [witgo] repeat definition 5599cdb [witgo] multiple version of sbt dependency c1b66a1 [witgo] fix different versions of commons-lang dependency
This is based on and subsumes @lyogavin's pull request #685.