GitHub does not allow pushing large files (bigger than 100 MB) to a Git repository. This project needs one jar that is larger than 100 MB; therefore, you should copy that jar to your "lib" directory. There are at least three options (Option-1, Option-2, and Option-3) to get that jar file:
Option-1:
The first time you build by running
ant
the jar file spark-assembly-1.4.0-hadoop2.6.0.jar
will be copied/downloaded from the following URL directory:
http://www.mapreduce4hackers.com/dataalgorithmsbook/lib/
Subsequent builds will not copy the big jar file again.
A sample log of Option-1 is provided.
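The ant build file is not reproduced here, so the following is only a rough shell sketch of the download-once behavior described for Option-1, not the actual ant target. The function name ensure_jar is hypothetical, the full jar URL is assumed to be the URL directory above followed by the jar file name, and curl is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: download a jar into a lib directory only if it is missing.
# Usage: ensure_jar <lib-dir> <jar-name> <base-url>
ensure_jar() {
    lib_dir=$1
    jar_name=$2
    base_url=$3   # assumed to end with "/"

    if [ -f "$lib_dir/$jar_name" ]; then
        # Subsequent builds take this branch and skip the download.
        echo "present: $lib_dir/$jar_name"
        return 0
    fi

    mkdir -p "$lib_dir" || return 1
    # -f: fail on HTTP errors, -L: follow redirects, -o: output file.
    # Download to a temporary name so an interrupted transfer
    # is not mistaken for the finished jar.
    if curl -fL -o "$lib_dir/$jar_name.part" "$base_url$jar_name"; then
        mv "$lib_dir/$jar_name.part" "$lib_dir/$jar_name"
        echo "downloaded: $lib_dir/$jar_name"
    else
        rm -f "$lib_dir/$jar_name.part"
        return 1
    fi
}

# Example call (assumed URL layout):
#   ensure_jar lib spark-assembly-1.4.0-hadoop2.6.0.jar \
#       http://www.mapreduce4hackers.com/dataalgorithmsbook/lib/
```

Because the check is on the file in lib, deleting that file forces a fresh download on the next run.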
Option-2:
You may download the jar file from the following URL and then copy it to the lib directory:
- spark-assembly-1.4.0-hadoop2.6.0.jar
- Size of this jar is: 138,391,791 bytes
- This jar is built using Spark 1.4.0 against Hadoop 2.6.0 with JDK7
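After a manual download, the byte count listed above gives a quick way to spot a truncated file. A minimal sketch, assuming a POSIX shell; the function name check_size is hypothetical and is not part of the project.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a file has exactly the expected size in bytes.
# Usage: check_size <file> <expected-bytes>
check_size() {
    file=$1
    expected=$2
    [ -f "$file" ] || { echo "missing: $file"; return 1; }
    # wc -c counts bytes; reading from stdin avoids printing the file name.
    # tr strips the padding that some wc implementations add.
    actual=$(wc -c < "$file" | tr -d '[:space:]')
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: $file is $actual bytes"
    else
        echo "size mismatch: $file is $actual bytes, expected $expected"
        return 1
    fi
}

# Example, using the size listed above with the commas removed:
#   check_size lib/spark-assembly-1.4.0-hadoop2.6.0.jar 138391791
```

A matching size does not prove the contents are intact; it only catches incomplete downloads.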
Option-3:
You may build this jar file from the Spark 1.4.0 source, as follows:
- Download Spark 1.4.0 to your designated directory:
  <your-install-dir>/spark-1.4.0.tgz
- Unpack it and build the assembly:
  cd <your-install-dir>
  tar zvfx spark-1.4.0.tgz
  cd spark-1.4.0
  sbt/sbt -Dhadoop.version=2.6.0 -Pyarn assembly
- Once your build is successful, you will have your desired jar file at:
  <your-install-dir>/spark-1.4.0/assembly/target/scala-2.10/spark-assembly-1.4.0-hadoop2.6.0.jar
- Finally, copy spark-assembly-1.4.0-hadoop2.6.0.jar to the
  <your-installed-directory>/data-algorithms-book/lib/
  directory.
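The final copy step can be guarded so that it fails loudly when the build did not produce the jar. A minimal sketch, assuming the directory layout described in the steps above; the function name copy_built_jar and its two arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy the assembly jar from a Spark 1.4.0 build tree
# into the lib directory of a data-algorithms-book checkout.
# Usage: copy_built_jar <spark-1.4.0-dir> <data-algorithms-book-dir>
copy_built_jar() {
    spark_dir=$1
    book_dir=$2
    jar=spark-assembly-1.4.0-hadoop2.6.0.jar
    src=$spark_dir/assembly/target/scala-2.10/$jar

    if [ ! -f "$src" ]; then
        echo "not found: $src (did the sbt assembly step succeed?)" >&2
        return 1
    fi
    mkdir -p "$book_dir/lib" || return 1
    cp "$src" "$book_dir/lib/" && echo "copied: $book_dir/lib/$jar"
}
```

Both arguments are plain paths, so the same function works whether or not the Spark tree and the book checkout share an install directory.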
Thanks,
best regards,
Mahmoud Parsian