Querying a cassandra DB via spark #38

@Enilia

Hey there,

As the title says, I am trying to query an existing Cassandra database from Node.js using your library. I am using a Spark cluster on a LAN.

Here's what I have done so far, using:

  • CentOS 7
  • node 4.4.4
  • apache-spark-node@0.3.3
  • spark 1.6.1
  • cassandra 2.2.5
  • spark-cassandra-connector 1.6.0-M1

From the root of my project:

ASSEMBLY_JAR=/usr/share/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar node_modules/apache-spark-node/bin/spark-node \
--master spark://192.168.1.101:7077 --conf spark.cores.max=4 \
--jars /root/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.6.0-M1-36-g220aa37.jar

Once I had access to the command line, I tried:

spark-node> sqlContext.sql("Select count(*) from mykeyspace.mytable")

but of course I get:

Error: Error creating class
org.apache.spark.sql.AnalysisException: Table not found: `mykeyspace`.`mytable`;

I then tried to adapt a Scala snippet I had seen in a Stack Overflow post:

var df = sqlContext
  .read()
  .format("org.apache.spark.sql.cassandra")
  .option("table", "mytable")
  .option("keyspace", "mykeyspace")
  .load(null, function(err, res) { console.log(err); console.log(res) }) 

but all I get is:

Error: Error running instance method
java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark-packages.org

The problem surely comes from the fact that I don't understand half of how everything is linked together, which is why I'm here asking for help. All I need is a way to run basic SQL queries (with only WHERE clauses) against one Cassandra table.
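
To make the goal concrete, the flow I am aiming for would look roughly like the sketch below. It is untested against a real cluster and rests on assumptions: `queryCassandraTable` is a hypothetical helper name, and I am assuming the wrapper exposes a synchronous `load()` and Spark 1.6's `registerTempTable` on the resulting DataFrame, neither of which I have confirmed for apache-spark-node. It also does not address the ClassNotFoundException above; it only shows the intended sequence of calls once the data source can be found.

```javascript
// Sketch only: `sqlContext` is the object exposed by the spark-node shell.
// Assumes the wrapper mirrors Spark 1.6's DataFrame.registerTempTable and a
// synchronous load(); neither is confirmed for apache-spark-node.
function queryCassandraTable(sqlContext, keyspace, table, whereClause) {
  // Load the Cassandra table as a DataFrame through the connector's data source.
  var df = sqlContext
    .read()
    .format("org.apache.spark.sql.cassandra")
    .option("table", table)
    .option("keyspace", keyspace)
    .load();

  // Register the DataFrame under a plain name so sqlContext.sql can resolve it.
  df.registerTempTable(table);

  // Build a simple query with an optional WHERE clause.
  var query = "SELECT * FROM " + table;
  if (whereClause) {
    query += " WHERE " + whereClause;
  }
  return sqlContext.sql(query);
}
```

The temp-table step is there because, as far as I understand, `sqlContext.sql` in Spark 1.6 only resolves names that are registered in its catalog, which would explain the "Table not found" error from querying `mykeyspace.mytable` directly.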

I reckon this project is no longer maintained, but it is the simplest solution I have seen so far (solutions like EclairJS have far more functionality than I need, at the cost of increased complexity and possibly lower performance), and it would meet my needs.
