package marklister

Value Members

  1. package collections

    collections.CollSeq is an IndexedSeq[Product] that also implements Product itself.

    A strongly typed tabular data framework.

    Specialized versions of CollSeq exist for arities 1 to 22. Each is an IndexedSeq[ProductN] and also implements ProductN.

    In action
     import com.github.marklister.collections.io._
     import com.github.marklister.collections._
     Welcome to Scala version 2.10.1 (OpenJDK Server VM, Java 1.7.0_21).
     Type in expressions to have them evaluated.
     Type :help for more information.
    
     scala> CollSeq(("Jan",10,20),("Feb",33,44),("Mar",77,33))
     res0: com.github.marklister.collections.immutable.CollSeq3[String,Int,Int] =
     CollSeq((Jan,10,20),
             (Feb,33,44),
             (Mar,77,33))
    
     scala> //Extract column one
    
     scala> res0._1
     res1: Seq[String] = List(Jan, Feb, Mar)
    
     scala> //Join Column one and column 3 as a new collection:
    
     scala> res0._1 flatZip res0._3
     res2: com.github.marklister.collections.immutable.CollSeq2[String,Int] =
     CollSeq((Jan,20),
             (Feb,44),
             (Mar,33))
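
    For comparison, the same two operations can be sketched with nothing but the standard library. This is not product-collections code: ColumnSketch and its members are hypothetical names, and plain zip only stands in for flatZip in the two-column case.

```scala
// Plain standard-library Scala; no product-collections dependency.
object ColumnSketch {
  // Rows as ordinary tuples, mirroring the data in the session above.
  val rows: Vector[(String, Int, Int)] =
    Vector(("Jan", 10, 20), ("Feb", 33, 44), ("Mar", 77, 33))

  // Column extraction: the plain-collections counterpart of res0._1 and res0._3.
  val months: Vector[String] = rows.map(_._1)
  val third: Vector[Int] = rows.map(_._3)

  // Pair the two columns row by row (stand-in for `res0._1 flatZip res0._3`).
  val joined: Vector[(String, Int)] = months.zip(third)

  def main(args: Array[String]): Unit =
    joined.foreach(println)
}
```

    With plain tuples, zipping in a further column nests the result, for example ((String, Int), Double), whereas the session above shows flatZip returning a flat CollSeq2.
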
    I/O

    io.CsvParser is an easy way to read CollSeqs or Tuples from the file system.

    You use the factory to select a parser:

    val parser = CsvParser[String, Int, Double]

    and read your file like this:

    val data = parser.parseFile("example.csv")

    You wind up with a CollSeq3[String,Int,Double].
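
    To give a feel for what a typed parse produces, here is a hand-rolled sketch that does not use CsvParser at all. It assumes exactly three comma-separated fields per line and no quoting; TypedCsvSketch, parseLine, parseLines and parseFile are hypothetical names, not part of the library.

```scala
import scala.io.Source

// Standard-library stand-in for a typed CSV read; not the library's implementation.
object TypedCsvSketch {
  // Convert one line into a typed row.
  // Assumption: three fields, comma-separated, no quotes or embedded commas.
  def parseLine(line: String): (String, Int, Double) =
    line.split(",").map(_.trim) match {
      case Array(name, count, price) => (name, count.toInt, price.toDouble)
      case other =>
        throw new IllegalArgumentException(
          s"expected 3 fields, got ${other.length}: $line")
    }

  // Parse every non-blank line into a typed row.
  def parseLines(lines: Iterator[String]): Vector[(String, Int, Double)] =
    lines.filter(_.trim.nonEmpty).map(parseLine).toVector

  // Read a file and always close it, even if a line fails to parse.
  def parseFile(path: String): Vector[(String, Int, Double)] = {
    val src = Source.fromFile(path)
    try parseLines(src.getLines()) finally src.close()
  }
}
```

    The point of the sketch is the result type: every row is a (String, Int, Double), so a malformed number fails at read time instead of surfacing later as an untyped value.
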

    Positioning

    product-collections aims to be simple and productive: you should be producing answers from your data in 20 minutes or less. There is no new API to learn: everything works like a Scala collection and a Tuple at the same time. There's no matrix arithmetic: do everything in idiomatic Scala.

    Columns don't lose their type if you include a column of another type. Learn by example: take a look at (or clone) the simple example project on GitHub that does some simple processing of stock prices.
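
    The claim about column types can be illustrated with ordinary tuples, which is the model CollSeq builds on. The sketch below uses only the standard library; the data and the names in TypedColumnsSketch are made up for illustration.

```scala
// Standard-library sketch: mixed-type rows whose columns stay statically typed.
object TypedColumnsSketch {
  // Hypothetical sample data: (ticker, volume, closing price).
  val rows: Vector[(String, Int, Double)] =
    Vector(("AAA", 100, 10.5), ("BBB", 250, 20.0), ("CCC", 50, 7.25))

  // Each column keeps its own type; nothing widens to Any.
  val volumes: Vector[Int] = rows.map(_._2)
  val closes: Vector[Double] = rows.map(_._3)

  // Ordinary collection operations do the processing.
  val totalVolume: Int = volumes.sum
  val meanClose: Double = closes.sum / closes.size

  // Tickers of rows with a volume of at least 100.
  val heavy: Vector[String] = rows.filter(_._2 >= 100).map(_._1)
}
```

    Because the compiler knows volumes is a Vector[Int] and closes is a Vector[Double], calls such as sum type-check without casts.
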

    Alternatives
    Saddle

    A heavy-duty solution with a custom API based around Vectors, Matrices and Scalars. Trying to mix types in a Saddle matrix results in a Matrix[Any], which means not much type safety. Saddle seems to have garnered some support from Typesafe and may feature in GSoC.

    Saddle places heavy emphasis on specialization and (presumably) performance.

    Breeze

    Breeze also has matrix and vector implementations similar to Saddle's. It also includes other functionality that looks pretty useful.

    Framian

    Under heavy development, this looks interesting. It does look more complicated than product-collections. One specifies the return type at retrieval time.
