A strongly typed tabular data framework.
collections.CollSeq is an IndexedSeq[Product] that also implements
Product itself.
Specialized versions of CollSeq exist for arities 1 to 22. Each one is an
IndexedSeq[ProductN] and also implements ProductN, where N is the arity.
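To make "a sequence of rows that is also a product of columns" concrete, here is a minimal sketch for arity 2. PairTable is a hypothetical, simplified stand-in written for illustration, not the library's CollSeq2. It assumes rows are stored as a Vector of tuples and that columns are computed on demand.

```scala
// Hypothetical, simplified stand-in for a two-column CollSeq; NOT the library's code.
// As an IndexedSeq it is a sequence of rows (tuples);
// as a Product2 it exposes its columns through _1 and _2.
final class PairTable[A, B](rows: Vector[(A, B)])
    extends IndexedSeq[(A, B)] with Product2[Vector[A], Vector[B]] {

  // Row access: the two members IndexedSeq requires
  def apply(i: Int): (A, B) = rows(i)
  def length: Int = rows.length

  // Column access: each column keeps its own static type
  def _1: Vector[A] = rows.map(_._1)
  def _2: Vector[B] = rows.map(_._2)
}

val months = new PairTable(Vector(("Jan", 10), ("Feb", 33), ("Mar", 77)))

val names: Vector[String] = months._1  // column one, typed as String
val total: Int = months._2.sum         // column two, typed as Int
val second: (String, Int) = months(1)  // row access
val big = months.filter(_._2 > 20)     // any ordinary collection operation
```

Extending both traits is what lets the same value answer row-oriented questions (filter, index) and column-oriented ones (_1, _2) without conversion.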
In action
import com.github.marklister.collections.io._
import com.github.marklister.collections._
Welcome to Scala version 2.10.1 (OpenJDK Server VM, Java 1.7.0_21).
Type in expressions to have them evaluated.
Type :help for more information.
scala> CollSeq(("Jan",10,20),("Feb",33,44),("Mar",77,33))
res0: com.github.marklister.collections.immutable.CollSeq3[String,Int,Int] =
CollSeq((Jan,10,20),
(Feb,33,44),
(Mar,77,33))
scala> //Extract column one
scala> res0._1
res1: Seq[String] = List(Jan, Feb, Mar)
scala> //Join Column one and column 3 as a new collection:
scala> res0._1 flatZip res0._3
res2: com.github.marklister.collections.immutable.CollSeq2[String,Int] =
CollSeq((Jan,20),
(Feb,44),
(Mar,33))
I/O
io.CsvParser is a very easy way to read CollSeqs or Tuples from the file system.
Use the factory, with one type parameter per column, to select a parser:
val parser = CsvParser[String, Int, Double]
and read your file like this:
val data = parser.parseFile("example.csv")
You wind up with a CollSeq3[String,Int,Double].
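What the parser does for each row can be approximated in plain Scala: split a line into fields and convert each field to its column's type. This is a hypothetical sketch, not io.CsvParser's implementation. It assumes comma-separated fields, no header line and no quoted fields, and it reads from an in-memory string instead of example.csv. TypedCsvSketch, parseRow and parseLines are illustrative names.

```scala
import scala.util.Try

// Hypothetical sketch of typed CSV parsing; NOT io.CsvParser's implementation.
// Assumes: comma separator, no header row, no quoted fields or embedded commas.
object TypedCsvSketch {
  // Convert one line into a typed row; Failure if a field is missing or malformed
  def parseRow(line: String): Try[(String, Int, Double)] = Try {
    val fields = line.split(",", -1).map(_.trim)
    require(fields.length == 3, s"expected 3 fields, got ${fields.length}")
    (fields(0), fields(1).toInt, fields(2).toDouble)
  }

  // Parse every non-blank line; the first malformed line throws
  def parseLines(text: String): Vector[(String, Int, Double)] =
    text.linesIterator.filter(_.trim.nonEmpty).map(l => parseRow(l).get).toVector
}

val csv =
  """Jan,10,1.5
    |Feb,33,2.25
    |Mar,77,0.5""".stripMargin

val data = TypedCsvSketch.parseLines(csv)
val totalQty: Int = data.map(_._2).sum  // column two is already an Int
```

Because the row type is fixed up front, a malformed field surfaces as a conversion failure at parse time rather than as a wrongly typed value later.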
Positioning
product-collections aims to be simple and productive: you should be
producing answers from your data in 20 minutes or less. There is no
new API to learn: everything works like a Scala collection and a Tuple
at the same time. There's no matrix arithmetic; do everything in idiomatic
Scala.
Columns don't lose their type if you include a column of another type.
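The point about column types can be shown with plain tuples. In the sketch below (illustrative data, hypothetical names), each tuple position keeps its static type, so an Int column can be summed directly and columns can be combined with ordinary collection operations. The Seq[Any] rows at the end are only there for contrast: with a single shared element type, every column widens to Any and a cast is needed to get numbers back.

```scala
// Rows as tuples: each position has its own static type.
val sales: Vector[(String, Int, Double)] =
  Vector(("Jan", 10, 1.5), ("Feb", 33, 2.0), ("Mar", 77, 0.5))

// Column two is still Vector[Int], so numeric operations apply directly.
val quantities: Vector[Int] = sales.map(_._2)
val totalQuantity: Int = quantities.sum

// Combining typed columns in idiomatic Scala, with no matrix arithmetic.
val revenue: Vector[Double] = sales.map { case (_, qty, price) => qty * price }

// For contrast: a uniform row type loses the per-column types.
val untyped: Vector[Seq[Any]] = sales.map { case (m, q, p) => Seq(m, q, p) }
val secondColumn: Vector[Any] = untyped.map(_(1))
// secondColumn.sum would not compile (there is no Numeric[Any]), so a cast is needed.
val recovered: Int = secondColumn.map(_.asInstanceOf[Int]).sum
```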
Learn by example: take a look at (or clone) the
example project on GitHub, which does some simple processing of
stock prices.
Alternatives
Saddle
A heavy-duty solution with a custom API built around Vectors,
Matrices and Scalars. Mixing types in a Saddle matrix results in
a Matrix[Any], which means little type safety. Saddle seems to
have garnered some support from Typesafe and may feature in GSoC.
Saddle places heavy emphasis on specialization and (presumably) performance.
Breeze
Breeze also has matrix and vector implementations similar to Saddle's, plus
some other functionality that looks pretty useful.
Framian
Framian is under heavy development and looks interesting. It does look more
complicated than p-c (product-collections): one specifies the return type at retrieval time.