Convert a Seq[(Numeric,Numeric)] to a WeightedStats object
Convert a Seq[Numeric] to a Stats object
Convert a Seq[Product1] to a CollSeq1
Convert a Seq[Product2] to a CollSeq2
Convert a Seq[Any] to a CollSeq
At the moment the only structure available is a CollSeq
Generally you use the companion object to construct the appropriate CollSeq:
scala> CollSeq(("Jan",100,200),("Feb",120,230),("Mar",300,330))
res0: com.github.marklister.collections.immutable.CollSeq3[String,Int,Int] = CollSeq((Jan,100,200), (Feb,120,230), (Mar,300,330))

scala> res0._2
res1: Seq[Int] = List(100, 120, 300)

scala> res0._3.flatZip(res0._1).flatZip(res0._2)
res3: com.github.marklister.collections.immutable.CollSeq3[Int,String,Int] = CollSeq((200,Jan,100), (230,Feb,120), (330,Mar,300))
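One way to picture what `flatZip` does in the session above: it zips a new column onto each existing row and flattens the result into a tuple one arity wider, so the column types are carried along. The following is a hypothetical plain-Scala sketch of the 2-to-3 case only, assuming rows are ordinary tuples. It is not product-collections' implementation, and the names `FlatZipSketch` and `flatZip2` are invented here.

```scala
object FlatZipSketch {
  // Hypothetical helper (not the library's API): append a column to rows of
  // pairs, producing triples. Like Seq#zip, it truncates to the shorter input.
  def flatZip2[A, B, C](rows: Seq[(A, B)], col: Seq[C]): Seq[(A, B, C)] =
    rows.zip(col).map { case ((a, b), c) => (a, b, c) }

  def main(args: Array[String]): Unit = {
    val months = Seq("Jan", "Feb", "Mar")
    val first  = Seq(100, 120, 300)
    val second = Seq(200, 230, 330)
    // second.zip(months) is a Seq[(Int, String)]; flatZip2 widens it to triples
    // of type (Int, String, Int), mirroring the column order in res3 above.
    val rows = flatZip2(second.zip(months), first)
    println(rows)
  }
}
```

Every element type stays statically known at each step, which is the property the `CollSeq3[Int,String,Int]` result type reflects.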
Implicit conversions in package object com.github.marklister.collections make the following promotions available:
- Seq[Numeric] => Stats[Numeric]
- Seq[(Numeric,Numeric)] => WeightedStats[Numeric,Numeric]
The classes com.github.marklister.collections.util.Stats and com.github.marklister.collections.util.WeightedStats make methods like mean, variance, and stdDev available.
import com.github.marklister.collections.io._
import com.github.marklister.collections._
Welcome to Scala version 2.10.1 (OpenJDK Server VM, Java 1.7.0_21).

scala> Seq(1,2,3).mean
res0: Double = 2.0

scala> CollSeq((1,2),(2,1),(3,3)).mean
res1: Double = 2.1666666666666665

scala> (2.0+2.0+9.0)/6.0
res2: Double = 2.1666666666666665
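The second result can be checked by hand. Reading each pair as (value, weight), the weighted mean is (1·2 + 2·1 + 3·3) / (2 + 1 + 3) = 13/6, which is exactly the third expression in the session. The sketch that follows shows plain-Scala stand-ins for the kind of methods Stats and WeightedStats provide, written against `Numeric[T]` so they accept Int, Long, Double and so on. The object and method names are hypothetical, and the choice of population variance (dividing by n) is an assumption; the library may use a different convention.

```scala
object StatsSketch {
  // Hypothetical stand-in for Stats.mean (not the library's code).
  def mean[T](xs: Seq[T])(implicit num: Numeric[T]): Double =
    xs.map(x => num.toDouble(x)).sum / xs.size

  // Assumption: population variance, i.e. divide by n rather than n - 1.
  def variance[T](xs: Seq[T])(implicit num: Numeric[T]): Double = {
    val m = mean(xs)
    xs.map(x => math.pow(num.toDouble(x) - m, 2)).sum / xs.size
  }

  def stdDev[T](xs: Seq[T])(implicit num: Numeric[T]): Double =
    math.sqrt(variance(xs))

  // Hypothetical stand-in for WeightedStats.mean: each pair is (value, weight),
  // and the result is sum(value * weight) / sum(weight).
  def weightedMean[V, W](pairs: Seq[(V, W)])(implicit nv: Numeric[V], nw: Numeric[W]): Double = {
    val totalWeight = pairs.map { case (_, w) => nw.toDouble(w) }.sum
    pairs.map { case (v, w) => nv.toDouble(v) * nw.toDouble(w) }.sum / totalWeight
  }
}
```

For example, `StatsSketch.weightedMean(Seq((1,2),(2,1),(3,3)))` computes 13.0 / 6.0, matching `res1` in the session.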
A strongly typed tabular data framework.
collections.CollSeq is an IndexedSeq[Product] that also implements Product itself.
Specialized versions of CollSeq exist for arities 1 to 22. Each is an IndexedSeq[ProductN] and also implements ProductN.
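To make "an IndexedSeq[ProductN] that also implements ProductN" concrete, here is a hypothetical, heavily simplified arity-2 analogue: it is a sequence of rows, and it is also a Product2 whose two elements are its typed columns. The class name `MiniCollSeq2` is invented for illustration; the real CollSeq2 does more (for instance `flatZip`) and may be built differently.

```scala
object CollSeqSketch {
  // Hypothetical simplified analogue of CollSeq2 (not the library's code):
  // row-wise it is an IndexedSeq of pairs, column-wise it is a Product2.
  final class MiniCollSeq2[A, B](rows: Vector[(A, B)])
      extends IndexedSeq[(A, B)] with Product2[Seq[A], Seq[B]] {
    // Row access, required by IndexedSeq
    def apply(i: Int): (A, B) = rows(i)
    def length: Int = rows.length

    // Column access, required by Product2; each column keeps its own type
    def _1: Seq[A] = rows.map(_._1)
    def _2: Seq[B] = rows.map(_._2)
  }
}
```

Because it is an ordinary IndexedSeq, the usual collection methods (`map`, `filter`, `sum` on a projected column) apply to the rows, while `_1` and `_2` hand back a `Seq[String]` and a `Seq[Int]` rather than a `Seq[Any]`.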
In action
I/O
io.CsvParser is a very easy way to read CollSeqs or Tuples from the file system.
You use the factory to select a parser:
and read your file like this:
You end up with a CollSeq3[String,Int,Double].
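The idea behind a typed parser can be illustrated without the library: the column types are fixed up front, and each field is converted as it is read, so the result is a sequence of typed tuples rather than strings. The sketch that follows is hypothetical (`CsvSketch.parseRows` is an invented name, not the CsvParser API), is hard-wired to String, Int, Double columns, and splits naively on commas with no support for quoting or escaping.

```scala
object CsvSketch {
  // Hypothetical, naive parser (not io.CsvParser): splits each line on commas
  // and converts the fields to the (String, Int, Double) column types.
  // No quoting, escaping or header handling.
  def parseRows(lines: Seq[String]): Seq[(String, Int, Double)] =
    lines.map { line =>
      line.split(",", -1).map(_.trim) match {
        case Array(name, count, price) => (name, count.toInt, price.toDouble)
        case other =>
          throw new IllegalArgumentException(
            s"expected 3 fields, got ${other.length}: $line")
      }
    }
}
```

A malformed number surfaces immediately as a `NumberFormatException` from `toInt` or `toDouble`, rather than later as a type mismatch deep in the processing code.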
Positioning
product-collections aims to be simple and productive: you should be producing answers from your data in 20 minutes or less. There is no new API to learn: everything works like a Scala collection and a Tuple at the same time. There's no matrix arithmetic: do everything in idiomatic Scala.
Columns don't lose their type if you include a column of another type.

Learn by example: take a look at (or clone) the simple example project on GitHub that does some simple processing of stock prices.
Alternatives
Saddle
A heavy-duty solution, with a custom API based around vectors, matrices, and scalars. Trying to mix types in a Saddle matrix results in a Matrix[Any], which means little type safety. Saddle seems to have garnered some support from Typesafe and may feature in Google Summer of Code (GSoC).
Saddle places a heavy emphasis on specialization and (presumably) performance.
Breeze
Breeze also has matrix and vector implementations similar to Saddle's, along with other functionality that looks pretty useful.
Framian
Framian is under heavy development and looks interesting, although it does look more complicated than product-collections (p-c). In Framian, you specify the return type at retrieval time.