xVelocity for Business Intelligence

xVelocity is in-memory columnar storage technology that powers the analytical engine of PowerPivot. Introduced as VertiPaq in the SQL Server 2008 R2 release, it was re-released for SQL Server 2012 in PowerPivot and SQL Server Analysis Services.
The xVelocity columnstore index feature is used to enable high performance query processing in SQL Server data warehouses and data marts. Some reports claim that the ColumnStore Index feature can produce performance boosts of 10-100x for some queries.
With a columnstore index, data is kept in column-wise fashion, with each column stored separately, rather than storing all columns of a row on the same page. In addition, xVelocity includes a new, vector-based query execution technology called “batch processing” that tremendously speeds up query processing in conjunction with the columnstore index.
Columnstore indexes are primarily designed for use in dimensional data warehouses and data marts, and are optimized to improve query performance from a star or similar join. Column store indexes can be used with partitioned tables, providing a new way to think about how to design and process large datasets.
Why use a column store?
SQL Server’s traditional indexes are based on the B-tree. B-trees are great for finding data that match a key. They’re also reasonably fast when you need to scan all the data in a table. So why use a column store? There are three main reasons: fetching only needed columns, compression, and batch mode processing.
Fetching Only Needed Columns
One key characteristic of columnar storage is the ability to read the values of a particular column of a table without having to read the values of all the other columns. With row-oriented storage, this is impossible because the individual column values are grouped in rows on the pages. Reading a page in order to read a column value involves fetching the entire page in memory, thus automatically reading all the other columns in the row.
Because the values from individual columns are separated into their own pages, the engine can read only the columns needed. This reduces the IO for queries that:
- Read a precise subset of columns of the table (no SELECT *)
- Read many rows (scans and range scans)
Thus, Columnstore indexes can provide many performance benefits for queries that perform aggregate functions across a wide variety of records, but for only a few columns.
Compression
Because column-oriented storage groups values from the same column together, there is a new beneficial side effect: data becomes more compressible. With column-oriented storage the data is more homogenous, as it contains only values from a single column. Since compression ratio is subject to how homogenous the data is, it follows that columnar storage format is more compressible than the same data represented in row-oriented storage format.
Since the columnstore format is targeted explicitly at BI workloads and large datasets, it is justifiable to invest significantly more development effort into enhancing compression benefits; deploy several alternative compression algorithms for the engine to choose from and leverage operations directly in the compressed format w/o decompressing the data first. Row-oriented storage has to strike a balance between compression benefits and its runtime cost in consideration of the typical OLTP workload.
Columnstores embrace compression fully and go to great lengths to achiev high compression ratio because in the expected BI workload the savings in IO from better compression more than offsets the runtime CPU loss.
Batch Mode Processing
A batch typically represents about 1000 rows of data. Batch mode processing also uses algorithms that are optimized for the multicore CPUs and increased memory throughput. Batch mode processing spreads metadata access costs and overhead over all the rows in a batch. Batch mode processing operates on compressed data when possible leading superior performance. Batch Mode processing is only available in a Columnstore index.
Demo:

In this demo, we are going to load the SalesOrderDetail Table from Adventureworks to demo how the ColumnStore Index works. First, we create a test table:
–Create the table
CREATE TABLE [dbo].[SalesOrderDetail_ColumnStore](
[SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL,
[LineTotal] [numeric](38, 6) NOT NULL,
[rowguid] [uniqueidentifier] NOT NULL,
[ModifiedDate] [datetime] NOT NULL
) ON [PRIMARY]
GO
– Create a clustered index
CREATE CLUSTERED INDEX [SalesOrderDetail_Clustered] ON [dbo].[SalesOrderDetail_Columnstore]
( [SalesOrderDetailID])
GO
The following statement should load about 21 million records.
–Insert a LOT of records into the table to simulate a fact table. This may take a while.
INSERT INTO [dbo].[SalesOrderDetail_Columnstore]
SELECT SalesOrderDetail.*
FROM AdventureWorks2012.Sales.SalesOrderDetail
GO 100
– Create ColumnStore Index
CREATE NONCLUSTERED COLUMNSTORE INDEX [ndx_SalesOrderDetail_ColumnStore]
ON [SalesOrderDetail_Columnstore]
(UnitPrice, OrderQty, ProductID)
GO
Now that a table has been loaded with enough data, some test queries can be executed to demo the speed comparison. First, we run an aggregation type of query, but we specify an index hint to use the Clustered (non-columnstore) index.
– Select Table with Clustered Index (Not Columnstore)
SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice,
SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty
FROM [dbo].[SalesOrderDetail_ColumnStore] WITH(INDEX([SalesOrderDetail_Clustered]))
GROUP BY ProductID
ORDER BY ProductID
GO

Here we look at Logical reads. When a block is requested by a query, SQL Server looks for the requested data in the memory buffer cache and – if found – it results in a Logical read. If the requested data was not found in the Buffer, then it must read the data from disk, resulting in a physical read. Physical I/O is the most expensive component and finding the block in the buffer will reduce most of the work.
The query above resulted in a fair amount of logical reads. Now let’s see how the columnstore index handled this query.
– Select Table with Columnstore Index
SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice,
SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty
FROM [dbo].[SalesOrderDetail_ColumnStore]
GROUP BY ProductID
ORDER BY ProductID

The query caused dramatically less read activity when using the ColumnStore index. Just for giggles, let’s change the indexing and replace the columnstore index with a standard nonclustered index on the same columns and review the performance difference.
Drop the columnstore index:
DROP INDEX ndx_SalesOrderDetail_ColumnStore
— Create nonClustered Index
CREATE NONCLUSTERED INDEX [ndx_SalesOrderDetail_ColumnStore]
ON [SalesOrderDetail_Columnstore2]
(UnitPrice, OrderQty, ProductID)
GO
Now re-execute the query.

The columnstore index made a huge difference in the amount of read activity. The reason for this drop in read activity is because with the column store index, far fewer pages must be read in order to satisfy the query.
A further problem lies here:
INSERT INTO [dbo].[SalesOrderDetail_Columnstore]
SELECT SalesOrderDetail.*
FROM AdventureWorks2012.Sales.SalesOrderDetail
GO 10
Beginning execution loop
Msg 35330, Level 15, State 1, Line 1
INSERT statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the columnstore index before issuing the INSERT statement, then rebuilding the columnstore index after INSERT is complete.
** An error was encountered during execution of batch. Continuing
.
Msg 35330, Level 15, State 1, Line 1
INSERT statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the columnstore index before issuing the INSERT statement, then rebuilding the columnstore index after INSERT is complete.
** An error was encountered during execution of batch. Continuing.
Msg 35330, Level 15, State 1, Line 1
The biggest columnstore drawback with the SQL Server 2012 release is its inability to update data. Once a columnstore index is added to a table, the table becomes read-only. The best way to circumvent this problem is to perform fast partition switch-in operations, which allow for partitioned columnstore indexes to be updated.
- To update a table with a columnstore index, drop the columnstore index, perform any required INSERT, DELETE, UPDATE, or MERGE operations, and then rebuild the columnstore index.
- Partition the table and switch partitions. For a bulk insert, insert data into a staging table, build a columnstore index on the staging table, and then switch the staging table into an empty partition. For other updates, switch a partition out of the main table into a staging table, disable or drop the columnstore index on the staging table, perform the update operations, rebuild or re-create the columnstore index on the staging table, and then switch the staging table back into the main table.
To learn more about business intelligence technology, please check out the resources available here.

