
Wednesday, September 15, 2010

Compress large files on Windows Server 2008

This started out as something else. We had a large database of about 100 GB that we wanted to shift to our DR location. Due to bandwidth restrictions, we set out to explore options for reducing the overhead of transferring such a large file.

Now compressing a 100 GB file is no joke. After some looking around I came across an interesting article by Chris over at solo-technology about compression vs. speed. It dealt with a related issue, but the same capability can be used for handling large files.

The best tool for doing this is 7-Zip, an excellent freeware utility by Igor Pavlov that handles a large number of compression formats. Check it out here:

http://www.7-zip.org/


That aside, by using the command-line version you can zip your file. When we first started, the zip took 5 hours to reach just 30% compression. On checking Task Manager, I found that the whole process was running at 100% on a single thread of the 24 cores available on the server.

By using the options below, I was able to compress the 100 GB SQL Server 2008 database backup file to just 16 GB, in just one hour!


This was way beyond my expectations :) Here is the syntax:


7z a -tbzip2 -mx=9 -mmt=on backup.zip backup.bak


"7z" is the program to initiate
"a" is to append the file
"-tbzip2" is to use the bzip2 format, this is important as only this format allows for multithreading for both the compression as well as the decompression
"-mx=9" is the number of passes, this is optional, I didn't use it, the system defaults this to 5
"-mmt=on" is to turn on the multithread capability, again this works only for the bzip2 format
"backup.zip" is the file that will get created
"backup.bak" is the file that needs to get backed up.

You can read through the technical documentation that ships with 7-Zip to get a better hang of the application; these were just the specific options required for this requirement.

One other note: I did read that the bzip2 format has a restriction that it can compress only one file at a time. I didn't try it with multiple files.

Hope this helps in saving you precious bandwidth !!!

Friday, April 2, 2010

Microsoft Association Rules / Data mining in SQL Server 2008

Since the release of SQL Server 2005, Microsoft has dabbled quite a bit in data mining models. While I have read that data mining capability was available in one of the later releases of SQL Server 2000, I haven't been able to come across much material on it.


First up, I would like to say that the best place to start, if you are serious about learning data mining, is to get yourself this really good book on the topic. It's extremely easy to read and the examples used are quite practical.

The book is

Data Mining with SQL Server® 2008

by
Jamie MacLennan
ZhaoHui Tang
Bogdan Crivat



Back to the original reason for the post: being a techno-functional individual, I was keen to see how we could effectively cross-sell products to stores, since we had their complete sales history. Ensuring that consumers never leave the store without the products they want goes a long way toward consumer loyalty and company performance.

With most of the data already available in SQL Server 2008, I was keen to see what we could use to help identify these gaps. SQL Server 2008 ships with the following data mining algorithms:

Microsoft Naïve Bayes
Microsoft Decision Trees
Microsoft Time Series
Microsoft Clustering
Microsoft Sequence Clustering
Microsoft Association Rules
Microsoft Neural Network and Microsoft Logistic Regression

The best fit for my requirement was Microsoft Association Rules. It allowed me to build the model utilizing the classification of outlets that we already had.

I built the model using three classifications we had for our outlets, along with the products each outlet already sold. To do this I needed to create a nested table for the products, as one outlet can sell multiple products. It is also important to note that the data types for data mining objects are slightly different from the data types used by standard SQL queries.
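
A minimal sketch of the structure-creation DMX, with illustrative structure and column names (the real classifications and keys will differ):

CREATE MINING STRUCTURE [OutletProducts] (
    [OutletKey] TEXT KEY,
    [OutletClass1] TEXT DISCRETE,
    [OutletClass2] TEXT DISCRETE,
    [OutletClass3] TEXT DISCRETE,
    [Products] TABLE (
        [ProductName] TEXT KEY
    )
)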




After creating the structure, I added a mining model that applies the Microsoft Association Rules algorithm, requesting prediction of the products and using the standard support and probability parameters the model is generally used with.
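
A sketch of the model definition, again with illustrative names; the threshold values shown are the algorithm's documented defaults, spelled out here only for clarity:

ALTER MINING STRUCTURE [OutletProducts]
ADD MINING MODEL [OutletProductRules] (
    [OutletKey],
    [OutletClass1],
    [OutletClass2],
    [OutletClass3],
    [Products] PREDICT (
        [ProductName]
    )
) USING Microsoft_Association_Rules (MINIMUM_SUPPORT = 0.03, MINIMUM_PROBABILITY = 0.4)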



The next step was to feed data into the model to let it build the association rules, which is done with the DMX INSERT INTO statement. An important point to note here is that the key data needs to be sorted. Sometimes, in spite of sorting, it still gives errors saying the data most probably is not sorted. The solution is to ensure that the key used has a specifically nvarchar source data type; I faced a problem when my key happened to have numeric as well as string data, and needed to convert everything before I could successfully load the data into the mining structure.
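
A sketch of the training statement, assuming a hypothetical data source named [MyDataSource] and relational tables Outlets and OutletProducts:

INSERT INTO MINING STRUCTURE [OutletProducts] (
    [OutletKey], [OutletClass1], [OutletClass2], [OutletClass3],
    [Products] (SKIP, [ProductName])
)
SHAPE {
    OPENQUERY([MyDataSource],
        'SELECT CAST(OutletKey AS NVARCHAR(50)) AS OutletKey,
                OutletClass1, OutletClass2, OutletClass3
         FROM Outlets ORDER BY OutletKey')
}
APPEND (
    { OPENQUERY([MyDataSource],
        'SELECT CAST(OutletKey AS NVARCHAR(50)) AS OutletKey, ProductName
         FROM OutletProducts ORDER BY OutletKey') }
    RELATE [OutletKey] TO [OutletKey]
) AS [Products]

Note the explicit CAST to NVARCHAR on the key, which is what resolved the "data is not sorted" errors mentioned above.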




Once the data is loaded into the model, the last step is to retrieve the recommendations that the model generates. I have done this for the store ABC by passing the store, along with the products it purchases, into the model and asking for 5 recommendations based on the classification of the store and the products it sells. This is done through a DMX prediction query.
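
A sketch of such a singleton query (the classification values and product names are placeholders):

SELECT Predict([Products], 5)
FROM [OutletProductRules]
NATURAL PREDICTION JOIN
(
    SELECT 'Urban' AS [OutletClass1],
           'Large' AS [OutletClass2],
           'Retail' AS [OutletClass3],
           (SELECT 'Product A' AS [ProductName]
            UNION SELECT 'Product B' AS [ProductName]) AS [Products]
) AS t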




In case you want this data in a standard query window, you can always create a linked server to the Analysis Services instance where the data mining model exists.
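
A sketch of that, with placeholder server and database names, using the MSOLAP provider:

EXEC sp_addlinkedserver
    @server = 'OLAPSRV',
    @srvproduct = '',
    @provider = 'MSOLAP',
    @datasrc = 'MyAnalysisServer',
    @catalog = 'MyMiningDB';

SELECT * FROM OPENQUERY(OLAPSRV, 'SELECT Predict([Products], 5) FROM [OutletProductRules]');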

Do let me know if you have any comments !

Friday, October 2, 2009

SQL Server 2008 Memory Leak / Windows Server 2008 high memory utilization

Having worked with cubes on SQL Server 2000, I had grown used to the fact that SQL Server will use and release memory depending on its need, and that I would be able to track the amount of memory each program is utilizing through Task Manager.

However, it's important to note the following: with Windows Server 2008 and SQL Server 2008 (and probably SQL Server 2005), the above assumptions are completely wrong. SQL Server 2008 will utilize as much memory as is available on the box, in my case 64 GB, if any job works with high volumes of data, but will NEVER release this RAM back to the system once the job completes.

In short, the memory stays reserved for SQL Server 2008 till the service / server gets restarted. In case SQL Server needs to run some other job that requires data to be loaded, it will internally flush out the older data and load what it needs, and yes, even then nothing gets released back to the server.

Also to be noted is that in Task Manager, none of this memory is ever shown against any of the running processes. Even if you use Process Explorer (Sysinternals), this memory does not show up against SQL Server 2008 there either.



Sadly, this leads people (including me) to conclude that there is a memory leak in SQL Server 2008. To convince yourself otherwise, just limit the amount of memory SQL Server can use and then check the memory utilization in Task Manager: it will be that amount plus whatever is required to run your server.

There is an article that describes this in much more technical detail here:

http://sqlblog.com/blogs/jonathan_kehayias/archive/2009/08/24/troubleshooting-the-sql-server-memory-leak-or-understanding-sql-server-memory-usage.aspx


It's a good idea to prevent SQL Server from utilizing the entire memory, else the server can become unstable.
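
For reference, the cap is set through sp_configure; the 48 GB figure below is just an illustrative value for a 64 GB box:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- value is in MB; leave headroom for the OS and other processes
EXEC sp_configure 'max server memory (MB)', 49152;
RECONFIGURE;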

P.S. There is no memory leak :)

Saturday, August 22, 2009

Excel 2007 OLAP Writeback SSAS 2008

We wanted to build an application based on the writeback capability of Microsoft Analysis Services. The dream was to have a cube with multiple dimensions, allowing our users to analyze data by slicing / dicing this cube and, based on their analysis, make changes to the numbers at various levels.

Once they were happy with the changed data, we were to flow it to a downstream system. It was to be a highly interactive system for a reasonably competent user to plan target setting by carrying out what-if analysis on the existing data.

The front end for the users was Excel 2007, as its navigation / user interface is phenomenal and all the users in our company were fairly competent in Excel.

The path we started down was to have SQL Server Analysis Services 2005 as the backend, sitting on a four-processor hexa-core Windows Server 2008 box. We then enabled the cube for writeback. The way this works, at a high level, is that whatever numbers you choose to write back are not applied directly to your cube; instead, a writeback partition is created that stores netting-off deltas, so that you see the required number on your front end.

Let me explain that a little more with an example: let's say that for one depot and one product the value was initially 100. As part of the writeback, let's say you want it to be 70. What actually happens is that an entry of -30 is made in the writeback partition for that depot and that product. Based on a join of these two, the number displayed is 70.

While this works perfectly for small sets of data, it fails miserably for larger sets. The solution becomes impractical and very sluggish if the writeback is at an aggregate level and affects a large number of leaf cells / tuples. After engaging with Microsoft Consulting, we confirmed that this was not going to work for the application we had in mind.

OLAP writeback sucks, period !!! and even Microsoft didn't have a response :) But just because writeback via the writeback partition is hopeless does not mean it's the only way to write user inputs back into the cube.

The solution we finally deployed was a combination of the following:
VBA (macro code) / web services / SQL Server 2008 Integration Services / SQL Server 2008 Analysis Services.

The Excel file shared with the users had custom-built macro code that essentially looped through the page / row / column fields to gather the user's selections. Care had to be taken to loop through both the dimension members and the attributes (attribute hierarchies are a feature of SQL Server 2005 onwards; do read up if you are unaware of them). Also note that it gets more complicated with Excel 2007 pivot tables (version 12), as there is extended capability in terms of Value and Label filters, both of which also need to be captured in case someone has applied them. A rough sketch of the looping idea is below.
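
This is only an illustrative VBA fragment, not our production macro; object names, filter handling and the exact treatment of OLAP cube fields will differ:

' Walk the first pivot table's fields and print the visible selections
Sub CaptureSelections()
    Dim pt As PivotTable, pf As PivotField, pi As PivotItem
    Set pt = ActiveSheet.PivotTables(1)
    For Each pf In pt.PivotFields
        If pf.Orientation <> xlHidden Then   ' page, row or column field
            For Each pi In pf.PivotItems
                If pi.Visible Then Debug.Print pf.Name & ": " & pi.Name
            Next pi
        End If
    Next pf
End Sub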

Once all the filters are looped through and the details captured, the macro uses this data to frame a CREATE SUBCUBE query that gets passed to the server using a web service. This sub cube, created on the fly, contains all the data from the main cube that needs to undergo the writeback / change.
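
For instance, a CREATE SUBCUBE statement along these lines (cube, dimension and member names are placeholders):

CREATE SUBCUBE [Sales] AS
SELECT { [Depot].[Depot].&[D001] } ON 0,
       { [Product].[Product].&[P100], [Product].[Product].&[P200] } ON 1
FROM [Sales]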

The next step is to strip this newly created sub cube down to the underlying rowset of data that makes up the cube. Once we have that in a table, we have the option of applying pretty much any rule to this data: disaggregation / static values / percentage of existing value / percentage of another value.

After applying the rule, the data gets updated into the base cube data, an incremental process is carried out, and voila, you have implemented your writeback.

Just to share some numbers: we were able to carry out a writeback affecting 3 lakh records (that's three hundred thousand records :) in India) in a little over 2 minutes, with the majority of the time spent in the incremental processing. This of course will depend on the size and complexity of your own cube.

If you're curious about the rest of the design: since we had the changes in a table, we could build an SSIS package to incrementally flow this information downstream into a DB2 environment using OLE DB providers for DB2, which of course have their own problems in life :)

Tuesday, May 12, 2009

IBM OLE DB provider for DB2 on SQL Server 2008 / DB2 Administration Client on Windows Server 2008

While setting up our production environment, we ended up wasting a day because we couldn't create a linked server using the IBM OLE DB provider for DB2: it wasn't showing up in the provider options. It turned out we were trying to install the 32-bit client on the server while we needed the x64 version. After figuring that out, we began a hunt to download the required file, which in reality was quite painful, as the location wasn't clearly mentioned anywhere.
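
For context, once the right client is installed, creating the linked server is the easy part; a sketch with placeholder names, using the IBMDADB2 provider that ships with the client:

EXEC sp_addlinkedserver
    @server = 'DB2LINK',
    @srvproduct = 'DB2',
    @provider = 'IBMDADB2',
    @datasrc = 'MYDB2DB';

EXEC sp_addlinkedsrvlogin
    @rmtsrvname = 'DB2LINK',
    @useself = 'FALSE',
    @rmtuser = 'db2user',
    @rmtpassword = '<password>';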


In order to save others the trouble: the driver comes along with the installation of the "DB2 Administration Client" in the 8.2 versions and the "IBM Data Server Client" in the 9.5 versions. The links in this post are valid for Windows Server 2008 on x64 architecture; if you're on 32-bit, you can navigate to the right download from the given pages.

The page for the current version, DB2 v9.5, is as follows; make sure you download the "IBM Data Server Client" file. It is part of fix pack 3b, which is the latest at the time of this article.

http://www-01.ibm.com/support/docview.wss?rs=71&uid=swg21288113

The direct file

ftp://ftp.software.ibm.com/ps/products/db2/fixes2/english-us/db2winX64v95/fixpack/FP3b_WR21447/v9.5fp3b_ntx64_client.exe


In case you are using / want the DB2 v8.2 client, make sure you download the "DB2 Administration Client" file. It is part of fix pack 17a, which is the latest at the time of this article.

http://www-01.ibm.com/support/docview.wss?rs=71&uid=swg21256105


The direct file

ftp://ftp.software.ibm.com/ps/products/db2/fixes2/english-us/db2winAMD64v8/fixpak/FP17a_WR21442/FP17a_WR21442_ADMCL.exe


Please note that there are different versions for different architectures: x86 (32-bit) / x64 / IA64. Download the right file for your setup.

Additionally, if you are looking for the Microsoft OLE DB Provider for DB2, it is available as part of the Microsoft SQL Server 2008 Feature Pack, October 2008, located here:

http://www.microsoft.com/downloads/details.aspx?FamilyId=228DE03F-3B5A-428A-923F-58A033D316E1&displaylang=en


Do let me know if this has been helpful. Cheers !

Sunday, September 21, 2008

.NET 3.5 SP1 Installation Error

The other day, for one of our projects, we needed to upgrade to SQL Server 2008. Part of the prerequisites was to have .NET Framework 3.5 installed on the machine. The DVD had the installer for the framework, but it just wouldn't install.

The setup would unarchive all the required files and fire the setup program. The progress bar would increase all the way up to 95-98% and just hang. I am talking about the full package, not the bootstrapper. I tried after a couple of reboots, but got exactly the same problem on a couple of my machines, all running XP Pro SP2.

Some reading showed that a few other people have had the same install problem. The solution is to disable all your network adapters: go to the Network Connections menu, right-click each connection and select Disable. Once this is done, run the installer for the full package; it should go through rather quickly, with the Microsoft EULA popping up as soon as the setup progress finishes.
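
If you prefer the command line, something like the following should do it (the interface name is machine-specific, and the exact netsh syntax can vary by Windows version):

netsh interface set interface name="Local Area Connection" admin=DISABLED

Use admin=ENABLED to re-enable the adapter afterwards.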

I don't know why this happens; I'm assuming the package still tries to verify that all parts are available for install and wants to connect to the internet, but is unable to do so through our proxy servers.

Either way, this workaround works well. You will need to install the framework for SQL Server 2008, Visual Studio 2008, and going forward probably a lot more applications developed on the Microsoft platform.

The microsoft page is here
http://www.microsoft.com/downloads/details.aspx?familyid=AB99342F-5D1A-413D-8319-81DA479AB0D7&displaylang=en

The full package install is here
http://download.microsoft.com/download/2/0/e/20e90413-712f-438c-988e-fdaa79a8ac3d/dotnetfx35.exe

Hope this helps; do drop me a comment, I would love to know.