Aug 17, 2009

Adding Identity column as Primary key to Existing Table

--Create a table like this:
CREATE TABLE DataTable(id int, name varchar(20) CONSTRAINT pk_id PRIMARY KEY (id))
--now insert a few rows
Insert into DataTable Values(4, 'test 1')
Insert into DataTable Values(5, 'test 2')
Insert into DataTable Values(6, 'test 3')

--Drop the existing primary key
ALTER TABLE DataTable DROP CONSTRAINT pk_id

--Add pid as an identity column
ALTER TABLE DataTable ADD pid int IDENTITY(1,1)

--Add the new column as the primary key
ALTER TABLE DataTable ADD CONSTRAINT pk_DataTable PRIMARY KEY (pid)
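
To sanity-check the result, note that SQL Server numbers the existing rows in no guaranteed order when the identity column is added:

--Verify that every existing row received a pid value
SELECT pid, id, name FROM DataTable ORDER BY pid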

OTN Discussion Forums : DateADD in oracle - How to add the date ...

"select sysdate, sysdate - interval '2' DAY from dual;"
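
The same interval arithmetic works in both directions and with other units; a quick sketch (Oracle syntax, using day/hour/minute units, which are always safe):

select sysdate + interval '2' day as plus_two_days,
       sysdate - interval '36' hour as minus_36_hours,
       sysdate + interval '30' minute as plus_30_minutes
from dual;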

Aug 10, 2009

Taking a look at CROSS APPLY

Applying a Sub-Query, Joining a Derived Table ...

I think the easiest way to think of CROSS APPLY is that it is like doing a CROSS JOIN with a correlated sub-query instead of a derived table. Let's see if I can explain that... A derived table is 'self-contained', in that none of the tables and columns in the parent SELECT are accessible (though variables and parameters can be referenced). For example, consider:

select A.*, b.X
from A
cross join (select B.X from B where B.Val=A.Val) b

That is not legal because A.Val is out of scope within the derived table; this is because the derived table is evaluated independently of the other tables in the SELECT. To limit the rows in table B so that B.Val = A.Val, we must do that outside of the derived table via a join or in the criteria:

select A.*, b.X
from A
cross join (select * from B) b
where A.Val = b.Val

(Of course, the above is equivalent to doing an INNER JOIN to the derived table, or just joining to the table B.)

Also, keep in mind that the scope-of-derived-tables rule isn't just for CROSS JOINS, it's for all JOINS -- CROSS, INNER, OUTER and even UNION; they all use 'self-contained' derived tables.

This is in contrast to a correlated sub-query, where the parent SELECT is in scope for the sub-query; the sub-query is evaluated for each row in the query, so the other tables and columns in the SELECT are all available:

select A.*, (select B.X from B where B.Val=A.Val) as X
from A

(Note: I am ignoring for now the fact that returning multiple rows in a sub-query will return an error.)

This is an easy way to think of the difference between CROSS JOIN and CROSS APPLY. CROSS JOIN, as we saw, joins to a derived table; CROSS APPLY, despite looking like a JOIN, actually applies a correlated sub-query. This brings both the advantages of a correlated sub-query and its performance implications.

So, we can simply rewrite our first example using CROSS APPLY like this:

select A.*, b.X
from A
cross apply (select B.X from B where B.Val=A.Val) b

Since we are performing an APPLY and not a JOIN, A.Val is in scope and it works just fine.

Table Valued User Defined Functions

Note that the same rules apply when using Table-Valued User-Defined Functions:

select A.*, B.X
from A
cross join dbo.UDF(A.Val) B

is not legal; once again, A.Val is not in scope for the user-defined function. The best we could do before SQL Server 2005 was to use a correlated sub-query:

select A.*, (select X from dbo.UDF(A.Val)) X
from A

However, that is not logically equivalent; the UDF cannot return more than 1 row or it will result in an error, and wouldn't make logical sense anyway if it did.

Starting with SQL 2005, we can now use CROSS APPLY and it will work fine:

select A.*, b.X
from A
cross apply dbo.UDF(A.Val) b

So, that is one way to think of the difference between a JOIN and an APPLY; a JOIN combines two separate result sets, but APPLY is more of a loop that evaluates one result set over and over for each row in another. This means that, in general, APPLY will be less efficient than a JOIN, just as, in general, correlated sub-queries are less efficient than derived tables. (The optimizer, however, is generally quite good at optimizing correlated sub-queries when possible.)

So, why use CROSS APPLY instead of a correlated sub-query? What's the advantage? Actually, quite a lot -- it is much more powerful!

CROSS APPLY can return multiple rows

Unlike correlated sub-queries, CROSS APPLY works with multiple rows. This allows us to do things like 'joining' a table to a function that parses a CSV column in that table into multiple rows:

select A.ID, b.Val
from A
cross apply dbo.ParseCSV(A.CSV) b

When the ParseCSV() function returns multiple rows, it simply acts as if we have joined Table A to the function's return table, duplicating the rows in Table A for each row in the joined table. This is not possible with a correlated sub-query, and will result in an error. This is a quick and easy way to parse a table of data into multiple rows in an efficient 'set-based' manner when the algorithm requires a complex User-Defined Function. (The CSV is not a great example, but other parsing routines are not as easily accomplished via a JOIN to a numbers table.)
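
The article treats dbo.ParseCSV() as a given; here is a minimal sketch of such a function, assuming a comma-separated varchar input (a recursive CTE stops at roughly 100 elements under the default MAXRECURSION, so a numbers-table splitter would scale better):

CREATE FUNCTION dbo.ParseCSV (@csv varchar(8000))
RETURNS TABLE
AS
RETURN
WITH Pieces (Val, Rest) AS
(
    --anchor: the first element and the remainder of the string
    SELECT CASE WHEN CHARINDEX(',', @csv) > 0
                THEN LEFT(@csv, CHARINDEX(',', @csv) - 1) ELSE @csv END,
           CASE WHEN CHARINDEX(',', @csv) > 0
                THEN SUBSTRING(@csv, CHARINDEX(',', @csv) + 1, 8000) END
    UNION ALL
    --recursive step: peel off the next element
    SELECT CASE WHEN CHARINDEX(',', Rest) > 0
                THEN LEFT(Rest, CHARINDEX(',', Rest) - 1) ELSE Rest END,
           CASE WHEN CHARINDEX(',', Rest) > 0
                THEN SUBSTRING(Rest, CHARINDEX(',', Rest) + 1, 8000) END
    FROM Pieces
    WHERE Rest IS NOT NULL
)
SELECT Val FROM Pieces
GO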

CROSS APPLY can return multiple columns

Again, in a correlated sub-query, we can only return a single value. If we write a SQL statement that returns a running sum, we can use a correlated sub-query like this:

select o.*,
(select sum(Amount) from Order p
where p.OrderDate <= o.OrderDate) as RunningSum
from Order o

However, what if we'd like to return an additional running sum of Orders based on some other criteria (e.g., for orders with the same 'OrderCode')? We'd need another correlated sub-query, greatly reducing the efficiency of our SELECT:

select o.*,
(select sum(Amount) from Order p
where p.OrderDate <= o.OrderDate) as RunningSum,
(select sum(Amount) from Order p
where p.OrderCode = o.OrderCode and p.OrderDate <= o.OrderDate) as SameCode
from Order o

However, we can easily re-write that using a single CROSS APPLY:

select o.*, rs.RunningSum, rs.SameCode
from Order o
cross apply
(
select
sum(Amount) as RunningSum,
sum(case when p.OrderCode = o.OrderCode then Amount else 0 end) as SameCode
from Order P
where P.OrderDate <= O.OrderDate
) rs

So, we get the benefit of returning multiple columns like a derived table, and we also get the ability to reference outer values in our SELECT to use in criteria and CASE expressions. Very, very powerful.

CROSS APPLY also allows us to quickly get columns from the 'previous' row in a table quite easily:

select o.*, prev.*
from Order o
cross apply
(
select top 1 *
from Order P where P.OrderDate < O.OrderDate
order by OrderDate DESC
) prev

We can express the running total and counts as CROSS JOINS or INNER JOINS by reworking a correlated sub-query, but using TOP and referencing the outer SELECT is not easily done in that manner.

Note that the above CROSS APPLY will not return any orders without a previous order; we would use OUTER APPLY to ensure that all orders are returned even if no previous orders exist:

select o.*, prev.*
from Order o
outer apply
(
select top 1 *
from Order P where P.OrderDate < O.OrderDate
order by OrderDate DESC
) prev

We can also CROSS APPLY to a Table-Valued User Defined Function that returns exactly one row, but with multiple columns, to return separate pieces of data from a single function call. For example, we can parse an email address into separate username and domain columns. A few months back I wrote a SQL Team article that discusses that concept.
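
As a sketch, such a function might look like this (dbo.ParseEmail, and the Customers table in the usage example, are hypothetical, not from the article):

--Hypothetical one-row, multi-column parser for use with CROSS APPLY;
--returns NULLs when the input contains no usable '@' separator
CREATE FUNCTION dbo.ParseEmail (@email varchar(320))
RETURNS TABLE
AS
RETURN
SELECT CASE WHEN CHARINDEX('@', @email) > 1
            THEN LEFT(@email, CHARINDEX('@', @email) - 1) END AS Username,
       CASE WHEN CHARINDEX('@', @email) > 1
            THEN SUBSTRING(@email, CHARINDEX('@', @email) + 1, 320) END AS Domain
GO

--Usage (Customers and its Email column are illustrative):
SELECT c.Email, e.Username, e.Domain
FROM Customers c
CROSS APPLY dbo.ParseEmail(c.Email) e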

Summary

CROSS and OUTER APPLY are very powerful and can be very useful, but we must be careful to use them only when necessary; I am still testing the possibilities, but in general an APPLY will tend not to be as efficient as a JOIN. In addition, APPLY is mostly demonstrated by applying table-valued user-defined functions, but it can be used with in-line SELECT statements as well.

Original link:
Taking a look at CROSS APPLY

A Story of the Deleted Transaction Log - SQLServerCentral

I was lying in bed one Saturday night when my phone rang. It was one of our team leads on the other end. The conversation began something like this: 'Hey, this client ran out of disk space so we deleted the transaction log for the database as it was over 100GB.' Well, I wasn't quite asleep but I certainly wasn't awake, and my initial thought was that this guy was having a little fun with me. Before my mind could fully envision the animated balled-up pieces of paper dematerializing as they were thrown towards the recycle bin, I heard the funniest thing of the night come out of the lead's mouth: 'Do you think that is related to the application not functioning?'

After rubbing my eyes and placing the jack back under the bed that I often use to get my jaw back into place, I asked if the LDF file was in the Recycle Bin. I am sure you know that the answer to that question was 'No', mainly because the file is too big for most bins and secondly because nothing is that easy. Obviously the next question was 'When was the last backup?' Wouldn't you know there wasn't one, go figure!

To fill you in on a little detail, the company I work for develops software. Our support model includes support of the applications, and we often get involved with database issues, but we do not own or maintain databases. The client had called the support line and stated they were out of space and asked if we could help them. This particular client does not have a DBA on staff, so our technicians were trying to help them. Certainly the wrong route was taken but at this point it was too late to look back. I am the sole DBA on the support side of the company, and since this was now a DBA issue that was caused by our technicians, I had to come up with a solution. The solution I came up with was the best for this particular issue based upon all the factors, specifically no backups.

After getting over the ridiculousness of the issue I began researching how to resolve the problem. I ran into a few hints in BOL and then other hints on the Internet. After all the research, I determined the best option would be to use DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option. BOL stated this option could repair a transaction log. Well, what could be more broken about a transaction log than not having one at all?

Once I determined I would go down this path, the first problem was that the db would not go online, not even in emergency mode. So I stopped the SQL Server service and renamed the MDF file. Let's say the database name was DbLive; I renamed the MDF from DbLive.mdf to DbBad.mdf. I then started the SQL Server service and created a new DbLive db with a corresponding DbLive_log.ldf file. I then stopped the SQL service again, deleted the new DbLive.mdf file, renamed DbBad.mdf back to DbLive.mdf, and started the SQL service once more.

You can imagine that these steps didn't fix the problem. The db did not come online, but at least it now had a transaction log, albeit one from a different database. As far as SQL Server was concerned the log was there now; it was just in a jacked-up state (yes, that is a technical term).

The next thing I did was run ALTER DATABASE DbLive SET EMERGENCY. This put the db in emergency mode and allowed SQL Server to work with it at an emergency level. Next I ran 'sp_dboption 'DbLive', 'single user', 'true''. This set the db to single user mode, preparing it for the DBCC that I would run next. Finally I ran 'DBCC CHECKDB ('DbLive', REPAIR_ALLOW_DATA_LOSS)'. After about an hour the db was ONLINE!

I took the db out of single user mode by running 'sp_dboption 'DbLive', 'single user', 'false''. I then ran a few queries against the database and all was fine. I am not totally sure what data loss there was, if any. Since the MDF was intact and the SQL service was taken offline before the LDF file was deleted, all data should have been committed to the MDF, so I believe there was no data loss. This happened several months ago and there have not been any reported db or application anomalies, so I must assume all data was present.

Oh and yes, I slapped a maintenance plan on that system and sent the backups to an obscure folder. If they run out of space again it will probably be due to having too many backups! What I meant to say was that I explained to the client how important backups are.


Script:

/*
Stop the SQL service, rename DbLive.mdf to DbBad.mdf.
Start the SQL service, create a fake DbLive db (with log etc.).
Stop the SQL service.
Delete the new DbLive.mdf.
Rename DbBad.mdf to DbLive.mdf.
Start the SQL service.
Run the following script:
*/
ALTER DATABASE DbLive SET EMERGENCY
sp_dboption 'DbLive', 'single user', 'true'
DBCC CHECKDB ('DbLive', REPAIR_ALLOW_DATA_LOSS)
sp_dboption 'DbLive', 'single user', 'false'

Original link:
A Story of the Deleted Transaction Log - SQLServerCentral:
http://www.sqlservercentral.com/articles/Disaster+Recovery/63311/

Aug 4, 2009

Using CROSS APPLY in SQL Server 2005 - SQLTeam.com


My interest in writing this article was started by an MSDN article titled SQL Server 2005: The CLR Enters the Relational Stage. The article shows how to write a function that returns the top three countries per category. That's always been something that was difficult to do in SQL so I was curious about the approach. The article started out well but I was very unhappy by the end. It's just soooo much easier to do this in SQL Server 2005 using the new CROSS APPLY clause in Transact-SQL. So I'm going to write a query to return the top 3 orders for each customer and I'm going to do it in about 10 lines of SQL. (UPDATE: An alert reader found an even better approach!)

First please take a second to read the MSDN article. Pay special attention to how much C# code is required for the solution and how flexible it is.

Note: I also want to mention here that I started out trying to replicate their solution. I tried to install the MSI but that failed with an invalid characters message. I tried to recreate the CLR code, but since he mostly posted snippets and had three versions I finally gave up trying to get that to work. One thing I really wanted to do was compare performance on identical data; unfortunately I wasn't able to do that. My example will use the AdventureWorks database.

Let's start at the TOP

Along the way to making this solution work we'll discuss a few new features of SQL Server 2005. The first is that the TOP clause can now take a variable:

DECLARE @Rows INT
SET @Rows = 10

SELECT TOP ( @Rows ) *
FROM Sales.SalesOrderHeader

This will return the top 10 rows from SalesOrderHeader. You can also replace @Rows with anything that evaluates to a number. The following query looks odd but runs just fine:

SELECT TOP (
SELECT COUNT(*)
FROM Sales.Customer
) *
FROM Sales.SalesOrderHeader

There are 19,185 rows in the Customer table and this query returns the top 19,185 rows from SalesOrderHeader. You can also use the TOP clause for INSERT, UPDATE and DELETE statements. If you want to DELETE in batches of 500 you can now do that using the TOP clause.
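
For example, a batched purge might look like this sketch (the archive table and date filter are illustrative, not from the article):

WHILE 1 = 1
BEGIN
    --remove up to 500 qualifying rows per iteration
    DELETE TOP (500)
    FROM Sales.SalesOrderHeaderArchive
    WHERE OrderDate < '20020101'

    --stop when nothing was deleted
    IF @@ROWCOUNT = 0
        BREAK
END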

My Function

Next we need a function to return the TOP X rows from SalesOrderHeader based on the total sales amount. That function looks like this:

CREATE FUNCTION dbo.fn_GetTopOrders(@custid AS int, @n AS INT)
RETURNS TABLE
AS
RETURN
SELECT TOP(@n) *
FROM Sales.SalesOrderHeader
WHERE CustomerID = @custid
ORDER BY TotalDue DESC
GO

Notice that it accepts the number of orders to return as a parameter. Also notice that I'm using SELECT * all over the place. I really encourage you to explicitly list out the columns when you write this for real. If you call this with a CustomerID and a number of rows it will return that many rows ordered by the total amount of the order in descending order. Also notice that there is an ORDER BY clause in this function. We'll talk about that in a minute.
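
Called on its own, the function behaves like a parameterized view:

--Top 3 orders for customer 1, largest TotalDue first
SELECT SalesOrderID, TotalDue
FROM dbo.fn_GetTopOrders(1, 3)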

Apply the APPLY Clause

The real magic happens when you use SQL Server 2005's new APPLY clause. The APPLY clause lets you join a table to a table-valued-function. That lets you write a query like this:

SELECT C.CustomerID,
       O.SalesOrderID,
       O.TotalDue
FROM AdventureWorks.Sales.Customer AS C
CROSS APPLY AdventureWorks.dbo.fn_GetTopOrders(C.CustomerID, 3) AS O
ORDER BY CustomerID ASC, TotalDue DESC

which results in this...

CustomerID  SalesOrderID TotalDue
----------- ------------ ---------------------
1           45283        37643.1378
1           46042        34722.9906
1           44501        26128.8674
2           46976        10184.0774
2           47997        5469.5941
2           57044        4537.8484
3           53616        92196.9738
3           47439        78578.9054
3           48378        56574.3871
4           47658        132199.8023
. . .

The APPLY clause acts like a JOIN without the ON clause and comes in two flavors: CROSS and OUTER. The OUTER APPLY clause returns all the rows on the left side (Customers) whether they return any rows in the table-valued-function or not. The columns that the table-valued-function returns are null if no rows are returned. The CROSS APPLY only returns rows from the left side (Customers) if the table-valued-function returns rows.

Notice that I'm just passing in the CustomerID to the function. It returns the TOP 3 rows based on the amount of the order. Since I'm using CROSS APPLY a customer without orders won't appear in the list. I can also pass in a number other than 3 to easily return a different number of orders per customer. Even better I can pass in a different number of orders for each customer. So I could list the top 5 orders for one type of customer but the top 10 for another type of customer. How cool is that?!?
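
A sketch of that last idea; Sales.Customer in AdventureWorks has a CustomerType column ('S' for store customers), though the 10-vs-5 split here is just illustrative:

--Top 10 orders for store customers, top 5 for everyone else
SELECT c.CustomerID, o.SalesOrderID, o.TotalDue
FROM AdventureWorks.Sales.Customer AS c
CROSS APPLY AdventureWorks.dbo.fn_GetTopOrders(c.CustomerID,
    CASE c.CustomerType WHEN 'S' THEN 10 ELSE 5 END) AS o
ORDER BY c.CustomerID, o.TotalDue DESC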

And it gets even better. Remember the function has an ORDER BY in it. It will always return the top orders based on the amount. However you can change the ORDER BY clause in the query that calls the function to display those rows in whatever order you want. You could easily display the top 3 orders in ascending order instead.
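
For instance, the same top 3 rows per customer, displayed smallest-first:

SELECT c.CustomerID, o.SalesOrderID, o.TotalDue
FROM AdventureWorks.Sales.Customer AS c
CROSS APPLY AdventureWorks.dbo.fn_GetTopOrders(c.CustomerID, 3) AS o
ORDER BY c.CustomerID ASC, o.TotalDue ASC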

Performance and Conclusion

Unfortunately I wasn't able to compare the performance of my solution to MSDN's. The query plan from my solution doesn't look that bad. (And did I mention that it showed me a full query plan for the query with the UDF rather than just a UDF step? Sweet!) If anyone gets their solution running send me the code and I'll update the article.

I also think their solution returns the result as a comma separated value. Now that would be a great use for the CLR. You could easily wrap that around the results of this function for a fast, efficient procedure. And it happens to be the next article I'm working on. :) If you want to see a great article on using the CLR inside SQL Server I'd start with Using CLR Integration in SQL Server 2005. The code is from an earlier beta but it gives you a great idea of when to use CLR code.

In summary I think that 5-10 lines of SQL is much simpler than 50+ lines of C# (or VB) for this type of solution. The article provides a lot of good information on the CLR inside SQL Server. I just don't think this is the best place to use it. One of my biggest fears is that people will use the CLR incorrectly and cause performance problems. This is the first real case I've seen of this. Is this really how Microsoft wants us to use the CLR?

UPDATE

I was curious when I posted this how long it would take to find an even better solution. It didn't take long. Arnold Fribble posted it in the forums. The syntax needed a little tweaking but it appears you can do this in a single query. And how did I forget the PARTITION BY clause of the ROW_NUMBER function. Doh! Here's the cleaned up query:

SELECT CustomerID,
       SalesOrderID,
       TotalDue
FROM (
    SELECT O.CustomerID,
           O.SalesOrderID,
           O.TotalDue,
           ROW_NUMBER = ROW_NUMBER() OVER (
               PARTITION BY O.CustomerID
               ORDER BY O.TotalDue DESC)
    FROM Sales.SalesOrderHeader AS O
) AS d
WHERE d.ROW_NUMBER <= 3

You have to use a derived table (or a Common Table Expression) since the row number function can't be used in the WHERE clause of the query where it's defined. Now I'll have to write an article on the new ranking functions. :)
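
For reference, the CTE form of the same query:

WITH d AS
(
    SELECT O.CustomerID,
           O.SalesOrderID,
           O.TotalDue,
           ROW_NUMBER() OVER (PARTITION BY O.CustomerID
                              ORDER BY O.TotalDue DESC) AS RowNum
    FROM Sales.SalesOrderHeader AS O
)
SELECT CustomerID, SalesOrderID, TotalDue
FROM d
WHERE RowNum <= 3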


Original link:
Using CROSS APPLY in SQL Server 2005 - SQLTeam.com