Let's see what happens when you run the query against the larger data set in titles_big:
SELECT SUM(dbo.WordRepeatedNumTimes
(title, 'the'))
FROM titles_big
The query still runs, but it took 2 minutes and 23 seconds to run on my laptop. A simple SUM() of the price column returned in a few milliseconds. So, you can learn a few lessons from this example:
- UDFs let you extend T-SQL in many useful ways.
- You shouldn't perform an activity just because you can. T-SQL is a set-based language, and you must consider the performance implications of performing nonset-based activity such as this solution in the database.
- Usually you can find more than one way to solve a SQL problem. The solution I wrote works and solves the reader's problem. However, in this case, the reader is probably better off thinking outside the box and implementing full-text indexing.
You can also create a set-based solution for counting the number of times a word appears in a string. SQL Server Magazine contributing editor Itzik Ben-Gan suggested this clever and simple solution. The following query shows how many times the word the appears in the title column:
SELECT title, (LEN(title) - LEN(REPLACE(title, 'the', ')))/LEN('the')
FROM titles_big
And the next query sums the occurrences of the across the entire table:
SELECT SUM((LEN(title) - LEN(REPLACE(title, 'the', ')))/LEN('the'))
FROM titles_big
You simply replace all occurrences of the target word with an empty string, then compare the length of the string before and after the replacements. So if the original string is nine characters longer than the replacement string and the target word the is three characters long, you can see that you removed three occurrences of the target word. You can usually replace a row-by-row approach with set-based logic, as Ben-Gan's example shows.
End of Article
Prev. page
1
[2]
next page -->