November 20, 2007 08:20 PM

Strim Strips Out Non-Alphanumeric Characters

Rating: (0)
SQL Server Magazine
InstantDoc ID #97392

Executive Summary:

Taking advantage of the fact that SQL Server "provides additional ordering guarantees beyond ANSI," the Strim function strips out non-alphanumeric characters (e.g., hyphen, exclamation point) from strings. This T-SQL user-defined function runs on SQL Server 2005, SQL Server 2000, and SQL Server 7.0.


SQL Server offers a variety of string functions, but it doesn’t have a function that strips out non-alphanumeric characters, such as a hyphen (-), an exclamation point (!), a percent sign (%), or a semicolon (;). I created a function named Strim, which is short for strip trim, to fill this gap.

The Strim function takes advantage of the fact that SQL Server “provides additional ordering guarantees beyond ANSI.” Specifically, a variable assignment in a query such as

SELECT @a = @a + col
FROM Table ORDER BY col2

will be performed in the expected order. For more information about these guarantees, see the “Ordering Guarantees in SQL Server 2005” entry in the Tips, Tricks, and Advice from the SQL Server Query Optimization Team blog (blogs.msdn.com/queryoptteam/archive/2006/05/02/588731.aspx). Note that the blog has since morphed into the Tips, Tricks, and Advice from the SQL Server Query Processing Team blog (blogs.msdn.com/sqlqueryprocessing/default.aspx).

Listing 2 shows the Strim function. This function takes one input parameter (@textin), which is limited to 8,000 characters. Strim first converts the string into a derived table labeled A. The derived table is cross-joined with a table of numbers named Tally to parse the string. Listing 3 shows the code that creates the Tally table. The WHERE clause at callout A in Listing 2 filters the intermediate output from the cross join. The clause uses ASCII codes to accept only numbers (ASCII codes 48 through 57), uppercase letters (ASCII codes 65 through 90), and lowercase letters (ASCII codes 97 through 122). The function then converts the intermediate output to a derived table labeled D. For example, if

@textin=”abcdefg”

table D’s rows would be:

i letter
1 a
2 b
3 c
4 d
5 e
6 f
7 g

Finally, the string is reconstituted as abcdefg, assigned to @textout, and returned.

For comparison, let’s look at what happens if a string contains non-alphanumeric characters. If

@textin=”pubs-dev”

table D’s rows would be:

i letter
1 p
2 u
3 b
4 s
5 d
6 e
7 v

and the string would be reconstituted as pubsdev.

I’ve used the Strim function to remove nonvisible characters in imported data (e.g., IBM EBCDIC data). Another use involves creating filenames for our database backup files. Our company uses Idera’s SQL Compliance Manager, which creates files with filenames such as SQLcompliance_MyDatabase.SQL.DOMAIN.COM. Our custom backup script uses the Strim function to strip out the unwanted characters in the database names when it creates the filenames for the backup files so that the filenames follow our standard format of myservername_ myinstancename_SQLcomplianceMyDatabaseSQLDOMAINCOM_ backuptype_yyyymmddhhmmss.bak. The Strim function works on SQL Server 2005, SQL Server 2000, and SQL Server 7.0.

—Louis Nguyen, DBA, Centex Homes

Add a Comment

OptimalSolutions, thank you for pointing out the problems with the listing links--although I'm sorry that the problem occurred. I totally understand your frustration. Readers have been reporting multiple similar problems over the past couple months. If it only were a matter of QA checking on our side, the problem wouldn't exist. However, the problem stems from several technical issues out of the editors' control, related to our Web hosting and production processes. This isn't to make excuses, rather to say that we editors are as frustrated as you are about it. I will report it ASAP to our Web team and will do what I can to keep the issue from happening again! Thanks for your patience.
Anne Grubb, Web site strategic editor, SQL Server Magazine

Anne2/21/2008 3:31:41 PM


As usual, none of the Listing links work.

"The page cannot be found"

The page you are looking for might have been removed, had its name changed, or is temporarily unavailable.

Doesnt anyone desk check their work before posting things anymore? I mean, come on. This is not rocket science here.

ANTHONY2/21/2008 12:31:29 PM


Louis,

I love the elegance of this solution -- it certainly provides some good ideas on how to handle other problems. However, sometimes, elegant isn't the fastest way.

This function below accomplishes the same thing using standard string manipulation, and is 2900% faster on a 100-byte string. Of course, this function rapidly degrades in performance as the strings get larger: 240% faster on a 1000-byte string, only 40% faster on a 2500-byte string, and 1% slower on a 4000-byte string, at least on the server I was on. However, your usage (and that of most users) would be on sub-100-byte strings.

CREATE FUNCTION dbo.udf_CleanAlphaNum (@InputString varchar(8000))
RETURNS varchar(8000)
AS
BEGIN

declare
@return varchar(8000) ,
@length int ,
@counter int ,
@cur_char char(1)

SET @return = '
SET @length = 0
SET @counter = 1

SET @length = LEN(@InputString)

IF @length > 0
BEGIN

WHILE @counter <= @length
BEGIN
SET @cur_char = SUBSTRING(@InputString, @counter, 1)
IF @cur_char LIKE '[0-9a-zA-Z]'
BEGIN
SET @return = @return + @cur_char
END
SET @counter = @counter + 1
END

END

RETURN @return

END

Benjamin12/4/2007 10:29:11 AM


You must log on before posting a comment.

Are you a new visitor? Register Here
Enter the SQL Scramble - Daily Prize Giveaway

SQL General Queries

hi ,     I am facing a problem while trying to find some unwanted junk charactors from a column in a table from a Sybase IQ database.There is table ca...222-96145

GOOGLE LINKS
SPONSORED LINKS
FEATURED LINKS