January 15, 2009 12:00 AM

How Simple Can Amazon SimpleDB Be?

SQL Server Pro
InstantDoc ID #101255

While scanning the SQL Server news sites for an interesting topic to write about this week, I noticed several references to Amazon's entry into the database market with a product called Amazon SimpleDB. I wanted to find out exactly how simple the product was, what its capabilities were, and how it differed from my favorite database product, SQL Server.

The first reference I went to was the eWeek article "Amazon Opens Public Beta for SimpleDB Service," which said "'Traditional clustered relational databases often require a sizeable upfront investment and complex design,' said Matt Domo, general manager of Amazon SimpleDB, in a statement. 'Our customers have asked us for a simple, scalable, reliable and low latency alternative. Amazon SimpleDB eliminates complexity so that developers and businesses can focus on optimizing applications and not administering their database. We’re pleased that we can now make the service available broadly and also pass along the cost savings we’ve achieved.'"

OK, so Amazon SimpleDB is geared toward developers and businesses, not database administrators, architects, or tuners. Developers just want to develop. Unless they're specifically database developers, they rarely have the time or inclination to learn the gory details of database design, maintenance, and tuning. And Amazon SimpleDB requires no schema, according to the eWeek article. But what exactly does "no schema" mean? I couldn't find a fuller explanation of that claim anywhere, but as I continued my research, I surmised that it meant you don't have to define the structure of your database or of any tables.

In addition, "Amazon Opens Public Beta for SimpleDB Service," mentions a "traditional clustered relational database" but provides no definition of that term. The tradition must be more than 22 years old because that term meant nothing to me. Wikipedia didn't have any information about the terms "clustered database" or "clustered relational database," so I guess I’m not the only one who isn’t familiar with them. However, I finally found a reference to a clustered database at http://www.versant.com/developer/resources/objectdatabase/whitepapers/vsnt_whitepaper_scalability_clustering.pdf. I'll let you read the article for yourself, but the definition it provides for a clustered database sounds suspiciously like SQL Server’s clustered indexes. Is a clustered database something more? I couldn't find the answer in a quick search.

The Amazon SimpleDB page on Amazon.com is full of hype. It claims that Amazon SimpleDB "eliminates the administrative burden of data modeling, index maintenance, and performance tuning." This sounds almost too good and reminds me of my commentary "Will Database Tuning Become Obsolete?", which was about solid state drives (SSDs) eliminating the need for tuning. Haven't we all been warned that if something sounds too good to be true, it probably is?

The following sentence in Amazon's description of the product immediately caught my eye: "However unlike a spreadsheet, Amazon SimpleDB allows for multiple values to be associated with each 'cell' (e.g., for item '123,' the attribute 'color' can have both value 'blue' and value 'red')." So the only hard fact I had at this point was that the database violates first normal form, which requires each attribute to hold a single atomic value. I did find the "Detailed Description" page provided by Amazon.com, but it really wasn't detailed. It said nothing about how the product actually works.
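
To make the multi-valued "cell" idea concrete, here is a minimal in-memory sketch in Python. It is not the SimpleDB API; the names `domain`, `put_attribute`, and `items_where` are hypothetical and exist only to show an item whose attribute holds more than one value, and items that do not share the same attributes.

```python
from collections import defaultdict

# Toy model (an assumption, not Amazon's API):
# item name -> attribute name -> set of string values.
domain = defaultdict(lambda: defaultdict(set))

def put_attribute(item, attribute, value):
    """Add a value to an attribute without replacing existing values."""
    domain[item][attribute].add(value)

def items_where(attribute, value):
    """Return the sorted names of items whose attribute includes the value."""
    return sorted(
        name for name, attrs in domain.items()
        if value in attrs.get(attribute, set())
    )

# Item '123' gets two values in the same 'cell', as in Amazon's example.
put_attribute("123", "color", "blue")
put_attribute("123", "color", "red")

# A second item with a different set of attributes: no fixed schema.
put_attribute("456", "color", "red")
put_attribute("456", "size", "large")
```

In a relational design the two colors would normally live in a separate child table; here they sit side by side in one attribute, which is the first normal form violation in question.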

By the time I had expended all my allotted hours for research, I hadn't made much progress at all. Yes, there were links to Amazon SimpleDB tutorials, Amazon SimpleDB performance, and applications built on top of Amazon SimpleDB. I found blog posts commenting on Amazon SimpleDB, including Marcelo Calbucci's blog post that claims Amazon SimpleDB isn't really a database at all but a directory service! 

I'm sure if I kept looking, I could eventually find the answers I'm seeking. There might be a reader out there who already has experience with Amazon SimpleDB and might be able to provide additional sources of information. But I wish that the provider's site (Amazon.com) contained a simple explanation of what makes Amazon SimpleDB different from other products out there. Saying that a product does everything for you isn't good enough. Instead of hype, I would find it incredibly useful to have some technical information that can help someone experienced with other products figure out exactly how this one might be an improvement. That should be simple enough, right?




Comments
  • David Parks
    Apr 12, 2010

    I am a developer currently using Amazon SimpleDB in a project, so I'll add a few pros and cons to help folks out:

    Pros:
    ===========
    - It's dirt cheap for low budget apps
    - It's easily available from anywhere on the internet, major plus for some distributed applications
    - It's self managed, no administrative overhead, maintenance, etc.

    Cons:
    ===========
    - It's an utter bear for the developer to use: lots of limitations require multiple queries to ensure an entire set is returned, or to ensure various exception conditions are covered.
    - The API is dirt simple, meaning it doesn't support even the most common basic functions that we're used to with databases. You'll find yourself writing more code to work around the simplified database than you would with a standard relational DB. A lot of "Query-Read-Post" operations are necessary where SQL would need only a Post with some basic clauses or simple arithmetic operations.
    - I would hate to think of how to implement concurrency with this thing. While entirely possible, it's not going to be a walk in the park.
    - It only supports strings, so be prepared for lots of conversions, and the potential runtime bugs that come with those conversions.
    - It doesn't natively support keys on multiple attributes. You'll have to concatenate those key attributes into another single field that constitutes the key (a duplication of data in most use cases, and another potential outlet for bugs).


    With all that said, I'm using it. I'm not liking every moment of it, but I can't beat the price, distributed availability, and self management aspects of it anywhere, so I've accepted a little extra code, and the bugs that come with it for my project.

    You'll have to decide for yourself. In my view, it's great for the little guys or straightforward use cases, but don't expect to take it too far beyond that. It sure as heck won't be an end to relational databases in my lifetime.
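
Two of the points in the comment above, string-only storage and concatenated keys, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SimpleDB client code: it assumes string values are compared lexicographically, and the helpers `encode_int`, `decode_int`, and `composite_key`, along with the width of 10 and the `|` separator, are hypothetical choices.

```python
def encode_int(n, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if n < 0 or len(str(n)) > width:
        raise ValueError("value out of range for this encoding")
    return str(n).zfill(width)

def decode_int(s):
    """Convert a zero-padded string back to an integer."""
    return int(s)

def composite_key(*parts, sep="|"):
    """Concatenate several key attributes into a single string key."""
    for part in parts:
        if sep in part:
            # A separator inside a part would make the key ambiguous.
            raise ValueError("separator found in key part")
    return sep.join(parts)

# Plain strings sort "9" after "10"; the padded forms sort numerically.
unpadded_is_wrong = "9" > "10"
padded_is_right = encode_int(9) < encode_int(10)

order_key = composite_key("customer42", encode_int(7))
```

Every read then needs the matching `decode_int` call, which is where the conversion bugs mentioned in the comment tend to creep in.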


  • Panagiotis
    Jan 16, 2009

    mfield, DaveK, you can cut Kalen some slack. First, you can't install SimpleDB, it is an online offering. Second, the difference between the relational model and the cloud(ish) model is as big as the difference between sequential files and the relational model. It requires a fundamental shift in the way you think about data - plus attending 10 hours worth of sessions by one of the pioneers on the subject.

    Kalen, you are too kind when you describe my attempt to cram a week's worth of TechEd into a couple of sentences. I agree with you that SimpleDB (or the Azure Table Service for that matter) may appear simple but building applications that use it correctly will be anything but:
    - Versioning the data is easy, we just create a new entity with the attributes we need. Versioning the code is much harder, as it now has to cope with all data versions ever created.
    - The cloud DBs take care of replicating our data for us and that makes availability really simple. But updates take time to propagate to all replicas, and we may well receive stale data in a subsequent read. Imagine a client in Australia requesting data that has just changed. The cloud DB has to decide which replica to read from: the one in the US or the one in Sweden? There is no guarantee that the replicas are up to date. The application will have to cope with stale data too.
    - Limited or no transaction support means we don't need to concern ourselves with locking and deadlocks anymore. Instead, we need to handle partially failed updates and compensating transactions.

    Getting all this right will not be easy and we are all just now discovering what it takes to build systems for the cloud. There is no unifying theory or model for the cloud yet, just a general consensus among pioneers in the field that we need to relax some of the guarantees we took for granted if we want to scale out. A frightening but interesting prospect.
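
The stale-read problem described in the comment above can be simulated with a toy model. The sketch below is an assumption-laden illustration, not how SimpleDB or any cloud service is implemented: `Replica`, `Store`, and `read_at_least` are hypothetical names, replication is triggered by hand, and the client copes with staleness by remembering the version it last wrote and rejecting anything older.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Keep only the newest version seen for each key.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key)


class Store:
    """Toy eventually consistent store with delayed replication."""

    def __init__(self, n=2):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # (replica index, key, version, value)
        self.version = 0

    def write(self, key, value):
        # The write lands on replica 0 now and on the others later.
        self.version += 1
        self.replicas[0].apply(key, self.version, value)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, self.version, value))
        return self.version

    def propagate(self):
        # Deliver all queued updates to the lagging replicas.
        for i, key, version, value in self.pending:
            self.replicas[i].apply(key, version, value)
        self.pending.clear()

    def read(self, key, replica):
        return self.replicas[replica].read(key)


def read_at_least(store, key, min_version):
    """Return a value at or above min_version from any replica, else None."""
    for i in range(len(store.replicas)):
        found = store.read(key, i)
        if found is not None and found[0] >= min_version:
            return found[1]
    return None


store = Store()
store.write("price", "10")
store.propagate()                # both replicas now hold version 1
v2 = store.write("price", "12")  # replica 1 has not seen this yet
stale = store.read("price", 1)   # still the old version
fresh = read_at_least(store, "price", v2)
```

Tracking a minimum acceptable version is only one possible coping strategy; an application could instead tolerate the stale value or retry after a delay.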

  • Rich
    Jan 16, 2009

    I too was surprised at the lack of research here, which falls short of the quality I expect and admire from SQL Server Magazine contributors.

    This isn't about some new type of service; it's a database principle that I'd expect DB professionals to know.

    SimpleDB, Microsoft SQL Server Data Services, and Google BigTable all sacrifice traditional DBMS features in exchange for massive scalability. They are for the most part columnar databases, meaning you get one giant table with keys and each row has many columns. Each row doesn't necessarily have to have the same columns or data types. So there is no fixed schema, no JOINs (although references are sometimes possible but not efficient like a JOIN), etc.

    These services and traditional databases are apples and oranges, each having their purposes and particular uses. With these services you need a thick business layer to process your data, whereas a traditional DBMS would do much of that work in the database itself.

    I'd suggest a proper review so DB professionals can truly evaluate these services and make the right decisions about using them. Contrary to what the article states, there is a lot of information available on how these services work and when they are architecturally appropriate. Don't follow definitions the service provider gives - it's all marketing speak - go out and see what people using it say.

  • Mark
    Jan 16, 2009

    You are forgetting that your audience will treat it as a review. Your negativity about the product is all they will remember. That's human nature. Certainly you could have found more positives to describe to make it more rounded. I certainly understand the limitation of time. Perhaps you should stick to the SQL Server topics that we all respect you for, and that you can speak about from the hip, if you don't have time to do the deep dive into a different product?

  • KALEN
    Jan 16, 2009

    Wow... thanks for all the comments!

    I don't have hours and hours of spare time, and can only spend a few hours in researching and writing my commentary. If this were a full SQL Server Magazine article describing SimpleDB, then of course I would have installed it. But this is a commentary, not a research article.

    My point in the commentary was that some of the questions I was asking should have been answerable WITHOUT having to install the product. Yes, if I were seriously considering using SimpleDB, I would install it. But there should be more than hype available up front to let me know whether I should even spend any of my limited spare time on hands-on research.

    So, mfield, you obviously missed my point. I was not rating the product. I was rating the announcement of the product and the availability of information to help people decide whether to try it out or not.

    Thanks to Brent, for your wonderful followup! And pkanavos, I appreciate your additional data points!

    ~Kalen
