Database talk: why use one over the other?

Informix is dead in the water. They gave up, and IBM bought them out to try to grab the customer base.

For a FREE and Enterprise capable database, try MAX-DB. It is certified by SAP as capable of running their applications, and it is really a very mature product that has been around for 20 years in various guises.

Access is fine for playing around.
 
My apologies for speaking American instead of English. :)

BI is the current fun part of databases. Relational Database Management Systems (RDBMS) are almost a commodity. But what gives you an insight into your business that simple SQL may not easily be able to give?

You had to ask...

[pedant]
BI = Business Intelligence
Fancy words for looking at trends and patterns across data, rather than at the individual transactions.

OnLine Analytical Processing (OLAP) and Data Mining are flavors of BI. OLAP can be thought of as seeing aggregated numerical measures by one or more dimensions. For example, I want to see sales by region, profit by product line, or number of cries for "evidence" per JREF poster. Again - the individual transactions are only relevant as fodder for the aggregated analysis by these pre-defined dimensions. I can quickly sort through the aggregated data and find the stores with the best/worst measures. Which store has the best sales? Which has the worst profit? Which JREF thread has the lowest signal-to-noise ratio? I could do this with raw SQL, but the cost is WAY too high when you want to pivot across multiple dimensions. There is specialized software to process, slice, and dice data in this manner. Microsoft has an entire language called MultiDimensional Expressions (MDX) for this purpose. MDX is like SQL on steroids.
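
To make the "pivot by dimensions" idea concrete, here is a toy sketch in Python (the data and names are all made up): the same raw transactions get rolled up by whichever dimension you choose, which is exactly what an OLAP engine does at scale.

```python
from collections import defaultdict

# Hypothetical transaction-level data: (region, product, sales).
transactions = [
    ("East", "Widgets", 100), ("East", "Gadgets", 250),
    ("West", "Widgets", 300), ("West", "Widgets", 50),
    ("West", "Gadgets", 75),
]

def aggregate(rows, dims):
    """Roll raw transactions up to a measure summed by the chosen dimensions."""
    cube = defaultdict(int)
    for region, product, sales in rows:
        key = tuple({"region": region, "product": product}[d] for d in dims)
        cube[key] += sales
    return dict(cube)

# Same data, two different "pivots" -- sales by region, then by product.
print(aggregate(transactions, ["region"]))   # {('East',): 350, ('West',): 425}
print(aggregate(transactions, ["product"]))  # {('Widgets',): 450, ('Gadgets',): 325}
```

The individual rows only matter as fodder for the rollup, just as described above.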

My definition of Data Mining is looking for patterns across data when the dimensions are not predefined. Now you are asking the Data Mining software to classify, group, cluster, or perhaps explain causal relationships in the data. For example, what are the common characteristics of fraudulent credit card transactions? What are the characteristics of a person likely to default on a loan? What products placed together (or apart) on a grocery store shelf will maximize profit?

I may use a particular mining algorithm to find my most profitable customers. Perhaps I use a Decision Tree to drill down to a reasonable list of customers to whom I want to market. I find that the best predictor of overall profitability of a customer is whether or not the customer buys a widget on Tuesdays. Of those people, I find the better customers are those whose income is over some amount. I keep drilling until I have people with the desired characteristics. I did not know which attributes/dimensions were important before I asked the algorithm to find the significant ones.
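
A crude sketch of what that first tree split does, with made-up customer data and a simple majority-vote accuracy criterion (real decision-tree implementations use measures like information gain or Gini impurity, and far more data):

```python
# Hypothetical customer records: boolean attributes plus a "profitable" label.
customers = [
    {"tuesday_widget": True,  "high_income": True,  "profitable": True},
    {"tuesday_widget": True,  "high_income": False, "profitable": True},
    {"tuesday_widget": True,  "high_income": True,  "profitable": True},
    {"tuesday_widget": False, "high_income": True,  "profitable": False},
    {"tuesday_widget": False, "high_income": False, "profitable": False},
    {"tuesday_widget": False, "high_income": True,  "profitable": True},
]

def split_accuracy(rows, attr):
    """Accuracy if each side of the split predicts its own majority label."""
    correct = 0
    for side in (True, False):
        labels = [r["profitable"] for r in rows if r[attr] == side]
        if labels:
            majority = labels.count(True) >= len(labels) / 2  # ties go to True
            correct += sum(1 for lbl in labels if lbl == majority)
    return correct / len(rows)

# The algorithm, not the analyst, picks the most predictive attribute.
best = max(["tuesday_widget", "high_income"], key=lambda a: split_accuracy(customers, a))
print(best)  # tuesday_widget
```

The tree would then recurse into each side with the remaining attributes, which is the "keep drilling" step described above.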

With a clustering algorithm, perhaps I can even predict into which cluster a new customer will fall. After I "train" my data model, I ask the algorithm to predict whether or not the customer is likely to be profitable, and with what degree of confidence.
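
A minimal sketch of that prediction step, assuming the clusters have already been "trained" down to labeled centroids (the data, the cluster names, and the crude distance-based confidence are all invented for illustration):

```python
import math

# Hypothetical trained centroids over (avg_purchase, visits_per_month),
# each labeled by whether that cluster turned out to be profitable.
centroids = {
    "bargain_hunters": ((20.0, 8.0), False),
    "big_spenders":    ((150.0, 2.0), True),
}

def predict(point):
    """Assign a new customer to the nearest centroid; use relative distance
    as a crude stand-in for the model's degree of confidence."""
    dists = {name: math.dist(point, c) for name, (c, _) in centroids.items()}
    best = min(dists, key=dists.get)
    confidence = 1 - dists[best] / sum(dists.values())
    return best, centroids[best][1], confidence

cluster, profitable, conf = predict((130.0, 3.0))
print(cluster, profitable, round(conf, 2))
```

A real clustering tool would also report how well the new point fits any cluster at all, but the shape of the answer - cluster, prediction, confidence - is the same.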

A famous example of Data Mining in everyday web life is Amazon.com's recommendations and suggestions. "Customers who bought this also liked..."
[/pedant]

I'd better stop here or this post will go on forever. I will gladly discuss the ins and outs of BI with anyone in a separate thread or offline.

And I really do have a life. Honest. :rolleyes:

CriticalThanking
 
BI = Business Intelligence
Fancy words for looking at trends and patterns across data, rather than at the individual transactions.
Because the term is so inane for the obvious reasons I tend to tell people that I make my living "in the field that has come to be known as business intelligence". ;)

Microsoft has an entire language called MultiDimensional Expressions (MDX) for this purpose. MDX is like SQL on steroids.
I have specialized in MDX since inception and it still amazes me.
 
Good stuff here folks, thx

What about others like INFORMIX, DB2, SAP etc? Anyone?
DB2's fine except that standard tools don't seem to connect to it natively (you need DB2 Connect or some such thing), the tools associated with it are IMHO even worse than those for Oracle, and there isn't the same pool of resources available.

SAP is an application which can sit on a variety of database platforms including those you have listed. It has a proprietary data model which includes pool and cluster tables (virtual tables) which prevents open access to the data contained within them. Of its type SAP is fine but it's hideously expensive to implement and will either require extensive customisation or significant changes to your business processes.
 
Nope - an OLTP system can have any kind of data back end depending on what it's doing. I use the IBM dictionary a lot as it's more complete than most.
http://www-306.ibm.com/software/globalization/terminology/op.html#x2142798

Yes, an OLTP app can use any data backend, but the RDBMS for that OLTP needs to be tuned for insert/update/delete. It's going to be a normalized table structure, etc. OLAP performs best with star/snowflake schemas since it is usually read-heavy.
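
A tiny sqlite3 sketch of the star-schema shape (the table and column names are invented): a fact table of measures joined to a small dimension table and aggregated, which is the typical read-heavy OLAP query pattern.

```python
import sqlite3

# Star schema sketch: one fact table keyed to dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, amount REAL);
""")
con.executemany("INSERT INTO dim_store VALUES (?, ?)",
                [(1, "East"), (2, "West")])
con.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 100.0), (1, 250.0), (2, 300.0)])

# The typical OLAP shape: join fact to dimension, aggregate by a dimension.
rows = con.execute("""
    SELECT d.region, SUM(f.amount)
    FROM fact_sales f JOIN dim_store d USING (store_id)
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
print(rows)  # [('East', 350.0), ('West', 300.0)]
```

An OLTP schema would instead be normalized for fast, safe single-row writes; the star layout deliberately trades that away for cheap scans and aggregations.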
 
SAP is an application which can sit on a variety of database platforms including those you have listed. [snip]

SAP does have its own database as well, SAP-DB, which is now called MAX-DB, but was also once known as ADABAS, as well as several other names.

The database is free to use, and is enterprise capable.
 
Some great replies here folks, thx. I've even got validation from a db guru that at least some of what you said is correct. ;)

I'm like, all, learning and stuff :thumbsup:
 
My definition of Data Mining is looking for patterns across data when the dimensions are not predefined. [snip]

But isn't data mining more or less illogical? I'm sure you could find all sorts of odd things in any given datapool, but how do you establish that there's actually some sort of cause and effect? Let's say you find out more people bought Frankenberry when it was separate from Count Chocula than when it was right next to it. How do you establish they actually did buy it because of that?

Wouldn't that merely be a source of data that makes you want to start an actual study, rather than actually enough to reach a conclusion? And of course, to do proper testing, suddenly you are getting back down to the level of individual transactions studied in detail and in significant numbers in a controlled situation. It seems to me this data mining thing is as ineffective in marketing as it is in any other scientific discipline.

Further, if Amazon's suggestions are based on data mining, well that says it all, doesn't it? I've almost never found myself interested in what other people also bought, as it tends to rarely be related to what I got. Further, Amazon at one time suggested that one "Possible source of inspiration" for Mozart was the band Queen.
 
Yes, an OLTP app can use any data backend, but the RDBMS for that OLTP needs to be tuned for insert/update/delete. It's going to be a normalized table structure, etc.

I may be quibbling but the data for an OLTP would be normalised only to the degree required for the transaction.

And DB/2 scales better than the rest as it has, I believe, a superior lock escalation and handling mechanism, especially for z/Series. I haven't played with DB/2 stinger yet but the press looks interesting. Like the Don says though a lot of the tools are just ......... lacking.
 
But isn't data mining more or less illogical? [snip]
I think those are some good points, although the comparison is still worth noting (and kept in mind the next time I talk to a manager, esp. the cereal one - they can't grasp much more than that :) ).
 
[snip] It seems to me this data mining thing is as ineffective in marketing as it is in any other scientific discipline.

It is not ineffective - the standard approach is to design a marketing campaign based on a target population identified via data analysis, but such campaigns usually have a pilot phase where a small sample of the overall population is targeted first. Based on the results of this pilot, the campaign can be executed, tweaked and re-piloted, or abandoned.
 
But isn't data mining more or less illogical? I'm sure you could find all sorts of odd things in any given datapool, but how do you establish that there's actually some sort of cause and effect?
Data mining identifies trends/tendencies. Knowing trends can be invaluable even if it doesn't prove cause and effect.
 
It seems to me this data mining thing is as ineffective in marketing as it is in any other scientific discipline.

I think you may be confusing the pejorative use of the term 'data mining', often used on these forums, with the way it's used by CriticalThanking, to mean the application of machine learning algorithms to datasets, which uses statistics to establish the significance of the results.

Data mining is not ineffective in scientific disciplines. At least, it's no more ineffective than more 'rational' approaches in many areas.
 
But isn't data mining more or less illogical? I'm sure you could find all sorts of odd things in any given datapool, but how do you establish that there's actually some sort of cause and effect? [snip]
Excellent thought process in those questions! As any woo will tell you, predicting the future can be so darn tough. :) Cause and effect is indeed a problem. Data mining models must be trained and then tested against real-world data.

There are famous examples of spurious connections found. There is the (perhaps apocryphal) story of the best predictor of a famous US stock market average being the price of yak butter in Nepal. Correlation is not necessarily causation.
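
A quick illustration of the yak-butter point, with entirely invented numbers: two series that merely trend upward over the same years show a Pearson correlation near 1, with no causal link anywhere in sight.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up numbers: a stock index and "yak butter prices", both simply
# drifting upward over the same six years.
stock_index = [100, 112, 118, 131, 140, 155]
yak_butter  = [2.0, 2.2, 2.1, 2.5, 2.6, 2.9]
print(round(pearson(stock_index, yak_butter), 2))  # 0.97
```

Any two series sharing a trend will correlate like this, which is exactly why a mining model must be tested against real-world data rather than trusted on correlation alone.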

One of my favorite failures in data mining is a neural net algorithm used to have a computer automatically recognize friendly tanks from enemy tanks under battlefield conditions. They trained it by feeding it pictures of various tanks. When they tested it in the field, it was sure every tank was an enemy tank. After much research, it was found that the training pictures of friendly tanks were all nicely posed pictures in good lighting, not moving, etc. All the enemy pictures were of poorer lighting/quality. Oops. Also note that the choice of mining algorithm strongly impacts the types of questions that can be answered. Neural nets are great at adapting to changing data, but terrible at explaining "why."

The fact that someone bought an extended car warranty is a pretty darn good predictor that they bought the car, but useless for business purposes. Buying a car is not a useful predictor of whether or not the warranty will be purchased. There must be other factors at work. Can one conclude that association analysis (people who bought X also bought Y) is useless? No. For many applications, correlation is often good enough. I don't care WHY people who buy A also buy B, but I can run a test of store layout and see if I increase the overall "attach rate" by moving A and B closer together. And since warranties are so darn profitable, you can guarantee I am going to spend some time and money seeing if I can find a useful predictor that will increase the warranty purchase rate. Is it income level? Is it age? Is it education level? If I can identify something that gives me even 1% greater purchases of warranties, I will make the company millions of dollars. And I will get the hearty thanks of my company. Those thanks and $5 will get me a cup of coffee at Starbucks.

For a supermarket example, if I know people will usually buy A and B, perhaps I can move the products far away from one another to increase the amount of time in the store and the number of other products you have to see on the way. Now I am testing not the attach rate of A and B, but the overall profitability of the trip - a.k.a. "Basket Analysis." Another way to phrase this is "what is the overall impact on a shopping trip (total or percent profit) if someone buys X?" Both basket analysis and attach rate analysis are a big deal in the retail industry.
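
The attach-rate calculation above is simple enough to sketch directly (association-rule folks call this measure "confidence"; the baskets are made up):

```python
# Hypothetical shopping baskets for "people who bought X also bought Y" analysis.
baskets = [
    {"chips", "salsa", "beer"},
    {"chips", "salsa"},
    {"chips", "bread"},
    {"milk", "bread"},
]

def attach_rate(baskets, x, y):
    """Of the baskets containing x, what fraction also contain y?"""
    with_x = [b for b in baskets if x in b]
    return sum(1 for b in with_x if y in b) / len(with_x)

print(attach_rate(baskets, "chips", "salsa"))  # 2 of 3 chip baskets -> 0.666...
```

Basket analysis then asks the bigger question - total profit per trip - rather than just the pairwise rate, but this number is the building block.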

Wouldn't that merely be a source of data that makes you want to start an actual study on it than actually enough to reach a conclusion?
YES! You win the prize. Would you please join the management team of my company? You would be surprised how few people understand that.

And of course, to do proper testing, suddenly you are getting back down to the level of individual transactions studied in detail and in significant numbers in a controlled situation.
As the data mining analyst, I still don't care about the individual transaction. I test individual transactions against the model to see if, statistically, I get an expected percentage of correct answers.

It seems to me this data mining thing is as ineffective in marketting as it is in any other scientific discipline.
It depends upon what you want to predict and how good your input sample is.

I must also take exception to the word "ineffective" when used with "any other scientific discipline," especially medicine. While you cannot guarantee that a particular case of disease X was cured by treatment Y, I have a pretty good chance of excluding other treatments as likely candidates by sampling the data. I don't care if a particular person had a miraculous spontaneous remission - if 80% of the patients treated with Y get better compared to 20% of untreated patients, western medicine is going to move to treatment Y as the standard of care until either a better predictor is found or some better treatment comes along. (Sounds like a job for a decision tree or clustering algorithm.)
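
For the 80% vs. 20% example, a standard two-proportion z test shows how decisively that kind of data separates. This is a sketch, not part of the original post; the sample size of 100 per group is an assumption I've added for illustration.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for the difference between two observed proportions,
    using the pooled standard error."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Assumed counts: 80 of 100 treated patients improve vs 20 of 100 untreated.
z = two_proportion_z(80, 100, 20, 100)
print(round(z, 1))  # 8.5 -- far beyond any usual significance threshold
```

At that z value the chance the difference is a sampling fluke is vanishingly small, which is why the treatment becomes the standard of care even though no individual cure is "guaranteed."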

Further, if Amazon's suggestions are based on datamining, well that says it all doesn't it? I've almost never found myself interested in what other people also bought, as they tend to rarely be related to what I got. Further, Amazon has at a time suggested that one "Possible source of inspiration" for Mozart was the band Queen.
My (only half-joking) comment is that JREF is a pretty odd subspecies of consumer. Amazon does not care if YOU buy Queen and Mozart - they have found that enough people like both to be profitable showing this to anyone that buys one of them. And yes, if Amazon is positing that Mozart channelled a band from several hundred years in the future, then yes, they need to add a temporal check on their mining model rules.

The science of data mining is still fairly young. There are only now decent tools to make it a worthwhile area for mainstream businesses to consider. And it will ALWAYS take people who know how to set up the cases, pick a good algorithm, test the results with known data, and then DESIGN A REAL-WORLD test with unknown data to make it worthwhile. With OLAP, it is often easy to tell a new customer that with some expert help, we can have key performance indicators (KPIs) of your business on the executive's dashboard in a matter of weeks. Return on investment (ROI) can often be measured fairly quickly and easily.

Data mining is a tougher nut to crack. You may spend a lot of time and confirm that the data you have does not predict anything. You can then go back and collect/purchase more data - customer surveys, demographics, competing market data, etc., and see if you can find better predictors. But management will not often say - "Thanks for spending 3 months and $1 million to come up with nothing. Here's more money." It takes management discipline to go after this stuff.
-----
I promise I really can make concise posts. You have just stumbled onto the only deep area of experience I have (that is useful for financial gain). As penance I will do 10 reply posts consisting of nothing but "Evidence?".

CriticalThanking
 
PostgreSQL isn't bad

I've used Informix, MS-SQL & Postgres

I have to say Postgres is very good at what it does, it is ACID compliant and it copes with large volumes well.

We have it running with our proprietary server at a large customer and it's coping well with 100 million row tables.

Although it is only used for storage not for any complex requests, our software does the complex stuff with data loaded into memory.

MS-SQL I only used when it was young and I didn't take to it.

Informix I've used a lot, and it's a pile.
 
ACID: atomicity, consistency, isolation + durability

Atomicity and Durability are rollback guarantees. If something fails in a transaction, or the hardware or OS fails, the database backs up to a previous stable state through logs or recovery systems. Atomicity refers to database transactions; durability refers to everything else.

Consistency is rule-management, essentially. If something affects a bit of data, it has to correctly affect that data in all places. If that change is incompatible, it has to be refused and rolled back.

Isolation means that a user can't be allowed to interfere with another user's activity.
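
A small sqlite3 demonstration of the atomicity point (the account names and the simulated failure are invented): an exception partway through a transaction rolls back every change that transaction made.

```python
import sqlite3

# Demonstrate atomicity: a failed transaction leaves no partial changes behind.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
con.commit()

try:
    with con:  # opens a transaction; rolls back if the block raises
        con.execute("UPDATE accounts SET balance = balance - 50 "
                    "WHERE name = 'alice'")
        raise RuntimeError("power failure mid-transfer")  # simulated crash
except RuntimeError:
    pass

# Alice's debit was rolled back along with the rest of the transaction.
print(con.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone())
# (100,)
```

The half-finished transfer never becomes visible, which is the whole point: either both sides of the transfer happen, or neither does.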
 
This may be getting off the db track a bit, but what are "business objects" then?
 
