brendan tierny predictive analytical queries (pq) in oracle 12c
Queries (PQ) in Oracle 12c
One of the new SQL features introduced into Oracle Database 12c is the ability to create on the fly predictive models for your data. All you need to do is to write a SQL query that will build a predictive model and will then apply this model to your data. This is all done in one step and does not require you have knowledge of its inner works. The predictive models that are built during the execution of the Predictive Query only exists while the query is being run. When the query is finished all the models and associated settings are deleted. These are called transient models.
One of the confusing aspects of the new feature is that is goes by many names. In the documentation it can be found under On-the-Fly Models, and sometimes it is called Dynamic Queries. This feature is also part of the Oracle Data Miner tool that is part of SQL and here it goes by the name of Predictive Queries.
On-the-Fly Models = Dynamic Queries = Predictive Queries. Yes that is a bit confusing.
From talking to some of the Oracle folks is seems that the name Predictive Queries might be the name they will be using going forward.
What are predictive queries and why would you need them
Predictive Queries enable you to build and score data quickly using the in-database data mining algorithms, without the complexity of needing to understand the required settings and fine-tuning of the models. All models created during the execution of the Predictive Query will not exist once the Predictive Query is finished executing. You cannot inspect or tune the algorithms being used or the models that are generated. So you have to trust what Oracle is doing behind the scenes. If you are a Data Scientist type of person you will typically want to tune the models, so this approach is maybe not for you. But if you want to very quickly build models and score your data then Predictive Queries is something you should consider.
A major advantage of using Predictive Queries is that you can partition the data so that predictive models can be build specific to each partition. What this will do, is that it will divide your data into the relevant partitions and will then create a predictive model specific to that partition and then score the data in the partition. Typically for most data mining tools you will have to specifically define the creation of the data subset for the partition, define how to build the model and then run the model to score the data. All of these steps have to be manually defined. By defining a partition in the Predictive Query all of these work will be done automatically for you. There are two major advantages to this. The first is that as new data partitions (i.e. a new value exists for the partition attribute) are created the Predictive Query will automatically pick this up and do all the work for you. The second is that Predictive Queries allows you to use the Parallel Query option to speed up the process of scoring the data. This will be particularly useful when you are working with Big Data.
Using Predictive Queries is very similar to using the PREDICT procedure in the DMBS_PREDICTIVE_ANALYTICS PL/SQL package, except that instead of using PL/SQL, Predictive Queries uses SQL.
There are two ways you can go about creating your predictive queries. The first option is to write a SQL statement to prompt the database to use the in-database data mining algorithms. The second (and much easier) option is to use the Predictive Queries nodes in the Oracle Data Miner tool.
The following sections of this article will illustrate how to write Predictive Queries using analytical functions in SQL.
Oracle Analytic Functions
Analytic Functions first appeared in the Oracle database starting back in Oracle 8i Database (or even earlier). With each release of the database we have more functions being added. We now have over 46 functions in 12.1c. The following table lists the analytic functions, with the Predictive Query related functions highlighted in red
Table1 : Analytic Functions available in Oracle 12.1c Database
An analytic function has three main parts to the syntax. Firstly there is the definition of the function. Within this syntax we have the other two parts, the Partition and Window clauses, as illustrated in the Figure 1.
Figure 1 : Analytic Functions Syntax
I will give examples of using the Predictive Analytic Functions with and without the Partition and Windowing clauses.
Oracle Predictive Query Analytic Functions
As shown in Table 1 above we have ten Predictive Analytic Functions. The first set of these functions begins with CLUSTER and can be used to perform cluster analysis and to see the details of each cluster.
The second set of these functions (on the right hand side of Table 1) can be used for Classification, Regression, Anomaly Detection and Feature Sets. These sets of functions a very powerful, so you need to be very careful of how you write the predictive query as you may end up getting a type of prediction you didn’t expect.
The magic of the Partitioning Clause
When we use Predictive Queries we can build ask Oracle to build and apply predictive models on our data. These predictions can then be used to gain additional insight of our data and to make decisions.
When we use the Partitioning Clause we can tell the Oracle database to treat the data for each Partition differently. That means for Predictive Queries the database will build and apply a separate data mining model for each set of values that the Partition defines.
For example in Figure 2 we can see that approximately 90% of the data comes from one country. In this kind of scenario it does not make sense to use the same data mining model for all countries. What we should do is to create a separate model for each country.
Figure 2: Distribution of transactional records by country.
When we use the Partitioning clause we can specific how we want the Predictive Queries to create these separate data mining model. Using the data in Figure 2 we can add the COUNTRY_NAME to the Partition clause.
Similarly if we wanted to refine our data mining to a more granular level we could add additional attributes to the Partition clause. This will be illustrated in the following examples.
How to get started with Predictive Queries
When starting out with Predictive Queries it can be a little bit difficult to get the syntax of your queries correct. To help you to overcome these and to get up to speed quickly with Predictive Queries Oracle has provided 2 simple ways to get going.
The first of these is to use the Snippets section of SQL Developer. Snippets are code fragments and examples that gives you an example of a prebuilt query that uses a particular function. Figure 3 shows you how to access the snippets in Oracle SQL Developer.
Figure 3: Snippets in Oracle SQL Developer
The second way is to use the Oracle Data Miner workflow tool that is part of Oracle SQL Developer. The Predictive Queries section of the Components window only becomes visible when you are connected to an Oracle 12c Database, as shown in Figure 4.
Figure 4: Predictive Queries in Oracle Data Miner
Classification using Predictive Queries
In our first example we are going to look at how you can perform a simple Classification Predictive Query in SQL.
If you have already built an Oracle Data Mining Classification model you are able to use the PREDICTION and PREDICTION_PROBABILITY functions to apply that model to your data. Similarly for Predictive Queries we can use these functions but they will have some additional specification to enable Predictive Queries to work.
In the following example it shows a Predictive Query that will use the data in the MINING_DATA_BUIL_V view to build a number of data mining models.
select cust_id, affinity_card,
PREDICTION( FOR to_char(affinity_card) USING *) OVER ()
In this query we are asking Oracle to take all the data in the MINING_DATA_BUILD view (USING * in the query) and to work out what data mining algorithm to use, what setting to use and to optimize itself, using this data. Figure 5 shows the output from this query.
Figure 5: Classification Predictve Query results based on all the data.
In the results shown in Figure 5 above we get to see and compare the actual value with the predicted value. You may notices that it got one of the predictions incorrect. This is typical in data mining. It will not get everything 100% correct. But that is OK. Part of the reason for this is that the data mining algorithm is trying to work out what works best for all the data across all the records. One way to improve the accuracy is to look to build data mining models based on different groupings of the data.
Using the PARTITION clause
When you want a separate data mining model created for separate sections on the data then you can use the PARTITION BY clause to achieve this. For example we have a COUNTRY_NAME attribute. Maybe it would be better to build separate data mining models for each country.
To do this we need to example what is entered in the PARTITION BY clause. Here we can add the COUNTRY_NAME to the PARTITION BY clause and when the query is run a separate data mining model will be created for each country (Partition). These models will then be used to score or label the data with the predicted value. The following query extends our previous example by using the PARTITION BY clause and the results are shown in Figure 6.
select cust_id, affinity_card,
PREDICTION( FOR to_char(affinity_card) USING *) OVER
(PARTITION BY "COUNTRY_NAME") pred_affinity_card
Figure 6: Classification Predictve Query using the PARTITION BY clause results.
From the sample output shown above we can see what the actual value is for the predicted attribute (AFFINITY_CARD) and what the predicted value is. In the sample shown above we see that both values are the same and if we looked at all the records we would see a very high degree of accuracy. This is good and it shows us that the Predictive Query generated a very efficient in-database data mining model.
But the problem with Predictive Queries is that these very efficient data mining models no longer exist in the database. It only exists for the duration of the query execution and only exists as part of the execution of the analytic function. No Oracle data mining model is stored, even for a short time period, in the database.
HINT: For your typical classification type of problem your target attribute will need to be a character data type. If is isn’t for example it could be defined as a number, then you will need to apply the TO_CHAR function to the target attribute to convert it into a categorical value. If you don’t TO_CHAR the number attribute then the Predictive Queries will treat the data as a regression data mining problem.
Regression using Predictive Queries
Let us now look at a regression example.
For this example we are going to use another of the sample data sets that comes when you install the Oracle Data Miner repository. One of these tables is called INSUR_CUST_LTV_SAMPLE. What we want to do with this data is to predict what the potential Life Time Value (LTV) is for each each customer. We can then compare the predicted value with the calculated value and then determine if there are any customers who are generating as much revenue when compared with all the other customers.
The following examples shows a Predictive Query that calculates a predicted LTV and displays this with the current values. For records where you don’t have the LTV value the predicted value will give you an idea of what the LTV would be.
PREDICTION( FOR ltv USING *) OVER ( ) pred_ltv
Figure 7: Regression Prediction Results using Predictive Queries
Using the PARTITION clause
Just like we did for the previous example of using Predictive Queries we can ask the database to create multiple data mining models that are based on the values of an attribute or a set of attributes. In the following example our LTV Predictive Query is expanded so that it will create separate data mining models for each combination of STATE and SEX. In this case it will be creating 44 different data mining models, as we have 22 distinct states in the data set and 2 distinct values for SEX.
PREDICTION( FOR ltv USING *)
OVER ( PARTITION BY STATE, SEX ) pred_ltv
Figure 8: Regression Predictive Query Results using PARTITION BY
What Predictive Queries do not allow you to do
The use of Predictive Queries has limitations. When you run a Predictive Query Oracle will create a transient or temporary model. This model is only available during the execution of the query. The model produced by the Predictive Query does not exist afterwards and is not available for inspection or tuning of the model. If you need to inspect the model to correlate scoring results with the model, specify special algorithm settings, or execute multiple scoring queries that use the same model, you must create a predefined model using the Oracle Data Miner Model Nodes or by using the Oracle Data Mining PL/SQL package called DBMS_DATA_MINING.
Another alternative is to use the DBMS_PREDICTIVE_ANALYTICS PL/SQL package that is available in the Oracle 11g Database. This allows you to create some transient models but with Predictive Queries you get a much better and more efficient solution.
Oracle has been working on making the advanced analytics features that exist in the database more available to the data analyst. With Predictive Queries we have another example of this. Predictive Queries allows us to build advanced analytics into our queries without having to understand or know the underlying technology. But we can use our basic understanding to gain additional insight on our data in a very easy and efficient manner. If we want to build more complex solutions then we can use the Oracle Data Miner tool or use the underlying SQL and PL/SQL functional.
REGR_ (Linear Regression)
OTECH MAGAZINE #7 spring 2015 copyright otech magazine 2015