lunes, 18 de febrero de 2013

When SAP HANA met R - What's new?


Since I wrote my blog When SAP HANA met R - First kiss I had received a lot of nice feedback...and one those feedbacks was..."What's new?"...

Well...as you might now SAP HANA works with R by using Rserve, a package that allows communication to an R Server, so really...there can't be too many new features...but...the cool thing is that SAP HANA has been tested with R 2.15 and Rserve 0.6-8 so any new features added on R and Rserve and instantly available on SAP HANA -;)

But hey! I wouldn't write a blog is there wasn't at least one new cool feature, right? You can read more about it here SAP HANA R Integration Guide.

Of course...for this you need SAP HANA rev. 48 (SPS5)

So what's the cool feature? Well...you can store a train model like lm() or ksvm() directly on a table for later use. This is really cool, because if you have a big calculation to be made, you only need to store the model and use it later without having to reprocess everything again.

Let's make an example...and hope all the R fans doesn't kill me for this...because when it comes to statistics...I'm really lost in the woods -:(

Let's say we have two tables from the SFLIGHT package...SPFLI and STICKET, so we want to predict how many times a customer is going to flight to different destinations (CITYFROM-CITYTO) depending on how many times all the customers has flights to those very same locations.

We're going to create one SQLScript file to get the information, transform it, create the model and store it in the database...

Build_Flight_Model.sql
--Create a TYPE T_FLIGHTS to grab the information from the SPFLI and STICKET tables.
 
DROP TYPE T_FLIGHTS;
CREATE TYPE T_FLIGHTS AS TABLE (
CARRID NVARCHAR(3),
CUSTOMID NVARCHAR(8),
CITYFROM NVARCHAR(20),
CITYTO NVARCHAR(20)
);
 
--Create a TYPE FLIGHT_MODEL_T and a table FLIGHT_MODEL to get and store the model in the database.
 
DROP TYPE FLIGHT_MODEL_T;
CREATE TYPE FLIGHT_MODEL_T AS TABLE (
ID INTEGER,
DESCRIPTION VARCHAR(255),
MODEL BLOB
);
 
DROP TABLE FLIGHT_MODEL;
CREATE COLUMN TABLE FLIGHT_MODEL (
ID INTEGER,
DESCRIPTION VARCHAR(255),
MODEL BLOB
);
 
--This R procedure will receive the T_FLIGHTS information, create table containing a field call FLIGHT_NAME 
--that will contain the concatenation
--of the CARRID, CITYFROM and CITYTO. ie: AA-NEW YORK-SAN FRANCISCO.
--We're going to convert the table into a data.frame, so all the similar values in FLIGHT_NAME are going to 
--be summarized.
--Using the subset() function, we're going to get rid of all the FLIGHT_NAME's that has a frequency lower 
--or equal than 0.
--We're going to use the nrow() function to count all the FLIGHT_NAME occurrences and multiply that by 
--10 (Stored in f_rows)
--We're going to use the sum() function to sum all the frequencies and the divide it by f_rows 
--(Stored in f_sum)
--We're going to use the mapply() function to divide each of the frequencies by f_sum
--We're going to use the order() function to sort by FLIGHT_NAME
--We're going to use the colnames() function to assign names to our data.frame
--We're going to use the lm() function to generate a Linear Regression based on the FLIGHT_NAME 
--and it's frequency
--Finally, we're going to use the generateRobjColumn() custom created function to store the result of the 
--model in the buffer.

DROP PROCEDURE FLIGHT_TRAIN_PROC;
CREATE PROCEDURE FLIGHT_TRAIN_PROC (IN traininput "T_FLIGHTS", OUT modelresult FLIGHT_MODEL_T)
LANGUAGE RLANG AS
BEGIN
generateRobjColumn <- function(...){
          result <- as.data.frame(cbind(
                    lapply(
                              list(...),
                              function(x) if (is.null(x)) NULL else serialize(x, NULL)
                    )
          ))
          names(result) <- NULL
          names(result[[1]]) <- NULL
          result
}
tab<-table(FLIGHT_NAME=paste(traininput$CARRID,traininput$CITYFROM,traininput$CITYTO,sep="-"))
df<-data.frame(tab)
ss<-subset(df,(df$Freq>0))
freq<-ss$Freq
f_rows<-(nrow(ss)) * 10
fsum<-sum(freq) / f_rows
ss$Freq<-mapply("/",ss$Freq, fsum)
flights<-ss[order(ss$FLIGHT_NAME),]
colnames(flights)<-c("FLIGHT_NAME","FREQUENCY")
lmModel<-lm(FREQUENCY ~ FLIGHT_NAME,data=flights)
modelresult<-data.frame(
ID=c(1),
DESCRIPTION=c("Flight Model"),
MODEL=generateRobjColumn(lmModel)
)
END;
 
--This SQLSCRIPT procedure will grab all the needed information from the tables SPFLI and STICKET 
--and will assign it to flights
--We're going to call the R procedure FLIGHT_TRAIN_PROC
--We're going to do an INSERT to finally store the model from the buffer into the database
 
DROP PROCEDURE POPULATE_FLIGHTS;
CREATE PROCEDURE POPULATE_FLIGHTS ()
LANGUAGE SQLSCRIPT AS
BEGIN
flights = SELECT SPFLI.CARRID, CUSTOMID, CITYFROM, CITYTO
             FROM SFLIGHT.SPFLI INNER JOIN SFLIGHT.STICKET
             ON SPFLI.CARRID = STICKET.CARRID
             AND SPFLI.CONNID = STICKET.CONNID;
CALL FLIGHT_TRAIN_PROC(:flights, FLIGHT_MODEL_T);
INSERT INTO "FLIGHT_MODEL" SELECT * FROM :FLIGHT_MODEL_T;
END;
 
CALL POPULATE_FLIGHTS();

When we call POPULATE_FLIGHTS(), our FLIGHT_MODEL table should look like this...



If you are wondering why we have an "X"...it's because the content is serialized and stored in a BLOB field...if you inspect the content, you will receive a bunch of weird hexadecimal numbers...

Anyway...it took 6.165 seconds to SAP HANA to process 1,842,160 records.

Now the we have our model safely stored in the database, we can move to our next SQLScript file...

Get_and_Use_Flight_Model.sql
--We're going to create a TYPE T_PREDICTED_FLIGHTS and a table PREDICTED_FLIGHTS to store the information
--of the current number of flights and
--the estimated (according to our prediction) number of flights
 
DROP TYPE T_PREDICTED_FLIGHTS;
CREATE TYPE T_PREDICTED_FLIGHTS AS TABLE (
CUSTOMID NVARCHAR(8),
FLIGHT_NAME NVARCHAR(60),
FREQUENCY INTEGER,
PREDICTED INTEGER
);
 
DROP TABLE PREDICTED_FLIGHTS;
CREATE TABLE PREDICTED_FLIGHTS (
CUSTOMID NVARCHAR(8),
FLIGHT_NAME NVARCHAR(60),
FREQUENCY INTEGER,
PREDICTED INTEGER
);
 
--In this R procedure, we're going to receive the flight for a given customer, the model stored in the database 
--and we're going to return the result so it can be
--stored in our PREDICTED_FLIGHTS table.
--We're going to use the unserialize() function to extract the model.
--We're going to create a table containing a field call FLIGHT_NAME that will contain the concatenation of the 
--CARRID, CITYFROM and CITYTO.
--ie: AA-NEW YORK-SAN FRANCISCO. and also the CUSTOMID
--We're going to convert the table into a data.frame, so all the similar values in FLIGHT_NAME are going to be 
--summarized.
--We're going to use the colnames() function to assign names to our data.frame
--We're going to use the nrow() function to get the number of records in the data.frame (Stored in dfrows)
--We're going to use the rep() function to repeat the CUSTOMID value of the first record dfrows times
--We're going to use the predict() function to predict the amount of flights based on our model (retrieved 
--from the database) and the new data that we recovered
--Finally, we're going to create a data.frame containing all the information that should be stored in our 
--table PREDICTED_FLIGHTS

DROP PROCEDURE USE_FLIGHT;
CREATE PROCEDURE USE_FLIGHT(IN flights T_FLIGHTS, IN modeltbl FLIGHT_MODEL_T, OUT out_flights T_PREDICTED_FLIGHTS)
LANGUAGE RLANG AS
BEGIN
lmModel<-unserialize(modeltbl$MODEL[[1]])
tab<-table(FLIGHT_NAME=paste(flights$CARRID,flights$CITYFROM,flights$CITYTO,sep="-"),CUSTOMID=flights$CUSTOMID)
df<-data.frame(tab)
colnames(df)<-c("FLIGHT_NAME","CUSTOMID","FREQUENCY")
dfrows<-nrow(df)
customid<-rep(df$CUSTOMID[1],dfrows)
prediction=predict(lmModel,df,interval="none")
out_flights<-data.frame(CUSTOMID=customid,FLIGHT_NAME=df$FLIGHT_NAME,FREQUENCY=df$FREQUENCY,PREDICTED=prediction)
END;
 
--This SQLSCRIPT procedure will select the information from the FLIGHT_MODEL table and store in the flight_model 
--variable
--We're going to select all the needed information from the table SPFLI and STICKET based on the customer ID number
--We're going to call the R procedure USE_FLIGHT and it will return us the PREDICTED_FLIGHTS that we're going to 
--store in the database
 
DROP PROCEDURE GET_FLIGHTS;
CREATE PROCEDURE GET_FLIGHTS(IN customId NVARCHAR(8))
LANGUAGE SQLSCRIPT AS
BEGIN
flight_model = SELECT * FROM FLIGHT_MODEL;
out_flights = SELECT SPFLI.CARRID, CUSTOMID, CITYFROM, CITYTO FROM SFLIGHT.SPFLI INNER JOIN SFLIGHT.STICKET
                   ON SPFLI.CARRID = STICKET.CARRID AND SPFLI.CONNID = STICKET.CONNID
                   WHERE CUSTOMID = :customId;
CALL USE_FLIGHT(:out_flights, :flight_model, PREDICTED_FLIGHTS);
INSERT INTO "PREDICTED_FLIGHTS" SELECT * FROM :PREDICTED_FLIGHTS;
END;

Now that we have all our Stored Procedures ready...we can create the last SQLScript file to actually fill our PREDICTED_FLIGHTS with some data...

Predict_Flights_for_Customers.sql
CALL GET_FLIGHTS('00000156');
CALL GET_FLIGHTS('00002078');
CALL GET_FLIGHTS('00002463');

As you can see...we only need to call the GET_FLIGHTS procedure, passing the Customer ID's...

This process took only 970ms and generate 122 records for the 3 customers...

Now I'm sure you realize how cool is this...if we haven't stored our model in the database...then we would have to calculate the model for each customer...and it would have took us around 7 seconds for each (get the lm(), plus the predict)...that would have been around 21 seconds for the 3 customers...while we can say that the whole process took us only 7 seconds...if you needed to calculate the prediction for all the more than 1 million customers...you will be in sort of trouble -:)

Let's see the content of our PREDICTED_FLIGHTS table...of course, I'm going to only show a part of it...


We can dump this information to a CSV file from SAP HANA Studio and let's say...use it on SAP Visual Intelligence to generate a nice graphic...


Hope you like it -:)

Greetrings,

Blag.

miércoles, 13 de febrero de 2013

A Shiny example - SAP HANA, R and Shiny


As you may already know...I love R...a fancy, open source statistics programming language. So today, I decided to learn something new using R.

There aren't much Web Servers for R, but there's one that I really like called Rook, that I covered on my blog RSAP, Rook and ERP.

Today, I tried a new one that's is making a lot of noise in the R community, called Shiny. I gotta say...I felt instantly in love with it...

So you may ask...so what's so cool about Shiny? Well...besides the fact that it allows you to create web applications using R...it's completely dynamic...meaning that once you change a parameter, the graphic and not the whole web page is reloaded automatically...awesome, huh?

So...when I wrote about Ruby and SAP in my blog Ruby joins the SAP HANA party, I tried to emulate an SE16 to browse tables from the SFLIGHT package...this time...I will do the same but with a little twist...the application will allow you to choose a table, but also to choose how many records you want to display...

So, what we need? Simply...if you haven't already...install this two packages..."shiny" and "RODBC"...


After this, we need to create a folder called Shiny and after this create a new one called SAP_HANA_R (This is just to be organized).

We now need to create two files, called ui.R and server.R


ui.R
library("shiny")
library("RODBC")
 
ch<-odbcConnect("HANA_TK",uid="SYSTEM",pwd="manager")
odbcQuery(ch,"SELECT table_name from SYS.CS_TABLES_ where schema_name = 'SFLIGHT'")
tables<-sqlGetResults(ch)
odbcClose(ch)
 
shinyUI(pageWithSidebar(
 
  headerPanel("SAP HANA and R using Shiny"),
 
  sidebarPanel(
    selectInput("Table", "Choose a table:",
                choices = tables$TABLE_NAME),
    numericInput("Records", "Number of Records to view:", 10)
  ),
 
  mainPanel(
    tableOutput("view")
  )
))

server.R
library("shiny")
library("RODBC")
 
shinyServer(function(input, output) {
 
  output$view <- reactiveTable(function() {
    ch<-odbcConnect("HANA_TK",uid="SYSTEM",pwd="manager")
    schema_table<-paste("SFLIGHT.",input$Table,sep="")
    query<-paste("SELECT TOP",input$Records,"* FROM",schema_table)
    odbcQuery(ch,query)
    result<-sqlGetResults(ch)
    odbcClose(ch)
 
    head(result, n = input$Records)
  })
})

When we have finished with the two files...we can create a new one just to call our application.

Shiny_HANA.R
library(shiny)
setwd("C:/Blag/R_Scripts")
runApp("Shiny/SAP_HANA_R")

Keep in mind that the setwd("C:/Blag/R_Scripts") is my main R Script folder, as setwd stands for "Set Working Directory"...

When we run Shiny_HANA.R, the browser will popup showing the parameters and the table by default.


As you can see, we can choose a new table to display.


We can also choose how many lines or records we want to display...



I hope you like it... -:)

Greetings,

Blag.

sábado, 9 de febrero de 2013

PHP rocks on SAP HANA too! (Linux version)


If you know me...you know I'm not a Linux boy...however...I still have my laptop LG T1 Express Dual that I bought on 2007...for my first SAP TechEd ever...as you may assume...running Windows and installing more software on this laptop was a nice way to slowly kill it...so the laptop stayed on my closed for a long time...until I decided to install Ubuntu on it and bring to live once again -:)

What they say it's true...even the crappiest and oldest laptop will behave great on Linux...and BTW, I'm writing this blog on my Linux box -;)

A couple of weeks ago I wrote a blog called PHP rocks on SAP HANA too! as I realized that I haven't blog about PHP and SAP HANA and also because I knew that a lot of people were facing issues trying to make it work together...but...I wrote for Windows...and Linux users are still struggling to get this working...so...time for a new blog...

I gotta say...without this awesome blog HANA with odbc on Ubuntu 12.04 written by Ethan Zhang I will be probably still struggling with the connection details...but anyway...not being a Linux boy...I thought it would be a good idea to detail all the steps and problems that I overcome to finally make it work...

Of course...I didn't have PHP installed on my Linux box...so the first step was install it...however, I didn't want to spend so much time installing everything separately, so I decided to use LAMP.

LAMP Installation
$ sudo apt-get install tasksel
$ sudo tasksel install lamp-server


After this two simple lines...PHP was up and running...so next step was install the SAP HANA Client. 32bits Linux Version.

Now, I needed something to connect to my SAP HANA Server...so unixODBC was the best choice.

unixODBC Installation
$ sudo apt-get install unixODBC unixODBC-dev

With this, it was time to configure the ODBC connection...

odbc.ini configuration
$ sudo vi odbc.ini
 
[HDB]
driver = /usr/sap/hdbclient32/libodbcHDB32.so
ServerNode = hana_server:30115

It was time to the first test...so I did the following...

ODBC Testing
$ isql -v HDB SYSTEM manager

After this...I got an error saying that libodbcHDB32.so couldn't be found because the file or directory didn't exist...it was weird...I decided to take a look at the dependencies...

Checking Dependencies
$ ldd /usr/sap/hdbclient32/libodbcHDB32.so

This point me out to the fact that libaio.so wasn't found on my system...so using the Ubuntu Software System...I just installed it...


My next test of isql was successful, so...being kind of lazy...I just copied the code from my other blog and run it -;)



PHP_HANA.php
<?php
$conn = odbc_connect("HDB","SYSTEM","manager",SQL_CUR_USE_ODBC);
if (!($conn))
{
          echo "<p>Connection to DB via ODBC failed: ";
        echo odbc_errormsg ($conn);
        echo "</p>\n";
}
else{
          if(isset($_POST["CARRID"]) == false){
                  $sql = "SELECT CARRID, CARRNAME FROM SFLIGHT.SCARR WHERE MANDT = 300";
               $rs = odbc_exec($conn,$sql);
               print("<DIV ALIGN='CENTER'>");
               print("<H1>SAP HANA from PHP</H1>");
               print("<FORM NAME='Get_Data' ACTION='$_SERVER[PHP_SELF]' METHOD='POST'>");
               print("<SELECT NAME='CARRID'>");
               while($row = odbc_fetch_array($rs)){
                     $carrid = $row["CARRID"];
                     $carrname = $row["CARRNAME"];
                     print("<OPTION VALUE='$carrid'>$carrname");
               }
        print("</SELECT>");
        print("<INPUT TYPE='SUBMIT' VALUE='Get Data'>");
        print("</FORM>");
        print("</DIV>");
        }
        else{
                  $carrid_param = $_POST["CARRID"];
                $sql = "SELECT * FROM \"_SYS_BIC\".\"blag/AV_FLIGHTS\"
                        WHERE CARRID = '$carrid_param'";
                $rs = odbc_exec($conn,$sql);
                print("<DIV ALIGN='CENTER'><TABLE BORDER=1>");
                print("<TR><TH>MANDT</TH><TH>CARRID</TH><TH>CONNID</TH>
                       <TH>COUNTRYFR</TH><TH>CITYFROM</TH>
                       <TH>AIRPFROM</TH><TH>COUNTRYTO</TH>
                       <TH>CARRNAME</TH><TH>DISTANCE</TH></TR>");
                while($row = odbc_fetch_array($rs)){
                          $mandt = $row["MANDT"];
                $carrid = $row["CARRID"];
                $connid = $row["CONNID"];
                $countryfr = $row["COUNTRYFR"];
                $cityfrom = $row["CITYFROM"];
                $airpfrom = $row["AIRPFROM"];
                $countryto = $row["COUNTRYTO"];
                $carrname = $row["CARRNAME"];
                $distance = $row["DISTANCE"];
                print("<TR><TD>$mandt</TD><TD>$carrid</TD>
                         <TD>$connid</TD><TD>$countryfr</TD>
                         <TD>$cityfrom</TD><TD>$airpfrom</TD>
                         <TD>$countryto</TD><TD>$carrname</TD>
                         <TD>$distance</TD></TR>");
               }
               print("</TABLE>");
               print("<A HREF='PHP_HANA.php'>Go Back</A></DIV>");
          }
}
?>



As you can see...not too hard...even for a Windows boy -:D

Greetings,

Blag.