Thursday, February 2, 2012

R meets HANA


If you read my last blog, HANA meets R, you will remember that we read data from HANA into R directly using ODBC, without having to download a .csv file. This time, we're going to read data from HANA as well, but after doing some nice tricks in R, we're going to post the information back into HANA.

Keep in mind that this is not a standard SAP solution. It relies only on a custom R package that can work with ODBC-enabled tables, and like any custom package, it has many limitations...anyway...this should be fixed when SAP releases the official R and HANA integration.

In my previous blog, Prediction model with HANA and R, we created a stored procedure in HANA to populate a table called TICKETS_BY_YEAR; then in R we calculated the prediction for the next year and generated a nice graphic showing both the real data and the prediction. So...of course I'm not going to repeat all that.

This is the R code that we need to use...


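library("RODBC")
# Connect through the "HANA" DSN (password masked)
ch <- odbcConnect("HANA", uid="P075400", pwd="***")
# Read the table filled by the stored procedure
Flight_Tickets <- sqlFetch(ch, "P075400.TICKETS_BY_YEAR")
period <- Flight_Tickets$PERIOD
tickets <- Flight_Tickets$TICKETS
# Take the year from the first period and add one
var_year <- substr(period[1], 1, 4)
var_year <- as.integer(var_year)
var_year <- var_year + 1
var_year <- as.character(var_year)
# Swap the leading four-digit year of every period for the new one
new_period <- gsub("^\\d{4}", var_year, period)
# The column is named "year" here, and that is the name that reaches HANA
next_year <- data.frame(year=new_period, stringsAsFactors=FALSE)
# Fit a linear model and predict the next year
prt.lm <- lm(tickets ~ period)
pred <- predict(prt.lm, next_year, interval="none")
period <- next_year
tickets <- pred
PREDICTION_TICKETS <- data.frame(period, tickets)
# Drop any previous copy of the table, then save the new one
sqlDrop(ch, "PREDICTION_TICKETS", errors=FALSE)
sqlSave(ch, PREDICTION_TICKETS, rownames="id")
odbcClose(ch)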

After we execute this code, we can check in HANA that our new table, PREDICTION_TICKETS, was created...


And the data was populated as expected...
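
Going back to the R code for a moment, here is a minimal sketch of the same prediction steps on made-up numbers, so they can be tried without a HANA connection. The data frame history and its values are hypothetical, and I'm assuming PERIOD holds numbers like 201101 (year followed by month). The sketch also shows a detail of predict() worth checking: it looks up the predictor by name in the new data, so the column has to be called period for the shifted periods to actually be used.

```r
# Hypothetical stand-in for TICKETS_BY_YEAR (values are made up)
history <- data.frame(PERIOD = c(201101, 201102, 201103, 201104),
                      TICKETS = c(120, 135, 150, 170))
period <- history$PERIOD
tickets <- history$TICKETS

# Shift the leading four-digit year forward by one
var_year <- as.character(as.integer(substr(period[1], 1, 4)) + 1)
new_period <- as.numeric(gsub("^\\d{4}", var_year, period))

# Fit a straight line through the historical data
prt.lm <- lm(tickets ~ period)

# Column named "period": predict() uses the shifted periods
pred_named <- predict(prt.lm, data.frame(period = new_period))

# Column named "year": predict() cannot find "period" in the new data,
# falls back to the original vector and returns the fitted values
pred_misnamed <- predict(prt.lm, data.frame(year = new_period))

print(new_period)
print(pred_named)
print(pred_misnamed)
```

If the two printed predictions differ on your data as well, the column name in the new data frame is probably worth a second look.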


You may wonder...what are the limitations? Everything seems to work like a charm, right? Easy...there aren't many, but they are important...

* We don't have a way to validate whether the table exists or not.
* We must delete the table before doing the insert; otherwise it's not going to work.
* Even though the date field was called PERIOD, R named it "year" and passed it into HANA that way.
* We can't specify the type of the fields, nor the length.
* We are forced to have an additional column with a numeric index, which we can nicely call "id"...
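
Some of these limitations look workable from the R side. The sketch below is a hedged suggestion, not something tested against HANA: it renames the columns before saving and defines two hypothetical helpers, table_exists() and save_prediction(), built on sqlTables() and on the rownames, append and varTypes arguments of sqlSave() as described in the RODBC documentation. Whether the HANA ODBC driver honors them, and the column types chosen here, are assumptions.

```r
# Hypothetical prediction result (values are made up)
PREDICTION_TICKETS <- data.frame(year = c("201201", "201202"),
                                 tickets = c(1770.5, 1787.0),
                                 stringsAsFactors = FALSE)

# Give the columns the names we want to see in HANA
names(PREDICTION_TICKETS) <- c("PERIOD", "TICKETS")

# Assumed column types; adjust to whatever the target table needs
var_types <- c(PERIOD = "NVARCHAR(6)", TICKETS = "DOUBLE")

# Hypothetical helpers: both need the RODBC package and an open channel,
# so they are only defined here, not called
table_exists <- function(ch, schema, name) {
  found <- RODBC::sqlTables(ch, schema = schema, tableName = name)
  nrow(found) > 0
}

save_prediction <- function(ch, df, types) {
  RODBC::sqlSave(ch, df, tablename = "PREDICTION_TICKETS",
                 rownames = FALSE,   # no extra index column
                 append = FALSE,     # create the table instead of appending
                 varTypes = types)   # requested column types
}

print(names(PREDICTION_TICKETS))
```

With an open channel ch, the call would be save_prediction(ch, PREDICTION_TICKETS, var_types). Setting append = TRUE instead is the documented way to add rows to a table that already exists.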

As I said earlier...this is just a custom package that allows us to play...this shouldn't be used as a final solution, but as a playground. Enjoy!

Greetings,

Blag.

HANA meets R


In my previous HANA and R blogs, I was forced to create .csv files from HANA and read them in R...an easy but also boring procedure...especially if your R report is supposed to run on a regular basis...having to create a .csv file every time you need to run your report is not a nice thing...

After spending some time reading and researching R...I finally came across a library that can read data from any relational database, and since HANA is ODBC capable, the work is just a piece of cake -;)

For these examples, we must install two libraries, RODBC and plotrix, and create the DSN connection as shown here...


Here we're going to "Add..." a new "User DSN"


HANA already provides us with a driver, so we're cool


Assign a name for the "Data Source Name"; "Description" is optional, and "Server:Port" should of course be filled in.

Now...we're ready to go to our HANA studio and create a table and a stored procedure...



CREATE PROCEDURE GetTicketsByYearMonth
(IN var_year NVARCHAR(4), IN var_month NVARCHAR(2))
LANGUAGE SQLSCRIPT AS
BEGIN
  SELECT COUNT(bookid), carrid
  FROM sflight.snvoice
  WHERE YEAR(fldate) = var_year
  AND MONTH(fldate) = var_month
  GROUP BY carrid
  INTO TICKETS_BY_YEAR_MONTH;
END;


CALL P075400.GetTicketsByYearMonth('2011','12');

After we run our Stored Procedure...we have all the information in the table...Ok...only two fields...today was a hard day...I'm tired -:P


Finally...we can code some R! First, we're going to create a Fan Plot (the plotrix library is needed for that one) and then a Bar Plot...I used the same code for both, so just move the comment from one plot line to the other and run it again...I know...I'm being lazy again...but at least I'm not reinventing the wheel -;) Two pieces of code with only one different line? No thanks...


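library("plotrix")
library("RODBC")
# Connect through the "HANA" DSN
ch <- odbcConnect("HANA", uid="P075400", pwd="***")
# Read the table filled by the stored procedure
res <- sqlFetch(ch, "P075400.TICKETS_BY_YEAR_MONTH")
# Fan Plot (from plotrix)
fan.plot(res$TICKETS, labels=res$CARRIER,
         main="Tickets for December 2011")
# Bar Plot: uncomment this line and comment out fan.plot() above
#barplot(res$TICKETS,names.arg=res$CARRIER)
odbcClose(ch)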

The code is very simple...we load the libraries we need, we establish a connection to our DSN, we fetch the data from the table, we create the graphics and finally we close the connection.
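
To try the plotting step without a HANA system, the same bar plot can be drawn from a data frame built by hand. This is only a sketch: the carriers and ticket counts below are made up, and the columns are assumed to be called CARRIER and TICKETS as in the code above. pdf(NULL) is used so the script also runs without a screen; drop the two device lines to see the plot.

```r
# Hypothetical stand-in for the result of sqlFetch()
res <- data.frame(CARRIER = c("AA", "LH", "SQ"),
                  TICKETS = c(310, 425, 198),
                  stringsAsFactors = FALSE)

pdf(NULL)  # null graphics device, nothing is written to disk
# barplot() returns the x positions of the bar midpoints
mids <- barplot(res$TICKETS, names.arg = res$CARRIER,
                main = "Tickets for December 2011")
invisible(dev.off())

print(length(mids))
```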

And here come the graphics...



I will keep investigating this way of connecting HANA and R...more blogs should be on the way -;)

Greetings,

Blag.