It is possible to connect the Oracle OBIEE BI reporting tool set to a Google BigQuery dataset for analysis and dashboard reporting by using an ODBC driver provided by Google. It's a little more complex than your average data source, so settle down for a long read and enjoy!
Obtaining BigQuery Project details
Google BigQuery can be accessed at
https://bigquery.cloud.google.com/,
where there are options to create a new account and load data quickly – I’ve set up a
free 365-day trial. Alternatively, if your organisation is already using
BigQuery, your administrator should be able to help set you up with an account and
obtain the details below.
For this blog I’ve created a simple dataset by uploading a
single CSV file, SALESDATA, and creating a small set of additional tables to
format it into a dimensional model:
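As an aside, if you prefer the command line to the web UI, the same CSV upload can be done with Google’s bq tool. This is a minimal sketch, assuming the Google Cloud SDK is installed and authenticated; the local file name salesdata.csv is my assumption, and PaulsSalesData is the dataset name we’ll be using throughout this blog:

# Load a local CSV into the SALESDATA table, letting BigQuery infer the schema
bq load --source_format=CSV --autodetect --skip_leading_rows=1 PaulsSalesData.SALESDATA ./salesdata.csv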
Before considering access via OBIEE, there are some items
to take note of here. Data is stored in Tables (or Views). These are stored
within a Dataset, which in turn is stored in a Project.
The fields on a table are known as the Schema.
So, to translate BigQuery terms into standard database terms:
Schema = Table fields
Table = Table
Dataset = Schema
Project = Database
A Project has a name and an ID. The name is displayed on the
screen, but the ID is required for connections.
The easiest way to find the Project ID is to click on the
little drop-down icon after the Project name and select ‘Switch to Project’ –
this will display all projects, including the currently selected one:
So, the Project ID in this case is stately-equinox-206409
(this is auto-assigned by BigQuery, not something I made up!)
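To make the hierarchy concrete, a fully qualified table reference in BigQuery standard SQL names all three levels. A sketch using my project and the dataset we’ll meet shortly (PaulsSalesData); the backticks are needed because the auto-assigned project ID contains hyphens:

select * from `stately-equinox-206409.PaulsSalesData.SALESDATA` limit 10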
To allow an automated driver to connect to BigQuery we
need to set up a service account and its security.
To do this, access the settings screen by clicking on the Project
drop-down icon and selecting ‘Switch to project’ then
‘Manage projects…’
In the manage resources screen you will need to check that the Organisation
is set (my organisation is e-edba.com in this screenshot); this will then
display your projects. Click on the Project name to select it.
You can then see all the users who have access to this
project. In this case just my demo account:
We don’t want to use this account for automated connections,
as it won’t be secure and of course my password will change occasionally.
Instead we create a service account. Click on the ‘Service accounts’ link on the
left-hand side.
Click the ‘Create Service Account’ link at the top.
In the Create Service Account screen fill out the fields as
follows:
Service Account Name – enter a suitable name. I’ve entered
OBIEEConnection.
The service account ID automatically gets set to the same
name (in lower case, with spaces replaced with hyphens). You can change this if
you wish. It also gets an email address with the domain @projectid.iam.gserviceaccount.com appended; in my
case this is obieeconnection@stately-equinox-206409.iam.gserviceaccount.com.
You then need to pick two roles for the account. Click on
the role field to open up a drop-down box. Click on ‘BigQuery’ in the left-hand
list then on ‘BigQuery Data Viewer’ in the right-hand list.
Then click ‘add another role’ and select ‘BigQuery User’.
Next tick the ‘Furnish a new private key’ check box. The
private key file will be used in place of a password in the ODBC configuration
files. You can select either the
default JSON format or the older P12 format.
Ignore the ‘Enable G Suite Domain-wide Delegation’ option unless
your organisation has a wider Google suite configuration, in which case speak to
your Google administrator about this.
Finally click Save.
You should then get a download dialog window for the private key
file. Save the file in a suitable directory on the PC/server where you will run
the BI Administration tool. It will need to be copied to the OBIEE server later
as well.
If you chose a JSON format private key file you will get the
following message:
For a P12 private key file you will get the following
screen. Note the password that has been set for the P12 private key file. This
always seems to default to ‘notasecret’.
The account then appears in the service accounts screen:
This account also appears in the IAM user screen from where
you can adjust the roles if required:
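Incidentally, if you prefer to script this, the same service account, roles and key can be created with the gcloud command line tool. A sketch, assuming gcloud is installed and authenticated; roles/bigquery.dataViewer and roles/bigquery.user are the identifiers behind the two roles picked above, and the key file name is my own choice:

# Create the service account
gcloud iam service-accounts create obieeconnection --display-name "OBIEEConnection"
# Grant the two BigQuery roles to it
gcloud projects add-iam-policy-binding stately-equinox-206409 --member "serviceAccount:obieeconnection@stately-equinox-206409.iam.gserviceaccount.com" --role "roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding stately-equinox-206409 --member "serviceAccount:obieeconnection@stately-equinox-206409.iam.gserviceaccount.com" --role "roles/bigquery.user"
# Download a JSON private key for the account (file name is my choice)
gcloud iam service-accounts keys create obieeconnection-key.json --iam-account obieeconnection@stately-equinox-206409.iam.gserviceaccount.com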
The connection from OBIEE will be via ODBC and for this we
need some ODBC drivers. These are provided free of charge by Google and can be
downloaded from the following URL:
Download the appropriate drivers for your OBIEE environment.
I needed the Linux 64-bit driver for the VM running my OBIEE service; however,
the Windows driver is also required, as the BI Administration tool needs to
connect to BigQuery to import the table objects and this of course only runs
under Windows.
Configuring the BI Administration Tool
Let’s look at the Windows driver and connect via the BI
Administration tool first.
The download is a single MSI file called something like SimbaODBCDriverforGoogleBigQuery64_2.1.11.1011.msi
Run this file to start the installation.
You’ll probably get the usual Windows
warnings about running an unrecognized app, along the lines of the one below; you
will need to approve it to run.
Then follow through the installation screens:
Once finished, the Simba driver should appear in the ODBC
Data Source Administrator drivers tab:
We can now configure a data source. Click on the System DSN
tab, click Add, then select the new Simba ODBC Driver for Google BigQuery
driver:
Then fill out the DSN Setup screen as follows:
Data Source Name – enter a suitable name. Note that if OBIEE is
running under Linux this must not include spaces. Whilst spaces will work under
Windows, we will need to create an identically named data source on the Linux server
too.
Description – optionally enter a description.
The OAuth Mechanism must be set to ‘Service Authentication’
Leave the User Authentication fields blank.
Now use browse to locate the private key file you downloaded
– either JSON or P12.
The ODBC driver comes with its own cacerts.pem certificate
file which will be used by default, so leave the ‘Use System Trust Store’ check
box unticked unless you have specific certificates to use.
Lastly you need to choose the Project and Dataset. Both of
these are drop-down picklists which, if the settings above have been entered
correctly, will contact BigQuery and automatically list the available Projects
and Datasets.
Finally click Test and, if successful, close the setup
screen.
Now we can open the BI Administration Tool.
For the record I’m using version 12.2.1.2.0 of the tool. I’m
using a fresh install of OBIEE, so my repository only contains the default
Sample App Lite content.
Right click in the Physical layer and select ‘New Database’
from the drop-down menu.
Enter a name for the database and set the Database type to
ODBC Basic.
Now click on the Features tab. There is one setting here we
need to change for BigQuery: the IDENTIFIER_QUOTE_CHAR feature, which
defaults to a double quote mark. This means OBIEE automatically puts double
quotes around all table and field names in SQL statements. BigQuery does not
accept this, so it needs to be removed. Blank out the double quote mark from
the Value field:
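To illustrate, with the default quote character left in place OBIEE would generate something like the first statement below, which BigQuery rejects (in BigQuery standard SQL, double quotes delimit string literals, not identifiers); with the feature blanked out it generates the second:

select "employee_name" from "dim_employee"   -- rejected by BigQuery
select employee_name from dim_employee       -- what BigQuery expects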
Click Ok to save the database.
Now right click on the new database and select ‘New Object’
-> ‘Connection Pool…’
Enter a name for the pool – I use the name of the Project in
BigQuery.
Set the Call interface to ‘ODBC 3.5’
From the Data Source name drop-down, choose the ODBC data
source created earlier.
Untick the ‘Shared Login’ check box.
Leave all other settings at their default values for now.
Click OK to save.
Now right click on the new Connection Pool and select
‘Import Metadata’:
Next choose what to import. There are only Tables and Views
in BigQuery.
It will then connect to the Project in BigQuery and bring
back all datasets and tables/views within. Select what you need and use the
> & >> icons to add them to the repository:
The physical layer should then contain all of the selected
tables & views:
This all looks good; however, there is a problem which will
prevent BigQuery from recognizing the tables in a SQL statement: BigQuery
expects the table names to be prefixed with the Dataset name. OBIEE will not do that automatically, it will
simply use the table names ‘as is’, so for example a simple query on
employee_name from the dim_employee table will send the following select:
Select employee_name from dim_employee
However, BigQuery will reject this, the table needs to be
prefixed with the dataset:
Select employee_name from PaulsSalesData.dim_employee
The only way to achieve this is to manually add the dataset
to the name of each table. Double click on the first table to edit its
properties:
And add the dataset name and a period to the name:
Repeat this for all tables:
You will now need to complete the setup of the repository
for this data source as you would any other data source. In the physical layer
add any joins between the tables – BigQuery does not contain keys that would
have allowed the import wizard to detect these automatically.
Then drag the physical schema into the Business Layer,
enable it and add any additional content (dimension hierarchies, custom calcs,
etc.). Also, do not forget to set the aggregation type for any numeric fields
(e.g. sum). Then drag that into the presentation layer:
Now save the repository, checking the consistency as you go.
You will always get one warning message about the features of our database in
the physical layer not matching the defaults – this is caused by the removal of
the double-quote mark from the IDENTIFIER_QUOTE_CHAR feature – so this can
be ignored.
Configuring OBIEE
Before we can login to OBIEE and try to run a report, we
need to configure the ODBC driver on the OBIEE server as well. OBIEE will use the data source name specified
in the repository connection pool, so once the drivers are installed we need to
create an identically named data source.
My OBIEE is running on a Linux VM. To install the ODBC
driver downloaded earlier, the file SimbaODBCDriverforGoogleBigQuery64_2.1.11.1011.tar.gz
needs to be FTP’d to the server and unpacked, for example as the root user, with the
following command:
tar --directory=/opt -zxvf SimbaODBCDriverforGoogleBigQuery64_2.1.11.1011.tar.gz
You then need to edit the simba.googlebigqueryodbc.ini file to
configure a few settings:
vi /opt/simba/googlebigqueryodbc/lib/64/simba.googlebigqueryodbc.ini
The original file contains:
Firstly, the DriverManagerEncoding needs changing to UTF-8,
otherwise the [DataDirect][ODBC lib] Unicode converter gives truncated
character errors if left at UTF-32.
ErrorMessagesPath needs the $(INSTALLDIR) replacing with the
actual directory, in my case /opt/simba/googlebigqueryodbc/ErrorMessages
LogLevel can be left at 0, which is no logging, or increased.
I’ve set mine to 4, which is general information.
LogPath needs setting to an appropriate directory.
Also, LogCount can be added to restrict the number of log
files to keep.
So, my ini file looks like this:
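In case the screenshot is hard to read, this is a sketch of the edited file based on the settings described above; the [Driver] section name comes from the shipped file, and the LogPath and LogCount values are my own choices:

[Driver]
DriverManagerEncoding=UTF-8
ErrorMessagesPath=/opt/simba/googlebigqueryodbc/ErrorMessages
LogLevel=4
LogPath=/u01/simba_logs
LogCount=10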
There is a PDF document explaining all the configuration
settings here:
/opt/simba/googlebigqueryodbc/Simba ODBC Driver for Google
BigQuery Install and Configuration Guide.pdf
Next the BigQuery service account private key file
downloaded earlier needs to be FTP’d up to the server. You can place this
anywhere as it will be referred to via its full path. Make sure to upload this
under the user that Oracle runs under, not root, otherwise OBIEE may not have
access to it.
I’ve put mine in a new directory /u01/BigQueryDriver/ along
with the ODBC driver installer run earlier.
Now we can set up our ODBC data source. This needs to be done
in OBIEE’s odbc.ini file, located under <obiee_home>/user_projects/domains/bi/config/fmwconfig/bienv/core/odbc.ini.
Edit this file, for example with vi, in my case:
vi /u01/app/obiee12214/user_projects/domains/bi/config/fmwconfig/bienv/core/odbc.ini
First, under the [ODBC Data Sources] section, add a new record
for our data source using the same name as above when creating the ODBC data
source under Windows for the BI Administration Tool. In my case I’ve added the
line:
BigQuery_PaulsSalesData=Simba ODBC Driver for Google BigQuery
Then below, create a new section titled with the data source name,
“[BigQuery_PaulsSalesData]”, containing the following settings:
Driver=/opt/simba/googlebigqueryodbc/lib/64/libgooglebigqueryodbc_sb64.so
This must point to the location of the libgooglebigqueryodbc_sb64.so file installed with the Simba driver. It’s probably under /opt/simba/googlebigqueryodbc/lib/64, but check first.
Catalog=stately-equinox-206409
This is the Project ID from BigQuery.
OAuthMechanism=0
This sets the authentication method to use service accounts.
Email=obieeconnection@stately-equinox-206409.iam.gserviceaccount.com
This is the email address of our service account.
KeyFilePath=/u01/BigQueryDriver/stately-equinox-206409-3e3df8089ed6.json
This is the full path to our private key file for the service account.
SQLDialect=1
The SQLDialect must be set to 1, which indicates modern standard SQL. The default is 0, which is legacy SQL and will not work.
LogLevel=4
Logging level, same as the ODBC driver ini file above.
LogPath=/u01/simba_logs
Log file path, same as the ODBC driver ini file above.
LogFileCount=10
Log file count, same as the ODBC driver ini file above.
LogFileSize=200
Maximum size of the log files.
The last four log settings you can set to your own
preference.
So, my odbc ini file looks like this:
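Again, in case the screenshot is unreadable, here is the relevant part of the file as a sketch built from the settings above (any pre-existing OBIEE entries in the file are left untouched):

[ODBC Data Sources]
BigQuery_PaulsSalesData=Simba ODBC Driver for Google BigQuery

[BigQuery_PaulsSalesData]
Driver=/opt/simba/googlebigqueryodbc/lib/64/libgooglebigqueryodbc_sb64.so
Catalog=stately-equinox-206409
OAuthMechanism=0
Email=obieeconnection@stately-equinox-206409.iam.gserviceaccount.com
KeyFilePath=/u01/BigQueryDriver/stately-equinox-206409-3e3df8089ed6.json
SQLDialect=1
LogLevel=4
LogPath=/u01/simba_logs
LogFileCount=10
LogFileSize=200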
We can now perform a test of this using the OBIEE nqcmd.sh
(nqcmd.bat on windows) command.
First change to the directory for this command:
cd <oracle_home>/user_projects/domains/bi/bitools/bin
or in my case:
cd /u01/app/obiee12214/user_projects/domains/bi/bitools/bin
Now create a text file called bigquery.sql and insert a
simple select, for example:
select sum(REVENUE) from PaulsSalesData.SALESDATA
Then run the following command:
./nqcmd.sh -d BigQuery_PaulsSalesData -u weblogic -p Password1 -s bigquery.sql -o bigquery.txt
The parameters are as follows:
-d the ODBC data source name
-u a user in OBIEE; I’m using the weblogic admin user
-p the password for this user
-s the name of the SQL file created above
-o the name of a text file to output the results to
When you run this, you should find that the output file
contains the select statement (which it displays twice for some reason) and the
results returned from BigQuery:
If you see any errors here then something has gone wrong,
either with the connection to BigQuery, the ODBC data source settings, or
possibly even your SQL (try running it directly in the BigQuery web site to
check it is OK).
I came across three errors when I first ran this:
“[Simba][BigQuery]
(100) Error interacting with REST API: Peer certificate cannot be authenticated
with given CA certificates
Connect open failed”
There could be many reasons why a certificate could not be
authenticated. The best option is to open a browser on the server (I already
had Firefox installed on my Linux server) and connect to the BigQuery cloud web
site. This should show any certificate problems and, hopefully, the reason. In
my case the certificate error stated that the current certificate on the web
site wasn’t valid until a certain date.
I hadn’t noticed before, but the date on my linux VM wasn’t
current and was before the valid date of the certificate, so I manually reset
it:
date -s "26 JUN 2018 14:50:00"
This led to the second error:
[Simba][BigQuery]
(80) Authentication failed: invalid_grant
It turns out that manually setting the date/time isn’t good
enough; the time sync between your server and the BigQuery server must be very
accurate. I decided to use the Linux NTP package to sync properly – I can’t go
into how to do that here, but your Linux admin guys should be able to help, or
just google it!
/usr/sbin/ntpdate pool.ntp.org
The last error I had was a simple permissions issue in
BigQuery; I hadn’t given the service account both of the roles mentioned earlier on:
[Simba][BigQuery]
(100) Error interacting with REST API: Access Denied: Project
stately-equinox-206409: The user
obieeconnection@stately-equinox-206409.iam.gserviceaccount.com does not have
bigquery.jobs.create permission in project stately-equinox-206409.
Statement preparation
failed
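A quick way to confirm which roles the service account actually holds is to list the project’s IAM bindings with gcloud. A sketch using gcloud’s standard filtering and formatting flags:

# List the roles bound to the service account in this project
gcloud projects get-iam-policy stately-equinox-206409 --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:obieeconnection@stately-equinox-206409.iam.gserviceaccount.com"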
Once your test query is running correctly you are now ready
to login to OBIEE and start creating reports!
Here I’ve created a new analysis on my PaulsSalesData
subject area. Note that the tables do not contain the PaulsSalesData. prefix, as
I removed it from the presentation layer names; it is only in the physical layer
that the prefix is required.
And this can be displayed any way I want:
If the report is running (i.e. not erroring) but not bringing back
any data, or the data does not match what you expect, you can extract the SQL
OBIEE has built and run it directly in BigQuery. In OBIEE go to the
Administration tab (you’ll need to be logged in under an account with
Administrator privileges) then click the ‘Manage Sessions’ link:
This displays all session details for all users, so it could
be quite a long list, but towards the bottom you’ll find your query by looking
in the ‘user’ field:
The Statement in this screen is not what we need, as it is the logical
query the OBIEE front end has sent to the BI Server in the background. We need
the physical SQL this generates to send to BigQuery. Click on the ‘View Log’ link in the
Action column.
This displays a lot of information about the BI server
query, however towards the bottom you’ll find the ‘standard’ SQL select
statement sent to BigQuery. Copy this and login to BigQuery.
Create a new Query in BigQuery, paste in the select command
and run it to see the results:
If you get no data, or incorrect data, you can investigate
it here in BigQuery and if necessary alter the database details in the BI Administration
tool to accommodate your findings.
If you get an error along the lines of:
Then it’s probably because your BigQuery session is set to
use legacy SQL.
Click the ‘Show Options’ button and untick the ‘Use Legacy
SQL’ option then run the query again.
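Alternatively, in the classic BigQuery web UI you can force standard SQL for a single query by prefixing it with the #standardSQL directive:

#standardSQL
select employee_name from PaulsSalesData.dim_employee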
That’s it! You’re now ready to build your reports and
dashboards in OBIEE on your BigQuery data.
However, that may not be the end of the story. Given the
simplistic structure of BigQuery datasets and the ‘pay per
query & quantity’ charging model, you may want to investigate ways of manipulating the data
in BigQuery (via views and summary tables) and fine-tune the caching options in
OBIEE to minimize the data extracted from BigQuery, especially if you have lots
of analyses on dashboards frequently viewed by many users.
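For example, a pre-aggregated summary table can be built in BigQuery itself so that dashboard queries scan far less data. A sketch, assuming your project supports BigQuery’s standard SQL DDL; EMPLOYEE_KEY is a hypothetical grouping column, so substitute your own fields:

create or replace table PaulsSalesData.SALES_SUMMARY as
select EMPLOYEE_KEY,                -- hypothetical grouping column
       sum(REVENUE) as TOTAL_REVENUE
from PaulsSalesData.SALESDATA
group by EMPLOYEE_KEY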
Enjoy!