If your Hadoop vendor is Cloudera, they will want you to use a Hyper SQL database for your sqoop metastore. It's ok and all, but if you already have MySQL or Postgres installed on your cluster, its just 1 more database for you to maintain. If you don't want to use Hyper SQL as your metastore, and have Postgres installed, you can follow the steps in this post for getting your sqoop metastore up and running on Postgres. (Just keep in mind, that if something happens, Cloudera will not help you since this currently not supported).
su postgres
Then create a database that will act as our sqoop metastore.
psql -c "create database sqoop"
While still in the Postgres shell, create a new Role to operate under:
CREATE ROLE sqoop WITH PASSWORD 'test';
Change
the owner of the sqoop database to sqoop:
ALTER DATABASE sqoop OWNER TO sqoop;
Make
sure the role can login:
ALTER ROLE sqoop WITH LOGIN;
Verify:
\list
Now connect to the sqoop database:
\connect sqoop
Create a table:
CREATE TABLE SQOOP_ROOT (
VERSION
INT,
PROPNAME
VARCHAR(128) NOT NULL,
PROPVAL
VARCHAR(256),
CONSTRAINT SQOOP_ROOT_UNQ UNIQUE (VERSION, PROPNAME)
);
Insert storage data into new table:
INSERT INTO SQOOP_ROOT
VALUES(
NULL,
'sqoop.hsqldb.job.storage.version',
'0';
Insert job info into new table:
INSERT INTO SQOOP_ROOT
VALUES(
0,
'sqoop.hsqldb.job.info.table',
'SQOOP_SESSIONS'
);
Verify:
SELECT * FROM SQOOP_ROOT;
Create
table SQOOP_SESSIONS.
CREATE TABLE
SQOOP_SESSIONS(
JOB_NAME VARCHAR(64)
,
PROPNAME VARCHAR(128)
,
PROPVAL VARCHAR(1024)
,
PROPCLASS VARCHAR(32)
,
CONSTRAINT
SQOOP_SESSIONS_unq UNIQUE
(JOB_NAME ,
PROPNAME ,
PROPCLASS)
);
Grant
table perms:
GRANT ALL PRIVILEGES ON SQOOP_ROOT TO sqoop;
GRANT ALL PRIVILEGES ON SQOOP_SESSIONS TO
sqoop;
''
Verify
port postgres is running on (In this example it was running on 5432):
SELECT *
FROM pg_settings WHERE name = 'port';
Log
into Cloudera Manager and go into the Sqoop 1 Client:
Figure 1. Cloudera Manager Sqoop Client |
In the Sqoop 1 Client screen, click
on configuration and filter on:
Sqoop
1 Client Client Advanced Configuration Snippet (Safety Valve) for
sqoop-conf/sqoop-site.xml
In that section add these 4 configs:
Name: sqoop.metastore.client.enable.autoconnect
Value: false
Description: If true, Sqoop will connect to a local metastore for job management when no other metastore arguments are provided.
Name: sqoop.metastore.client.autoconnect.url
Value: jdbc:postgresql://<your postgres server>:5432/sqoop
Description: Connection to the sqoop metastore database
Name: sqoop.metastore.client.autoconnect.username
Value: sqoop
Description: Sqoop metastore database user name
Name: sqoop.metastore.client.autoconnect.password
Value: test
Description: Sqoop metastore database password
Figure 2. Sqoop Client Config |
Save changes and go back to Cloudera Manager's main
screen. You should see a new icon next to the Sqoop 1 Client:
Figure 3. Stale Sqoop Config |
Click
that and then click deploy changes on the next screen. You should now be good to go. Make
sure a postgres JDBC driver exists in: /var/lib/sqoop on whatever server is running your sqoop jobs.
Create a test job to make sure everything is working. Here is a sample:
sqoop job --create
sample_job --meta-connect
"jdbc:postgresql://<postgres server>:5432/sqoop?user=sqoop&password=test" --
import --connect jdbc:oracle:thin://@<oracle server>:1521/servicename --username
user --password pass
--table SUPPORT.APC_CODE --as-avrodatafile
--target-dir /staging/hpm_support/apc_code --check-column mycolumn --incremental append --last-value 0 m 1
Verify job was created:
select * from SQOOP_SESSIONS;
To run the sample job:
sqoop job --exec sample_job --meta-connect
"jdbc:postgresql://<postgres server>:5432/sqoop?user=sqoop&password=test"
Remove
sample job:
sqoop job --delete sample_job --meta-connect
"jdbc:postgresql://<postgres server>:5432/sqoop?user=sqoop&password=test"
Your Postgres metastore should now be all set to go!
No comments:
Post a Comment