This article describes how to migrate data between HDFS and relational databases (RDBMS) using Sqoop2.
1. Creating Links

This section covers creating links for Oracle, PostgreSQL, MySQL, and HDFS.

1.1 Installing the JDBC drivers

Download the Oracle and PostgreSQL JDBC drivers from their official sites, copy them to $SQOOP_HOME/server/lib, and restart the Sqoop2 server. The drivers used here are ojdbc6.jar for Oracle, postgresql-42.1.4.jar for PostgreSQL, and mysql-connector-java-5.1.44-bin.jar for MySQL.

1.2 Creating the HDFS link

```
sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: HDFS_Link

HDFS cluster

URI: hdfs://192.168.120.96:9000
Conf directory: /u01/hadoop/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name HDFS_Link
```
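The driver installation above can be sketched as the following shell commands. This is a minimal sketch: the script name `sqoop2-server` and the assumption that the jars sit in the current directory may differ in your install (some distributions use `sqoop.sh server start|stop` instead).

```
# Assumes $SQOOP_HOME is set and the driver jars were downloaded
# into the current directory.
cp ojdbc6.jar postgresql-42.1.4.jar mysql-connector-java-5.1.44-bin.jar \
   "$SQOOP_HOME/server/lib/"

# Restart the Sqoop2 server so it picks up the new drivers
"$SQOOP_HOME/bin/sqoop2-server" stop
"$SQOOP_HOME/bin/sqoop2-server" start
```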
1.3 Creating the MySQL link
```
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: MySQL_Link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false
Username: root
Password: *********
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#

SQL Dialect

Identifier enclose:
New link was successfully created with validation status OK and name MySQL_Link
```
1.4 Creating the PostgreSQL link
```
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: PostgreSQL_Link

Database connection

Driver class: org.postgresql.Driver
Connection String: jdbc:postgresql://192.168.120.93:5432/rhndb
Username: rhnuser
Password: ******
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#

SQL Dialect

Identifier enclose:
New link was successfully created with validation status OK and name PostgreSQL_Link
```
1.5 Creating the Oracle link
```
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: Oracle_link

Database connection

Driver class: oracle.jdbc.driver.OracleDriver
Connection String: jdbc:oracle:thin:@192.168.120.121:1521:rhndb
Username: spwuser
Password: ******
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#

SQL Dialect

Identifier enclose:
New link was successfully created with validation status OK and name Oracle_link
```
> Note: For the relational database links, at the "Identifier enclose:" prompt you must type a space before pressing Enter; otherwise the default identifier delimiter can cause schema and table lookups to fail.
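Once the links are created, you can verify them from the Sqoop2 shell. `show link` is a standard Sqoop2 shell command; the link names below are the ones created in this article:

```
sqoop:000> show link                        # list all links
sqoop:000> show link --name MySQL_Link      # details of one link
```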
2. Creating RDBMS-to-HDFS Jobs

2.1 MySQL to HDFS job

```
sqoop:000> create job -f MySQL_Link -t HDFS_Link
Creating job for links with from name MySQL_Link and to name HDFS_Link
Please fill following values to create new job object
Name: MySQL_To_HDFS

Database source

Schema name: smsqw
Table name: tbMessage
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 1
Custom codec:
Output directory: /user/DataSource
Append mode:

Throttling resources

Extractors:
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name MySQL_To_HDFS
```
2.2 PostgreSQL to HDFS job
```
sqoop:000> create job -f PostgreSQL_Link -t HDFS_Link
Creating job for links with from name PostgreSQL_Link and to name HDFS_Link
Please fill following values to create new job object
Name: PostgreSQL_to_HDFS

Database source

Schema name: rhndb
Table name: rhnpackagechangelogdata
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column: id    -- if the table has no primary key, specify a column here
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 1
Custom codec:
Output directory: /user/DataSource/PostgreSQL/RHNPACKAGECHANGELOGDATA
Append mode:

Throttling resources

Extractors:
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name PostgreSQL_to_HDFS
```
2.3 Oracle to HDFS job
```
sqoop:000> create job -f Oracle_link -t HDFS_Link
Creating job for links with from name Oracle_link and to name HDFS_Link
Please fill following values to create new job object
Name: Oracle_To_HDFS

Database source

Schema name: spwuser
Table name: RHNCHANNELPACKAGE
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 1
Custom codec:
Output directory: /user/DataSource/Oracle/RHNCHANNELPACKAGE
Append mode:

Throttling resources

Extractors:
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name Oracle_To_HDFS
```
> Note: If the table has no primary key, you must specify a column at the "Partition column:" prompt.
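If you left the partition column empty when creating a job, you can set it afterwards by editing the job. This is a sketch assuming the `update job` command of the Sqoop2 shell, which replays the same prompts as `create job`; in older Sqoop2 releases the flag may be `-jid` instead of `--name`:

```
sqoop:000> update job --name PostgreSQL_to_HDFS
```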
3. Creating HDFS-to-RDBMS Jobs

3.1 HDFS to MySQL job

```
sqoop:000> create job -f HDFS_Link -t MySQL_Link
Creating job for links with from name HDFS_Link and to name MySQL_Link
Please fill following values to create new job object
Name: HDFS_To_MySQL

Input configuration

Input directory: /user/DataSource/MySQL/tbDeliverMsg
Override null value:
Null value:

Incremental import

Incremental type:
  0 : NONE
  1 : NEW_FILES
Choose: 0
Last imported date:

Database target

Schema name: smsqw
Table name: tbDeliverMsg2
Column names:
There are currently 0 values in the list:
element#
Staging table:
Clear stage table:

Throttling resources

Extractors:
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name HDFS_To_MySQL
```
3.2 HDFS to PostgreSQL job
```
sqoop:000> create job -f HDFS_Link -t PostgreSQL_Link
Creating job for links with from name HDFS_Link and to name PostgreSQL_Link
Please fill following values to create new job object
Name: HDFS_To_PostgreSQL

Input configuration

Input directory: /user/DataSource/PostgreSQL/RHNPACKAGECHANGELOGDATA
Override null value:
Null value:

Incremental import

Incremental type:
  0 : NONE
  1 : NEW_FILES
Choose: 0
Last imported date:

Database target

Schema name: public
Table name: rhnpackagechangelogdata
Column names:
There are currently 0 values in the list:
element#
Staging table:
Clear stage table:

Throttling resources

Extractors:
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name HDFS_To_PostgreSQL
```
3.3 HDFS to Oracle job
```
sqoop:000> create job -f HDFS_Link -t Oracle_link
Creating job for links with from name HDFS_Link and to name Oracle_link
Please fill following values to create new job object
Name: HDFS_To_Oracle

Input configuration

Input directory: /user/DataSource/Oracle/QRTZ_CRON_TRIGGERS
Override null value:
Null value:

Incremental import

Incremental type:
  0 : NONE
  1 : NEW_FILES
Choose: 0
Last imported date:

Database target

Schema name: spwuser
Table name: QRTZ_CRON_TRIGGERS
Column names:
There are currently 0 values in the list:
element#
Staging table:
Clear stage table:

Throttling resources

Extractors:
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name HDFS_To_Oracle
```

You can also view the details of a specific job with `show job`.

4. Running the Jobs

4.1 MySQL_To_HDFS
4.2 PostgreSQL_To_HDFS
4.3 Oracle_To_HDFS
4.4 HDFS_To_MySQL
4.5 HDFS_To_PostgreSQL
4.6 HDFS_To_Oracle
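Each of the jobs above is run from the Sqoop2 shell. A typical sequence, using the job names created in this article, looks like the following (`start job`, `status job`, and `show job` are standard Sqoop2 shell commands; `start job` returns immediately and the job runs asynchronously unless you ask for synchronous mode):

```
sqoop:000> start job --name MySQL_To_HDFS     # submit the job
sqoop:000> status job --name MySQL_To_HDFS    # poll its progress
sqoop:000> show job --name MySQL_To_HDFS      # inspect the job definition
```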