
Sqoop basic syntax explained, and the errors you may run into

0 Introduction to Sqoop

Apache Sqoop is a tool designed for efficiently transferring data between Apache Hadoop and structured data stores such as relational databases. You can use Sqoop to import data from an external structured store into the Hadoop Distributed File System or related systems such as Hive and HBase. Conversely, Sqoop can extract data from Hadoop and export it to external structured stores such as relational databases and enterprise data warehouses. Sqoop is built for bulk transfer of large data sets: it splits the data set and creates Hadoop tasks to process each slice. Installing and downloading Sqoop is not covered here; refer to a separate installation guide.

1 When to use Sqoop

Under the hood Sqoop is just MapReduce, so using Sqoop inevitably means some data latency. My personal suggestion: when the data volume is not huge, or real-time requirements are loose, Sqoop is very convenient, and a bit of configuration gets you imports and exports for all kinds of data sources. If you have studied Spark you will know that Spark SQL is also very convenient: its external data source API makes it easy to read and write many kinds of data. So which one to use is your own choice; there is no single best tool, only the one that best fits your case.

Official site: Sqoop user guide

2 Viewing help

View the command help with sqoop help:

[hadoop@zhangyu lib]$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

As the output says, run sqoop help COMMAND to get detailed help for a specific command.

3 list-databases

[hadoop@zhangyu lib]$ sqoop help list-databases

--connect jdbc:mysql://hostname:port/database   the MySQL host, port (default 3306) and database name
--username root                                 the database user
--password 123456                               the database password

[hadoop@zhangyu lib]$ sqoop list-databases \
> --connect jdbc:mysql://localhost:3306 \
> --username root \
> --password 123456

Result:
information_schema
basic01
mysql
performance_schema
sqoop
test

4 list-tables

[hadoop@zhangyu lib]$ sqoop list-tables \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456

Result:
stu

5 Importing from MySQL into HDFS (import):

(By default the data is imported into the current user's home directory on HDFS, i.e. /user/<username>/<table name>.) A small side note here: try out the difference between hdfs dfs -ls and hdfs dfs -ls / for yourself.
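For reference, a minimal sketch of that difference (the listings themselves are omitted):

# with no path, lists the current user's HDFS home directory, e.g. /user/hadoop
hdfs dfs -ls

# with /, lists the root of the HDFS namespace
hdfs dfs -ls /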

sqoop import --connect jdbc:mysql://localhost/database --username root --password 123456 --table example -m 1

--table example   the MySQL table to import
-m 1              start a single map task; for a large table you can start more map tasks (the default is 4)
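When the table has an evenly distributed numeric column, the import can run with several map tasks by naming that column explicitly; a minimal sketch, assuming a column called id (the --split-by behaviour is discussed in more detail under the first error below):

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password 123456 \
--table stu \
--split-by id \
-m 4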

When running such an import, two errors commonly come up:

First error

18/01/14 16:01:19 ERROR tool.ImportTool: Error during import: No primary key could be found for table stu. Please specify one with --split-by or perform a sequential import with '-m 1'.

The message says that the MySQL table we are importing has no primary key, and that we should either specify --split-by or set -m to 1. Why is that?

Sqoop uses --split-by to choose the column to split on and -m to set the number of mappers. From these two parameters it generates m WHERE clauses and runs m range queries. How the range is split depends on the column type. Suppose a table has 100 rows, its id column is an int, we specify --split-by id and keep the default of four mappers: Sqoop first queries MIN() and MAX() of the split column and then divides that range among the mappers, so the four mappers get roughly (1-25), (26-50), (51-75), (76-100). The split strategy differs by type; Date, Text, Float, Integer, Boolean, NText, BigDecimal and others are supported, each with its own method. So if the imported table has no primary key, either set -m to 1 or specify --split-by. With -m 1 only one map task runs, which means the import is not parallel. Note that whenever -m is greater than 1, --split-by must name a column. Also be aware that even with an int split column, if the values are not evenly distributed the map tasks get uneven workloads: some mappers may be very busy while others have almost no data to process.

Second error

Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
 at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:42)
 at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:742)
 at org.apache.sqoop.mapreduce.JobBase.putSqoopOptionsToConfiguration(JobBase.java:369)
 at org.apache.sqoop.mapreduce.JobBase.createJob(JobBase.java:355)
 at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:249)
 at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
 at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
 at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
 at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
 at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
 at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
 at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
 at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
 at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 ... 15 more

To fix this, download java-json.jar and add it to the …/sqoop/lib directory.
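For example, assuming the Sqoop installation path used elsewhere in this post (/opt/software/sqoop) and that the jar has already been downloaded to the current directory:

[hadoop@zhangyu ~]$ cp java-json.jar /opt/software/sqoop/lib/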

After all that, here is our first import statement:

[hadoop@zhangyu lib]$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password 123456 --table stu

Make sure you read and understand the log output this command produces.

Check the files on HDFS:

[hadoop@zhangyu lib]$ hdfs dfs -ls /user/hadoop/stu
Found 4 items
-rw-r--r--   1 hadoop supergroup          0 2018-01-14 17:07 /user/hadoop/stu/_SUCCESS
-rw-r--r--   1 hadoop supergroup         11 2018-01-14 17:07 /user/hadoop/stu/part-m-00000
-rw-r--r--   1 hadoop supergroup          7 2018-01-14 17:07 /user/hadoop/stu/part-m-00001
-rw-r--r--   1 hadoop supergroup          9 2018-01-14 17:07 /user/hadoop/stu/part-m-00002

[hadoop@zhangyu lib]$ hdfs dfs -cat /user/hadoop/stu/"part*"
1,zhangsan
2,lisi
3,wangwu

Adding the -m parameter:

[hadoop@zhangyu lib]$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password 123456 --table stu -m 1

Third error

This one came up while importing data from PostgreSQL into HDFS:

[hdfs@hik003 /apps/sqoop]$ sqoop import \
> --connect jdbc:postgresql://localhost:5432/test \
> --username root \
> --password 123456 \
> --table test_info \
> --target-dir /test/ \
> --mapreduce-job-name FromPgSQL2HDFS \
> -m 1
Warning: /opt/cloudera/parcels/CDH-5.16.1-1.cdh5.16.1.p0.3/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/05/17 20:38:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.16.1
19/05/17 20:38:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/17 20:38:15 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/17 20:38:15 INFO tool.CodeGenTool: Beginning code generation
19/05/17 20:38:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "camera_govern_info" AS t LIMIT 1
19/05/17 20:38:15 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
(the line above repeats several times)
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
(the line above repeats several times)
19/05/17 20:38:15 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
 at org.apache.sqoop.orm.ClassWriter.parseNullVal(ClassWriter.java:1389)
 at org.apache.sqoop.orm.ClassWriter.parseColumn(ClassWriter.java:1414)
 at org.apache.sqoop.orm.ClassWriter.myGenerateParser(ClassWriter.java:1540)
 at org.apache.sqoop.orm.ClassWriter.generateParser(ClassWriter.java:1503)
 at org.apache.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:1958)
 at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1749)
 at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
 at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:494)
 at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
 at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
 at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
 at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
 at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

(Related article: https://yq.aliyun.com/articles/578819)

This is a column-type problem: the table has a column of PostgreSQL's geometry type (used here to store coordinates), which Sqoop cannot map to a Java type, so the conversion fails. One fix is to change the column type in the database.
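If changing the schema is not an option, a workaround sketch is to import with a free-form query that selects the geometry as text instead of the raw column. ST_AsText is a PostGIS function; the id column name and the rest of the command mirror the failing example above and are otherwise assumptions:

sqoop import \
--connect jdbc:postgresql://localhost:5432/test \
--username root \
--password 123456 \
--query 'select id, ST_AsText(geom) as geom from test_info where $CONDITIONS' \
--target-dir /test/ \
--delete-target-dir \
-m 1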

You may also run into an error when the target directory already exists on HDFS:

18/01/14 17:52:47 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.137.200:9000/user/hadoop/stu already exists
 at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
 at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270)
 at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
 at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
 at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
 at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
 at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
 at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
 at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
 at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
 at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
 at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
 at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
 at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

The fix is to delete the target directory before importing; here we also give the MapReduce job a readable name.

Parameters: --delete-target-dir --mapreduce-job-name

[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --table stu \
> -m 1

Importing to a specific directory

Parameter: --target-dir /directory

[hadoop@zhangyu lib]$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password 123456 --table stu -m 1 --target-dir /sqoop/

Check the files on HDFS:

[hadoop@zhangyu lib]$ hdfs dfs -ls /sqoop
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2018-01-14 18:07 /sqoop/_SUCCESS
-rw-r--r--   1 hadoop supergroup         27 2018-01-14 18:07 /sqoop/part-m-00000

[hadoop@zhangyu lib]$ hdfs dfs -cat /sqoop/part-m-00000
1,zhangsan
2,lisi
3,wangwu

Specifying the field delimiter

Parameter: --fields-terminated-by

[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1

Check the result on HDFS:

[hadoop@zhangyu lib]$ hdfs dfs -ls /user/hadoop/stu/
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2018-01-14 19:47 /user/hadoop/stu/_SUCCESS
-rw-r--r--   1 hadoop supergroup         27 2018-01-14 19:47 /user/hadoop/stu/part-m-00000

[hadoop@zhangyu lib]$ hdfs dfs -cat /user/hadoop/stu/part-m-00000
1    zhangsan
2    lisi
3    wangwu

(The fields are now tab-separated instead of comma-separated.)

Replacing NULL fields with 0

Parameters: --null-string and --null-non-string

--null-string      for string columns, replace NULL values with the given string
--null-non-string  for non-string columns, replace NULL values with the given string

First import the salary table:

[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table sal \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1

Check the result:

[hadoop@zhangyu lib]$ hdfs dfs -cat /user/hadoop/sal/part-m-00000
zhangsan    1000
lisi    2000
wangwu    null

Now add --null-string:

[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table sal \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0

Check the result:

[hadoop@zhangyu lib]$ hdfs dfs -cat /user/hadoop/sal/part-m-00000
zhangsan    1000
lisi    2000
wangwu    0

Importing only some of a table's columns

Parameter: --columns

[hadoop@zhangyu ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --columns "name"

Check the result:

[hadoop@zhangyu ~]$ hdfs dfs -cat /user/hadoop/stu/part-m-00000
zhangsan
lisi
wangwu

Importing rows that match a condition

Parameter: --where

[hadoop@zhangyu ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --columns "name" \
> --target-dir STU_COLUMN_WHERE \
> --where 'id<3'

Check the result:

zhangsan
lisi

Importing with a SQL query

Parameter: --query

When --query is used, --table and --columns cannot be used. The WHERE clause of the custom SQL must contain the literal string $CONDITIONS; this placeholder is what Sqoop uses to divide the query range among the map tasks.

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root --password 123456 \
--mapreduce-job-name FromMySQL2HDFS \
--delete-target-dir \
--fields-terminated-by '\t' \
-m 1 \
--null-string 0 \
--target-dir STU_COLUMN_QUERY \
--query "select * from stu where id>1 and \$CONDITIONS"

(With double quotes the $ must be escaped as \$CONDITIONS; with single quotes it can be written directly: --query 'select * from stu where id>1 and $CONDITIONS')

Result:

2    lisi
3    wangwu

6 Running Sqoop from an options file

Create the file sqoop-import-hdfs.txt:

[hadoop@zhangyu data]$ vi sqoop-import-hdfs.txt
import
--connect
jdbc:mysql://localhost:3306/sqoop
--username
root
--password
123456
--table
stu
--target-dir
STU_option_file

Run it:

[hadoop@zhangyu data]$ sqoop --options-file /home/hadoop/data/sqoop-import-hdfs.txt

Check the result:

[hadoop@zhangyu data]$ hdfs dfs -cat STU_option_file/"part*"
1,zhangsan
2,lisi
3,wangwu

7 eval

The help text describes this command as: Evaluate a SQL statement and display the results. In other words, it runs a SQL statement and prints the result.
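Besides SELECT queries (demonstrated below), eval will run whatever statement the database accepts, which makes it handy for quick checks before an import or export. A small sketch; the inserted row is made up for illustration:

[hadoop@zhangyu data]$ sqoop eval \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --query "insert into stu(id, name) values (4, 'zhaoliu')"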

[hadoop@zhangyu data]$ sqoop eval \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --query "select * from stu"
Warning: /opt/software/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /opt/software/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/software/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/software/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/01/14 21:35:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/01/14 21:35:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/14 21:35:26 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
--------------------------------------
| id          | name                 | 
--------------------------------------
| 1           | zhangsan             | 
| 2           | lisi                 | 
| 3           | wangwu               | 
--------------------------------------

8 Exporting HDFS data to MySQL (Hive data to MySQL)

Export the sal data from HDFS. First look at the data:

[hadoop@zhangyu data]$ hdfs dfs -cat sal/part-m-00000
zhangsan    1000
lisi    2000
wangwu    0

Before running the export, create the sal_demo table (the export fails if the table does not exist):

mysql> create table sal_demo like sal;

Export statement:

[hadoop@zhangyu data]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table sal_demo \
> --input-fields-terminated-by '\t' \
> --export-dir /user/hadoop/sal/

--table sal_demo               the name of the target table
--input-fields-terminated-by   the field delimiter of the files on HDFS; the default is a comma. The data above was written with '\t', so '\t' is specified here (a mismatched delimiter is a common cause of export errors)
--export-dir                   the HDFS directory to export from

Result:

mysql> select * from sal_demo;
+----------+--------+
| name     | salary |
+----------+--------+
| zhangsan |   1000 |
| lisi     |   2000 |
| wangwu   |      0 |
+----------+--------+
3 rows in set (0.00 sec)

(If you run the export again, the same rows are appended to the table.)
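If re-running the export should update existing rows rather than append duplicates, sqoop export also accepts --update-key and --update-mode allowinsert. Upsert support depends on the connector, and the key column below is an assumption; treat this as a sketch:

[hadoop@zhangyu data]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table sal_demo \
> --input-fields-terminated-by '\t' \
> --export-dir /user/hadoop/sal/ \
> --update-key name \
> --update-mode allowinsert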

Garbled Chinese characters on export

sqoop export --connect "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=utf-8" --username root --password 123456 --table sal -m 1 --export-dir /user/hadoop/sal/

Exporting only some columns

Parameter: --columns <col,col,col...>

[hadoop@zhangyu data]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table sal_demo3 \
> --input-fields-terminated-by '\t' \
> --export-dir /user/hadoop/sal/ \
> --columns name

Query the result:

mysql> select * from sal_demo3;
+----------+--------+
| name     | salary |
+----------+--------+
| zhangsan |   NULL |
| lisi     |   NULL |
| wangwu   |   NULL |
+----------+--------+
3 rows in set (0.00 sec)

9 Importing MySQL data into Hive

Run the import statement:

[hadoop@zhangyu ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0

--create-hive-table        create the target table; fails if the table already exists
--hive-database            the Hive database to use
--hive-import              import into Hive (without it the data only lands on HDFS)
--hive-overwrite           overwrite existing data
--hive-table stu_import    the Hive table name; if omitted, the source table name is used

This step may fail with the following error:

18/01/15 01:29:28 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/01/15 01:29:28 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
 at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
 at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
 at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
 at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
 at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
 at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
 at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
 at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
 at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
 at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
 at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
 at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:264)
 at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
 ... 12 more

Most advice found online says to configure user environment variables, which did not help at all in my case. Copying a few jars from Hive's lib directory into Sqoop's lib directory solved the problem:

[hadoop@zhangyu lib]$ cp hive-common-1.1.0-cdh5.7.0.jar /opt/software/sqoop/lib/
[hadoop@zhangyu lib]$ cp hive-shims* /opt/software/sqoop/lib/

Check the imported data in Hive:

hive> show tables;
OK
stu_import
Time taken: 0.067 seconds, Fetched: 1 row(s)
hive> select * from stu_import;
OK
1    zhangsan
2    lisi
3    wangwu

When importing into Hive I do not recommend --create-hive-table; it is better to create the Hive table beforehand. If you let Sqoop create the table, inspect the generated column types: some of them may not be what you want, so define the table schema yourself first and then import the data.
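For example, a table pre-created to receive the stu import above might look like this (the column types are assumptions based on the sample data):

hive> create table hive.stu_import (
    >   id   int,
    >   name string
    > )
    > row format delimited fields terminated by '\t';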

Importing into a specific Hive partition

--hive-partition-key <partition-key>      Sets the partition key to use when importing to hive
--hive-partition-value <partition-value>  Sets the partition value to use when importing to hive

Example:

[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import1 \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0 \
> --hive-partition-key dt \
> --hive-partition-value "2018-08-08"

Query in Hive:

hive> select * from stu_import1;
OK
1    zhangsan    2018-08-08
2    lisi    2018-08-08
3    wangwu    2018-08-08
Time taken: 0.121 seconds, Fetched: 3 row(s)

10 Using sqoop job

A sqoop job wraps a Sqoop command as a saved job: it is not executed when it is created. You can list jobs, run a job at any time, and delete jobs, which makes task scheduling much easier.

--create <job-id>  create a new job
--delete <job-id>  delete a job
--exec <job-id>    run a job
--show <job-id>    show a job's parameters
--list             list all jobs

Create a job:

sqoop job --create person_job1 -- import --connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password 123456 \
--table sal_demo3 \
-m 1 \
--delete-target-dir

List the available jobs:

[hadoop@zhangyu lib]$ sqoop job --list
Available jobs:
  person_job1

Run person_job1 to perform the import:

[hadoop@zhangyu lib]$ sqoop job --exec person_job1
[hadoop@zhangyu lib]$ hdfs dfs -ls
Found 6 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 20:40 EMP_COLUMN_WHERE
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 20:49 STU_COLUMN_QUERY
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 20:45 STU_COLUMN_WHERE
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 21:10 STU_option_file
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 20:24 sal
drwxr-xr-x   - hadoop supergroup          0 2018-01-15 03:08 sal_demo3
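The other subcommands listed above work the same way, for example to inspect or remove the saved job (output omitted):

[hadoop@zhangyu lib]$ sqoop job --show person_job1
[hadoop@zhangyu lib]$ sqoop job --delete person_job1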

Problem: when person_job is executed, you are prompted for the database password. How can this prompt be avoided?
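One answer is the sqoop-site.xml setting shown below. Another option is to keep the password in a protected file and reference it with --password-file; a minimal sketch, with made-up file paths and using the local file staged on HDFS:

$ echo -n "123456" > /home/hadoop/.mysql.pwd
$ chmod 400 /home/hadoop/.mysql.pwd
$ hdfs dfs -put /home/hadoop/.mysql.pwd /user/hadoop/
$ sqoop job --create person_job2 -- import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username root \
  --password-file /user/hadoop/.mysql.pwd \
  --table sal_demo3 \
  -m 1 \
  --delete-target-dir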

Configure sqoop-site.xml:

<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
  <description>If true, allow saved passwords in the metastore.</description>
</property>

Finally, let's look at a commonly used script (exporting a Hive table to PostgreSQL):

#!/bin/bash
# Hive2Pg
IP=localhost
Database=default
Driver=jdbc:postgresql
pg_connection=${Driver}://${IP}/${Database}
pg_username=root
pg_password=123456
hive_db=default
hive_table=hive_table
hive_columns=id,name,age

# target PostgreSQL table
export_table_name=pg_test
# target PostgreSQL columns
export_table_columns=id,name,age
# temporary directory for the data to be exported to PostgreSQL
sqoop_export_data_dir=/tmp/sqoop/export/${export_table_name}

hadoop dfs -rm -r ${sqoop_export_data_dir};

# write the temporary data that will be exported to PostgreSQL
hive -v -e "
use ${hive_db};
insert overwrite directory '${sqoop_export_data_dir}'
select ${hive_columns}
from ${hive_db}.${hive_table};";

# first delete the existing data in the target table
sqoop eval --connect ${pg_connection} \
--username ${pg_username} \
--password ${pg_password} \
--verbose --query "delete from ${export_table_name}";

# then export the data
sqoop export --connect ${pg_connection} \
--username ${pg_username} \
--password ${pg_password} \
--export-dir ${sqoop_export_data_dir} \
--verbose --num-mappers 1 \
--table ${export_table_name} \
--columns ${export_table_columns} \
--input-fields-terminated-by '\001' \
--input-lines-terminated-by '\n' \
--input-null-string '\\N' \
--input-null-non-string '\\N'