Loading Data into Hive Using Python
Installing pyhs2
To install pyhs2 on CentOS, follow these steps:
Download the ez_setup tarball from https://pypi.python.org/pypi/ez_setup
- tar -vzxf ez_setup-**.tar.gz
- python ez_setup.py
- easy_install pip
- yum install gcc-c++
- yum install cyrus-sasl-devel.x86_64
- yum install python-devel.x86_64
- pip install pyhs2
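Once the steps above finish, a quick import check can confirm that the native extensions were built. The following is a minimal sketch, assuming the installed packages expose the modules `pyhs2`, `sasl`, and `thrift`; the helper name `missing_modules` is hypothetical and not part of pyhs2.

```python
import importlib


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed module names: pyhs2 plus its sasl and thrift dependencies.
    print(missing_modules(["pyhs2", "sasl", "thrift"]))
```

An empty list means every module imported. Any name that is printed points at a package whose installation should be repeated.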
Resolving the "No Mechanism Available" error
thrift.transport.TTransport.TTransportException:
Could not start SASL: Error in sasl_client_start
(-4) SASL(-4): no mechanism available:
No worthy mechs found
Check that all required packages are installed
rpm -qa | grep cyrus
At least the following packages must be present:
cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64
cyrus-sasl-plain-2.1.23-15.el6_6.2.x86_64
cyrus-sasl-md5-2.1.23-15.el6_6.2.x86_64
cyrus-sasl-devel-2.1.23-15.el6_6.2.x86_64
cyrus-sasl-2.1.23-15.el6_6.2.x86_64
cyrus-sasl-gssapi-2.1.23-15.el6_6.2.x86_64
If any of them are missing, install them with yum install.
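The package check can also be scripted. The sketch below takes the text output of `rpm -qa` and reports which of the required cyrus-sasl packages are absent. The function name `missing_sasl_packages` is hypothetical, and the matching rule (package name followed by a hyphen and a version digit) is an assumption about how rpm prints package names.

```python
# Package names taken from the list above, without version suffixes.
REQUIRED = [
    "cyrus-sasl",
    "cyrus-sasl-lib",
    "cyrus-sasl-plain",
    "cyrus-sasl-md5",
    "cyrus-sasl-devel",
    "cyrus-sasl-gssapi",
]


def missing_sasl_packages(rpm_output, required=REQUIRED):
    """Return the required package names that do not appear in `rpm -qa` output."""
    installed = set()
    for line in rpm_output.splitlines():
        line = line.strip()
        for name in required:
            # A package line looks like "<name>-<version>-<release>.<arch>",
            # so the character after "<name>-" must be a version digit.
            next_char = line[len(name) + 1:len(name) + 2]
            if line.startswith(name + "-") and next_char.isdigit():
                installed.add(name)
    return [name for name in required if name not in installed]
```

On a real host the input could come from `subprocess.check_output(["rpm", "-qa"])`, decoded to text. Every name returned is a candidate for yum install.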
Check the hive-site.xml configuration
Open /etc/hive/conf/hive-site.xml and look at the configuration.
It contains:
<property>
<name>hive.server2.authentication</name>
<value>KERBEROS</value>
</property>
The value can be changed to NOSASL or to NONE (plain SASL, which corresponds to PLAIN on the client side), among others.
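To see which authentication mode the server is configured with, without opening the file by hand, the property can be read with Python's standard XML parser. This is a minimal sketch: the function name `read_hive_property` is hypothetical, and the sample document is only a stand-in for the real /etc/hive/conf/hive-site.xml.

```python
import xml.etree.ElementTree as ET


def read_hive_property(xml_text, name):
    """Return the <value> of the <property> whose <name> equals `name`, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


# Stand-in for the contents of hive-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>
</configuration>"""

print(read_hive_property(SAMPLE, "hive.server2.authentication"))
```

For the real file, read its contents with `open(path).read()` and pass the text to the same function.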
Python code
import pyhs2
Import the pyhs2 library.
hive_client = HiveClient(db_host='192.168.100.173', port=10000, user='user',
password='password', database='default', authMechanism='PLAIN')
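Connect to the Hive server. The host is the IP address of the Hive machine, the port defaults to 10000, and the user name and password must not be empty. The authMechanism value must agree with the hive.server2.authentication setting in hive-site.xml (for example, the server value NONE means plain SASL and pairs with authMechanism='PLAIN').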
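The pairing between the server setting and the client argument can be written down as a small lookup. The table below is an assumption based on the commonly documented values: the server-side value NONE means plain SASL and pairs with PLAIN on the client, while NOSASL, KERBEROS, and LDAP keep the same name. The function name `client_auth_mechanism` is hypothetical and should be checked against the installed pyhs2 version.

```python
# Assumed mapping: hive.server2.authentication value -> pyhs2 authMechanism.
SERVER_TO_CLIENT_AUTH = {
    "NONE": "PLAIN",
    "NOSASL": "NOSASL",
    "KERBEROS": "KERBEROS",
    "LDAP": "LDAP",
}


def client_auth_mechanism(server_value):
    """Translate a hive-site.xml authentication value into an authMechanism string."""
    key = server_value.strip().upper()
    if key not in SERVER_TO_CLIENT_AUTH:
        raise ValueError("no known pyhs2 authMechanism for %r" % server_value)
    return SERVER_TO_CLIENT_AUTH[key]
```

Raising an error for unmapped values, rather than falling back to a default, makes a mismatch visible before the connection attempt fails with a less specific SASL error.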
class HiveClient:
    # create connection to hive server2
    def __init__(self, db_host, user, password, database, port=10000, authMechanism="PLAIN"):
        self.conn = pyhs2.connect(host=db_host,
                                  port=port,
                                  authMechanism=authMechanism,
                                  user=user,
                                  password=password,
                                  database=database,
                                  )

    def query(self, sql):
        with self.conn.cursor() as cursor:
            cursor.execute(sql)
            return cursor.fetch()

    def loaddata(self, sql):
        with self.conn.cursor() as cursor:
            cursor.execute(sql)

    def close(self):
        self.conn.close()
def main():
    # Note: authMechanism starts out as PLAIN; the better solution is to add an authentication module
    hive_client = HiveClient(db_host='192.168.100.171',
                             port=10000, user='root',
                             password='Qianxin123',
                             database='default',
                             authMechanism='PLAIN')
    print("Connect to Hive!")
    # Test a select statement (no trailing semicolon; HiveServer2 may reject it)
    sql = 'select * from country1'
    print(sql)
    try:
        fetch = hive_client.query(sql)
        for i in fetch:
            print(i)
        # Insert a row; loaddata() is used because an insert returns no rows to fetch
        sql = "insert into country1 values(1, \"hello\")"
        hive_client.loaddata(sql)
        sql = 'select * from country1'
        fetch = hive_client.query(sql)
        for i in fetch:
            print(i)
    except Exception as e:
        print(e)
    finally:
        # Close the connection even if a statement fails
        hive_client.close()


if __name__ == '__main__':
    main()
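The script above exercises select and insert, but the loaddata method can equally run a Hive LOAD DATA statement to bulk-load a file. The sketch below only builds the statement text. The helper name `build_load_statement` and the example path are hypothetical, and the identifier check is a deliberately strict assumption rather than Hive's full naming rules.

```python
import re


def build_load_statement(path, table, local=True, overwrite=False):
    """Build a Hive LOAD DATA statement without a trailing semicolon."""
    # Accept only plain identifiers, optionally qualified as db.table.
    if not re.match(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$", table):
        raise ValueError("unexpected table name: %r" % table)
    # Escape backslashes and single quotes inside the quoted path literal.
    escaped = path.replace("\\", "\\\\").replace("'", "\\'")
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")
    parts.append("INPATH '%s'" % escaped)
    if overwrite:
        parts.append("OVERWRITE")
    parts.append("INTO TABLE %s" % table)
    return " ".join(parts)


# Hypothetical file path; country1 is the table used in the script above.
print(build_load_statement("/tmp/country.txt", "country1"))
```

The resulting string can be passed to `hive_client.loaddata(...)`. With HiveServer2, LOCAL refers to the file system of the server host rather than the machine running the Python client, so the file usually has to be copied there or placed in HDFS first (in which case `local=False` applies).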