Tensorflow 错误集锦

栏目: 编程工具 · 发布时间: 6年前

内容简介:首先保证在一个IP为192.168.1.100的机器上启动ps或worker进程:因为该进程启动位置是192.168.1.100,但是运行参数中指定的task_index为1,对应的IP地址是ps_hosts或worker_hosts的第二项(第一项的task_index为0),也就是192.168.1.101,和进程本身所在机器的IP不一致。
2018-12-05 22:18:24.565303: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job ps -> {0 -> localhost:3376}
2018-12-05 22:18:24.565372: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job worker -> {0 -> localhost:3330, 1 -> localhost:3331}
2018-12-05 22:18:24.569212: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:381] Started server with target: grpc://localhost:3376
2018-12-05 22:18:26.170901: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job ps -> {0 -> localhost:3376}
2018-12-05 22:18:26.170969: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job worker -> {0 -> localhost:3330, 1 -> localhost:3331}
2018-12-05 22:18:26.174856: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:381] Started server with target: grpc://localhost:3330
2018-12-05 22:18:27.177003: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job ps -> {0 -> localhost:3376}
2018-12-05 22:18:27.177071: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job worker -> {0 -> localhost:3330, 1 -> localhost:3331}
2018-12-05 22:18:27.180980: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:381] Started server with target: grpc://localhost:3331
2018-12-05 22:18:34.625459: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:0
2018-12-05 22:18:34.625513: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:1
2018-12-05 22:18:36.231936: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2018-12-05 22:18:36.231971: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:1
2018-12-05 22:18:37.235899: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2018-12-05 22:18:37.235952: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:0

首先保证 job_name,task_index,ps_hosts,worker_hosts 这四个参数都是正确的,考虑以下这种情况是不正确的:

在一个IP为192.168.1.100的机器上启动ps或worker进程:

--job_name=worker
--task_index=1
--ps_hosts=192.168.1.100:2222,192.168.1.101:2222
--worker_hosts=192.168.1.100:2223,192.168.1.101:2223

因为该进程启动位置是192.168.1.100,但是运行参数中指定的task_index为1,对应的IP地址是ps_hosts或worker_hosts的第二项(第一项的task_index为0),也就是192.168.1.101,和进程本身所在机器的IP不一致。

另外一种情况也会导致该问题的发生,从TensorFlow-1.4开始,分布式会自动使用环境变量中的代理去连接,如果运行的节点之间不需要代理互连,那么将代理的环境变量移除即可,在脚本的开始位置添加代码:

注意这段代码必须写在import tensorflow as tf或者import moxing.tensorflow as mox之前

import os
os.enrivon.pop('http_proxy')
os.enrivon.pop('https_proxy')

— 摘自( https://bbs.huaweicloud.com/blogs/463145f7a1d111e89fc57ca23e93a89f )


以上所述就是小编给大家介绍的《Tensorflow 错误集锦》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Processing语言权威指南

Processing语言权威指南

Casey Reas、Ben Fry / 张静 / 电子工业出版社 / 2013-10-1 / 139.00

本书介绍了可视化艺术中的计算机编程概念,对开源编程语言Processing作了非常详尽的阐述。学生、艺术家、设计师、建筑师、研究者,以及任何想编程实现绘画、动画和互动的人都可以使用它。书中的大部分章节是短小的单元,介绍Processing的语法和基本概念(变量、函数、面向对象编程),涵盖与软件相关的图像处理、绘制,并且给出了大量简短的原型程序,配以相应的过程图像与注释。书中还有一些访谈文章,与动画......一起来看看 《Processing语言权威指南》 这本书的介绍吧!

HTML 编码/解码
HTML 编码/解码

HTML 编码/解码

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换