Tensorflow 错误集锦

栏目: 编程工具 · 发布时间: 5年前

内容简介:首先保证在一个IP为192.168.1.100的机器上启动ps或worker进程:因为该进程启动位置是192.168.1.100,但是运行参数中指定的task_index为1,对应的IP地址是ps_hosts或worker_hosts的第二项(第一项的task_index为0),也就是192.168.1.101,和进程本身所在机器的IP不一致。
2018-12-05 22:18:24.565303: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job ps -> {0 -> localhost:3376}
2018-12-05 22:18:24.565372: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job worker -> {0 -> localhost:3330, 1 -> localhost:3331}
2018-12-05 22:18:24.569212: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:381] Started server with target: grpc://localhost:3376
2018-12-05 22:18:26.170901: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job ps -> {0 -> localhost:3376}
2018-12-05 22:18:26.170969: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job worker -> {0 -> localhost:3330, 1 -> localhost:3331}
2018-12-05 22:18:26.174856: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:381] Started server with target: grpc://localhost:3330
2018-12-05 22:18:27.177003: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job ps -> {0 -> localhost:3376}
2018-12-05 22:18:27.177071: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job worker -> {0 -> localhost:3330, 1 -> localhost:3331}
2018-12-05 22:18:27.180980: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:381] Started server with target: grpc://localhost:3331
2018-12-05 22:18:34.625459: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:0
2018-12-05 22:18:34.625513: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:1
2018-12-05 22:18:36.231936: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2018-12-05 22:18:36.231971: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:1
2018-12-05 22:18:37.235899: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2018-12-05 22:18:37.235952: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:0

首先保证 job_name,task_index,ps_hosts,worker_hosts 这四个参数都是正确的,考虑以下这种情况是不正确的:

在一个IP为192.168.1.100的机器上启动ps或worker进程:

--job_name=worker
--task_index=1
--ps_hosts=192.168.1.100:2222,192.168.1.101:2222
--worker_hosts=192.168.1.100:2223,192.168.1.101:2223

因为该进程启动位置是192.168.1.100,但是运行参数中指定的task_index为1,对应的IP地址是ps_hosts或worker_hosts的第二项(第一项的task_index为0),也就是192.168.1.101,和进程本身所在机器的IP不一致。

另外一种情况也会导致该问题的发生,从TensorFlow-1.4开始,分布式会自动使用环境变量中的代理去连接,如果运行的节点之间不需要代理互连,那么将代理的环境变量移除即可,在脚本的开始位置添加代码:

注意这段代码必须写在import tensorflow as tf或者import moxing.tensorflow as mox之前

import os
os.enrivon.pop('http_proxy')
os.enrivon.pop('https_proxy')

— 摘自( https://bbs.huaweicloud.com/blogs/463145f7a1d111e89fc57ca23e93a89f )


以上所述就是小编给大家介绍的《Tensorflow 错误集锦》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

JavaScript语言精粹

JavaScript语言精粹

Douglas Crockford / 赵泽欣、鄢学鹍 / 电子工业出版社 / 2009-4 / 35.00元

本书通过对JavaScript语言的分析,甄别出好的和坏的特性,从而提取出相对这门语言的整体而言具有更好的可靠性、可读性和可维护性的JavaScript的子集,以便你能用它创建真正可扩展的和高效的代码。 雅虎资深JavaScript架构师Douglas Crockford倾力之作。 向读者介绍如何运用JavaScript创建真正可扩展的和高效的代码。一起来看看 《JavaScript语言精粹》 这本书的介绍吧!

RGB HSV 转换
RGB HSV 转换

RGB HSV 互转工具

HEX CMYK 转换工具
HEX CMYK 转换工具

HEX CMYK 互转工具

HEX HSV 转换工具
HEX HSV 转换工具

HEX HSV 互换工具