避障及存在的问题
abstract Welcome to my blog, enter password to read.
Read more
move_base中的多线程

move_base.cpp中用到的多线程:

1
2
3
4
5
boost::condition_variable_any  planner_cond_;

boost::thread* planner_thread_;

boost::recursive_mutex planner_mutex_;

代价地图里有个锁mutex_t,其实就是boost::recursive_mutex,膨胀层里还用到了读写锁boost::shared_mutex* access_

递归锁

move_base里用的锁主要是boost::recursive_mutex planner_mutex_,而且是和boost::unique_lock, boost::recursive_mutex::scoped_lock 搭配使用。

使用到planner_mutex_的函数有planThread, executeCb, executeCycle(6次), resetState。其他3个函数,每个都只使用了一次planner_mutex_

只有一次是在生成路径后,交互变量。其他全是和planThread有关的,基本是对runPlanner赋值,比如

1
2
3
4
5
boost::unique_lock<boost::recursive_mutex> lock(planner_mutex_);
planner_goal_ = goal;
runPlanner_ = true;
planner_cond_.notify_one();
lock.unlock();


经过我大量修改,move_base在有导航任务时,如果ctrl+C退出,终端必定会出现下面的错误,虽然毫无影响,但是毕竟不优雅
move_base 退出时报错死锁
这个错误是在有锁的情况下,仍然加锁。 也就是多调用了lock()函数。

搜索move_base,发现只有7个lock()。逐个查看,由于是退出时的报错,所以不可能是makePlan的那个,很容易发现是planThread函数最后的那个。加上判断即可解决

1
2
if( !lock.owns_lock() )
lock.lock();

对于其他使用lock()的地方,最好也加上这个判断。


Detected jump back in time
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
[ WARN] [1709619594.875225471]: Control loop missed its desired rate of 300.00 Hz... the loop actually took 0.0087 seconds

// joint_state 报错,不是重点
[ WARN] [1709619595.976339929]: Moved backwards in time (probably because ROS clock was reset), re-publishing joint transforms!
[ WARN] [1709619595.982125868]: Detected jump back in time of 1.05772s. Clearing TF buffer.
[ WARN] [1709619595.983936734]: Detected jump back in time of 1.05592s. Clearing TF buffer.
[ERROR] [1709619595.987547945]: Exception in PurePursuitPlanner::transformPose : Lookup would require extrapolation at time 1709619597.011688232, but only time 1709619596.080649853 is in the buffe, when looking up transform from frame [map] to frame [cutter_head]
[ERROR] [1709619595.987830086]: transform GlobalPoseToLocal failed
[ERROR] [1709619595.988800520]: Exception in PurePursuitPlanner::transformPose : Lookup would require extrapolation at time 1709619597.011688232, but only time 1709619596.080649853 is in the buffe, when looking up transform from frame [map] to frame [cutter_head]
[ERROR] [1709619596.028980155]: transform GlobalPoseToLocal failed
[ERROR] [1709619596.032062451]: Connectivity Error looking up robot pose: Could not find a connection between 'map' and 'base_link' because they are not part of the same tree.Tf has two or more unconnected trees.


[ERROR] [1709619596.032538970]: in_pose frame is empty!
[ERROR] [1709619596.032780641]: Exception in PurePursuitPlanner::transform global plan: Unable to transform robot pose into global plan's frame
[ WARN] [1709619596.033289244]: Unable to get starting pose of robot, so unable to create global plan
[ WARN] [1709619596.034867606]: Unable to get starting pose of robot, so unable to create global plan
[ WARN] [1709619596.034939482]: move_base didn't receive global plan, mower's recoverying and will try again ......

[ WARN] [1709619596.036663888]: Clearing costmap to unstuck robot (3.000000m).
[ERROR] [1709619596.036978217]: Cannot clear map because pose cannot be retrieved
[ WARN] [1709619596.040011241]: Unable to get starting pose of robot, so unable to create global plan
[ WARN] [1709619596.040297600]: move_base didn't receive global plan, mower's recoverying and will try again ......

[ WARN] [1709619596.045662906]: Clearing costmap to unstuck robot (0.000000m).
[ERROR] [1709619596.046150987]: Cannot clear map because pose cannot be retrieved
[ WARN] [1709619596.052540739]: Unable to get starting pose of robot, so unable to create global plan
[ WARN] [1709619596.052682668]: move_base didn't receive global plan, mower's recoverying and will try again ......

[ WARN] [1709619596.068955672]: All recovery behaviors have failed, locking the planner and disabling it.

[ WARN] [1709619596.069181666]: Your executeCallback did not set the goal to a terminal status.
This is a bug in your ActionServer implementation. Fix your code!
For now, the ActionServer will set this goal to aborted

这个问题与时间同步和TF有关,在本地打开Rviz和SSH运行远程程序时,有很小概率出现此错误。


状态机和脱困机制
本博客涉及公司内容,请输入密码
Read more
规划出现短路径导致move_base退出
abstract Welcome to my blog, enter password to read.
Read more
Autoware编译和运行

最好在ubuntu18.04编译,如果是20.04会出现PCL和OpenCV不兼容的问题。

同ROS的工作空间,在根目录执行 colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release 如果编译时提示缺失目录,可手动去创建。

如果CPU不够强,可能需要单独编译某些包,比如lidar_localizer: colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release --packages-select lidar_localizer

我编译的最终结果

1
2
Summary: 165 packages finished [7min 55s]
6 packages had stderr output: lanelet2_python op_simulation_package op_utilities trafficlight_recognizer way_planner waypoint_maker

ninja类似,colcon编译的结果在install目录里,运行的launch实际也是install目录的,修改源码的cpp和launch后,都需要编译。

运行时报错找不到 type JointStateController

world_test_citysim_b.launch中有一句

1
<rosparam file="$(find vehicle_model)/config/controller.yaml" command="load"/>`

打开yaml文件发现有这样的内容
1
2
joint_state_controller:
type: joint_state_controller/JointStateController

安装插件 sudo apt-get install -y ros-melodic-*-controllers

启动world_test_citysim_a.launch后,城市环境很快会出现,但是salon前面的白色车要过一会才能出现,这是我们要操作的车。


Linux关机时执行脚本

先写一个脚本file.sh放在/etc/rc.d/init.dchmod -f 777 file, 再ln -s file /etc/rc.d/rc0.d/K01file/etc/rc.d/rc6.d/K01file, 同时也要 ln -s file /etc/rc.d/rc3.d/S99fileln -s file /etc/rc.d/rc5.d/S99file

1
2
3
4
5
sudo ln -s shell_name /etc/rc0.d/K07shell_name
sudo ln -s shell_name /etc/rc3.d/S99shell_name
sudo ln -s shell_name /etc/rc5.d/S99shell_name
sudo ln -s shell_name /etc/rc6.d/K07shell_name
sudo touch /var/lock/subsys/shell_name

K开头的代表系统关闭的时候执行,S开头的代表开机的时候执行。注意服务器脚本编写的规范,因为有K开通的软链接并不一定会在关机的时候自动去执行,因为执行K脚本的时候会查询/var/lock/subsys/是否有与K开头脚本同名的空文件名,如果没有就不去执行,所以要按照服务器脚本编写的规范,启动的时候要在/var/lock/subsys/先touch一个与K01后面同名的空文件. 同时也要调用/etc/rc.d/init.d/functions能够接受star与stop命令信号,具体可以参考/etc/rc.d/rc文件

  • /etc/rc0.d 包含关机脚本K*
  • /etc/rc6.d 包含重启脚本K*

Linux关机过程:

在系统关机时,shutdown命令会告诉init程序转换到0运行级别,如果是重新启动则转换到第6运行级别。当进入0或者6运行级别后,系统将会依照下面的顺序运行:

  1. init关闭所有它能关闭的进程(转换到其他运行级别也一样)

  2. rc0.d/rc6.d目录下的第一个命令开始运行,锁定系统文件为关机作准备

  3. rc0.d/rc6.d目录下的第二个命令运行,卸载除根文件系统以外的所有文件系统(如挂载的windows分区)

  4. rc0.d/rc6.d中的命令将把根文件系统重新挂载为只读属性

  5. rc0.d/rc6.d中的命令调用sync程序把缓存中的数据写入文件系统

  6. 最后的命令是重新启动或者关闭内核程序

即使是手动断电,也会执行脚本。但是注意:如果是耗时的脚本命令,断电时间太快,来不及执行。但如果是shutdownreboot命令,是可以执行的


ROS编译和运行问题

运行时节点出现报错

1
2
user@user:~/nav$ rosrun test test_node
/opt/ros/noetic/bin/rosrun: line 150: /home/isward/isward_ii_ws/devel/lib/test/test_node: Success

roscore报错

安装完ROS后,运行roscore报错

1
Traceback (most recent call last): File "/opt/ros/groovy/lib/python2.7/dist-packages/roslaunch/__init__.py", line 230, in main write_pid_file(options.pid_fn, options.core, options.port) File "/opt/ros/groovy/lib/python2.7/dist-packages/roslaunch/__init__.py", line 106, in write_pid_file with open(pid_fn, "w") as f: IOError: [Errno 13] Permission denied: '/home/user/.ros/roscore-11311.pid'

注意rosdep update不要用sudo,解决方法:sudo rosdep fix-permissions,然后rosdep update

roscore报错 2

在一堆报错的最后是:

1
InvalidURL: URL can't contain control characters. '192.168.1.35 ' (found at least ' ')

看上去是IP最后多了个空格,改了MASTER_URI不起作用,最后发现问题是ROS_IPROS_HOME,不要再写成hostname -I,写成192.168.1.35

roscore报错 3

roscore启动失败
原因是因为ROS1支持的是Python2.7,而我之前将Ubuntu自带的Python升级到了3.8,也就是默认版本改变了,所以导致了相关的错误。 解决方法
改回来之后,有些程序可能还是需要python3.8,这时运行python3.8 file.py,可能会报错no module named rospkgno module named pydot,所以需要pip install rospkg pydot

rosbash出错

rosbash出错
source setup.bash出现的错误,发现我用的是zsh,换了之后正常

Network communication failed

执行ROS命令结果报错 Network communication failed

这可能是当前用户没有这个权限,切换到root再试试

rosdep init 报警

rosdep init
这是因为之前运行过了,可以无视这个问题

[rosrun] Couldn’t find executable named ……

rosrun出错


环境变量没设置好,应当是IP和端口组合

编译出错


对用到的某个文件没有给权限,找到执行chmod 755即可

rosnode list无结果

roscorerosrun可以正常运行,但是rosnode list只有rosoutrostopic list没有任何话题。无法用命令对节点和话题操作,可能是通信问题。原因不明,重启电脑后正常。

编译rviz出错


没有安装yaml-cpp或者版本不对,因为rviz读取PersistentSettings文件时需要这个库,下载0.5.1版本的yaml,把之前安装的yaml删掉,重新编译安装。编译生成的几个文件在/usr/local/lib/usr/local/include,如果找不全,就先用现在的版本再编译一遍就知道了

Lost sync with device, restarting


启动bringup.launch,偶尔会出现这个错误,这是下位机和ROS通讯失败造成的,报错在rosserial,很少发生

rqt报错


后来明白,原因是发布坐标系的节点在ubuntu 20.04,rqt在ubuntu 18

rqt_tf_tree运行失败

运行rqt_tf_tree,会出现 wait_for_service(/rqt_gui_py_node/tf2_frames), failed to contact ,will keep trying rqt_tf_tree一直打不开。

原因是IP变动的问题,重新设置IP地址,最好主机与从机都设置为静态ip。

Rviz启动失败

一次没操作好,导致Rviz报错

这个文件是存在的,和其他电脑上正常的Rviz对应的文件路径相同。一直查不到原因,在网上查来查去,反复查ogre这个库是不是装的出问题了,还以为是OpenGL版本变高了,其实这是显卡决定的。最后终于发现链接的librviz.so不是官方路径的,而是我自己工作空间的。可以先用官方的替换

执行任何ROS命令报一堆错

image.png
硬盘空间耗尽

编译报错

1
The build space at '/home/user/catkin_ws/build' was previously built by ''. Please remove the build space or pick a different build space.

原因是硬盘空间耗尽

Unable to communicate with master

执行任何ROS命令都会报错 Unable to communicate with master

已经运行了ROS节点,结果发现什么ROS命令都会报这个错,后来终于发现,启动节点的环境变量是一套,后来改了,ROS命令都报这个错误。注意ROS_IP如果使用名称而不是数字,必须是/etc/hosts里定义的。

Rviz一个关于时间戳的报错,不必理会 Detected jump back in time of 0.223145s. Clearing TF buffer.


ROS service的奇怪bug

客户端的报错:

1
ERROR: service [/task_dispatcher] responded with an error: b'Error opening file: '

服务端的报错:

1
Exception thrown while processing service call: Error opening file

其实是代码中有错误,而且这个错误不一定和ROS有关,其他问题也会出现这个报错。


保存的bag在断电重启后大小变为0
abstract Welcome to my blog, enter password to read.
Read more