Added multi-threaded scan matching using OpenMP#14
Conversation
| <package> | ||
| <name>openslam_gmapping</name> | ||
| <version>0.1.2</version> | ||
| <version>0.1.3</version> |
| catkin_package( | ||
| INCLUDE_DIRS include | ||
| LIBRARIES gridfastslam scanmatcher sensor_base sensor_range sensor_odometry utils | ||
| DEPENDS OpenMP |
| if(OPENMP_FOUND) | ||
| message(STATUS "OPENMP FOUND") | ||
| set(OpenMP_FLAGS ${OpenMP_CXX_FLAGS}) # or if you use C: ${OpenMP_C_FLAGS} | ||
| set(OpenMP_LIBS gomp) |
There was a problem hiding this comment.
not necessary, please have a look at https://cmake.org/cmake/help/v3.0/module/FindOpenMP.html and http://stackoverflow.com/questions/17633513/cmake-cannot-find-openmp
| ) | ||
| target_link_libraries(gridfastslam scanmatcher sensor_range) | ||
|
|
||
| target_link_libraries(gridfastslam scanmatcher sensor_range ${OpenMP_LIBS}) |
| /**minimum score for considering the outcome of the scanmatching good*/ | ||
| PARAM_SET_GET(double, minimumScore, protected, public, public); | ||
|
|
||
|
|
| #include <gmapping/scanmatcher/scanmatcher.h> | ||
| #include "motionmodel.h" | ||
|
|
||
| #include <omp.h> |
There was a problem hiding this comment.
That's necessary because I'm calling theomp_get_wtime() function to measure execution time.
| //should be called only inside the processScan method | ||
| private: | ||
|
|
||
| std::vector<double> durations; |
There was a problem hiding this comment.
why is that a member and not created before the for loop in the scanMatch function ?
There was a problem hiding this comment.
because I want to store the execution time of each function call. so it has to be valid outside the function's scope as well.
There was a problem hiding this comment.
Why not just use two members such as sumscore and update_count to compute the average execution time, since the execution time of each function call is of little use?
|
Changed the things you mentioned. |
|
If I use an openmp function, I do have to link against it of course. |
|
I realized that I also need to keep the set(CXX_FLAGS) because no optimization is done otherwise. |
|
hi, i found using openmp would result in core dump such as "corrupted double-linked list" more often, did you ever met that? |
|
No, I never had that issue. Could you run valgrind on it and post the result? |
|
RESAMPLE** all processes on machine have died, roslaunch will exit |
|
Any follow ups? |
|
Hey everyone, |
|
|
||
| //set up the selective copy of the active area | ||
| //by detaching the areas that will be updated | ||
| m_matcher.invalidateActiveArea(); |
There was a problem hiding this comment.
The problem with these two lines:
m_matcher.invalidateActiveArea();
m_matcher.computeActiveArea(m_particles[i].map, m_particles[i].pose, plainReading)
They change map and matcher and they are designed to work sequentially. I suppose we should split this loop in two,
something like
//parallel
#pragma omp parallel for reduction(+:sumScore)
for (int i = 0; i < m_particles.size(); i++){
OrientedPoint corrected;
double score, l, s;
score=m_matcher.optimize(corrected, m_particles[i].map, m_particles[i].pose, plainReading);
// it->pose=corrected;
if (score>m_minimumScore){
m_particles[i].pose=corrected;
// m_infoStream << "Pose was corrected!" << std::endl;
} else {
if (m_infoStream){
m_infoStream << "Scan Matching Failed, using odometry. Likelihood=" << l <<std::endl;
m_infoStream << "lp:" << m_lastPartPose.x << " " << m_lastPartPose.y << " "<< m_lastPartPose.theta <<std::endl;
m_infoStream << "op:" << m_odoPose.x << " " << m_odoPose.y << " "<< m_odoPose.theta <<std::endl;
}
}
m_matcher.likelihoodAndScore(s, l, m_particles[i].map, m_particles[i].pose, plainReading);
sumScore+=score;
m_particles[i].weight+=l;
m_particles[i].weightSum+=l;
m_matcher.invalidateActiveArea();
m_matcher.computeActiveArea(m_particles[i].map, m_particles[i].pose, plainReading);
}
//sequential
for (int i = 0; i < m_particles.size(); i++){
//set up the selective copy of the active area
//by detaching the areas that will be updated
m_matcher.invalidateActiveArea();
m_matcher.computeActiveArea(m_particles[i].map, m_particles[i].pose, plainReading);
}
Also, m_infoStream << ... will be garbled if two or more threads don't get enough score. I suggest either omp critical or plain removal of these debug lines.
EDIT:
Alternatively, in scanmatecher.cpp in registerScan you can place real allocation of memory inside a critical section.
#pragma omp critical(alloc_area)
{
map.storage().allocActiveArea();
}
I profiled the gmapping process and found that the scan matching particle filter is very time consuming.
It easily forces even more modern CPU cores to 100% load on larger particle numbers utilizing only one core.
With the few changes I made, the scan matching process is now multi-threaded using the OpenMP library.
This results in a much better performance of gmapping in total.
I tested the package on three different platforms;