GNU Radio - Bug #778

Segfault using gr-mac and pmts. Race condition?
03/19/2015 05:03 pm - Moritz Fischer
Status: New
Start date: 03/19/2015
Priority: Normal
Due date:
Assignee: Moritz Fischer
% Done: 0%
Category:
Estimated time: 0.00 hour
Target version:
Resolution:
Description
Running the attached script with GNU Radio built from maint segfaults after some time. I did not attach the core because it is 76M in size.
This was run on 1c33a22e92654bb8333fcd0360414a514e6927f8 (maint) on an OE dizzy-based Xilinx Zynq system.
Steps to reproduce:
1) Build off of maint / or master
2) python pmt_smasher.py
3) Wait ... segfault
This GDB was configured as "arm-oe-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...Reading symbols from /usr/bin/.debug/python2.7...done.
done.
(gdb) set args pmt_smasher.py
(gdb) core-file core
warning: core file may not match specified executable file.
[New LWP 1014]
[New LWP 1011]
[New LWP 1009]
[New LWP 1013]
[New LWP 1012]
[New LWP 1010]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `python ./pmt_smasher.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000001c in ?? ()
(gdb) bt
#0 0x0000001c in ?? ()
#1 0xb5438768 in ~intrusive_ptr (this=0xb2900748, __in_chrg=<optimized out>)
at /home/mfischer/local/oecore-x86_64-pmt-dizzy/sysroots/armv7ahf-vfp-neon-oe-linux-gnueabi/usr/include/boost/smart_ptr/intrusive_ptr.hpp:97
#2 _wrap_delete_swig_int_ptr (args=<optimized out>)
at /home/mfischer/src/gnuradio.git/build-pmt-dizzy/gnuradio-runtime/swig/pmt_swigPYTHON_wrap.cxx:30147
10/14/2016
1/6
#3 0xb6e55a60 in PyObject_Call (func=func@entry=0xd41e18,
arg=arg@entry=0x14bee70, kw=kw@entry=0x0) at Objects/abstract.c:2529
#4 0xb6e5602c in PyObject_CallFunctionObjArgs (callable=0xd41e18)
at Objects/abstract.c:2760
#5 0xb542f9b4 in SwigPyObject_dealloc (v=0x1b2c0b0)
at /home/mfischer/src/gnuradio.git/build-pmt-dizzy/gnuradio-runtime/swig/pmt_swigPYTHON_wrap.cxx:1676
#6 0xb6e93130 in dict_dealloc (mp=0x1b2bed0) at Objects/dictobject.c:985
#7 0xb6eb0a20 in subtype_dealloc (self=0x1b25e70) at Objects/typeobject.c:999
#8 0xb6ef9710 in call_function (oparg=<optimized out>, pp_stack=0xb17fe700)
at Python/ceval.c:4055
#9 PyEval_EvalFrameEx (f=f@entry=0xb2900bd8, throwflag=throwflag@entry=0)
at Python/ceval.c:2666
#10 0xb6efbde8 in fast_function (nk=<optimized out>, na=<optimized out>, n=2,
pp_stack=0xb17fe7c8, func=<optimized out>) at Python/ceval.c:4107
#11 call_function (oparg=<optimized out>, pp_stack=0xb17fe7c8)
at Python/ceval.c:4042
#12 PyEval_EvalFrameEx (f=f@entry=0x0, throwflag=<optimized out>)
at Python/ceval.c:2666
#13 0xb6efc9b8 in PyEval_EvalCodeEx (co=0x108b2a8, globals=<optimized out>,
locals=locals@entry=0x0, args=args@entry=0x1b2a384, argcount=2,
kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry=0x0,
defcount=defcount@entry=0, closure=closure@entry=0x0)
at Python/ceval.c:3253
#14 0xb6e7fea8 in function_call (func=0x108edf0, arg=0x1b2a378, kw=0x0)
at Objects/funcobject.c:526
#15 0xb6e55a60 in PyObject_Call (func=func@entry=0x108edf0,
arg=arg@entry=0x1b2a378, kw=kw@entry=0x0) at Objects/abstract.c:2529
#16 0xb6e682d8 in instancemethod_call (func=0x108edf0, arg=0x1b2a378, kw=0x0)
at Objects/classobject.c:2578
#17 0xb6e55a60 in PyObject_Call (func=func@entry=0x18c6968,
arg=arg@entry=0x14beef0, kw=kw@entry=0x0) at Objects/abstract.c:2529
#18 0xb6e55f7c in PyObject_CallMethodObjArgs (callable=0x18c6968,
name=0x1b25dc0) at Objects/abstract.c:2738
#19 0xb55bc884 in SwigDirector_feval_p::eval (this=0x1b5f728, x=...)
at /home/mfischer/src/gnuradio.git/build-pmt-dizzy/gnuradio-runtime/swig/runtime_swigPYTHON_wrap.cxx:7074
#20 0xb55ef490 in gr::py_feval_p::calleval (this=0x1b5f728, x=...)
at /home/mfischer/src/gnuradio.git/gnuradio-runtime/include/gnuradio/py_feval.h:88
#21 0xb677b398 in gr::block_gateway::dispatch_msg (this=0x1b5e7c8,
which_port=..., msg=...)
at /home/mfischer/src/gnuradio.git/gnuradio-runtime/include/gnuradio/block_gateway.h:294
#22 0xb67b7374 in gr::tpb_thread_body::tpb_thread_body (this=0xb17fecf8,
block=..., max_noutput_items=<optimized out>)
at /home/mfischer/src/gnuradio.git/gnuradio-runtime/lib/tpb_thread_body.cc:106
#23 0xb67adb98 in operator() (this=0x1948460)
at /home/mfischer/src/gnuradio.git/gnuradio-runtime/lib/scheduler_tpb.cc:44
#24 gr::thread::thread_body_wrapper<gr::tpb_container>::operator() (
this=0x1948460)
at /home/mfischer/src/gnuradio.git/gnuradio-runtime/include/gnuradio/thread/thread_body_wrapper.h:51
#25 0xb6760970 in operator() (this=<optimized out>)
at /home/mfischer/local/oecore-x86_64-pmt-dizzy/sysroots/armv7ahf-vfp-neon-oe-linux-gnueabi/usr/include/boost/function/function_template.hpp:767
#26 boost::detail::thread_data<boost::function0<void> >::run (
this=<optimized out>)
at /home/mfischer/local/oecore-x86_64-pmt-dizzy/sysroots/armv7ahf-vfp-neon-oe-linux-gnueabi/usr/include/boost/thread/detail/thread.hpp:115
#27 0xb655bf68 in boost::(anonymous namespace)::thread_proxy (
#28 0xb6cd0db0 in start_thread (arg=0xb17ff460) at pthread_create.c:315
#29 0xb6db9080 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:89
from /lib/libc.so.6
History
#1 - 03/21/2015 03:31 pm - Marcus Müller
- Can you add a py-bt from GDB, http://gnuradio.org/redmine/projects/gnuradio/wiki/TutorialsGDB ?
- What is "after some time" in minutes? I've been running the script for an afternoon now, and aside from a bit of warm air, I can't say it has had many effects.
- Could you run info threads and do a bt on each of them (OK, each of them that looks interesting, but whatevs :) )? We both think it's something racey/thready, so that might help a bit.
#2 - 03/22/2015 07:43 am - Marcus Müller
The core size increased to 200MB after a few hours. So no matter how we look at it, we probably have at least a memory leak.
#3 - 03/22/2015 10:05 am - Marcus Müller
- File valgrind-python.supp added
Good news: we're not dooooomed. And I do think that a python backtrace will help.
Bad news: I think it's python, and I don't even know if SWIG is to blame, and whether we can solve it. I think it's a bit of a bug in how GR gateways things.
I've run valgrind/memcheck on python2 pmt_smasher.py, which obviously led to an enormous log when not suppressing all memcheck warnings that come from PyObject_Malloc. That is a known issue -- Python uses its own small-object allocator, and that doesn't play well with valgrind's detection algorithms.
valgrind --leak-check=yes --suppressions=valgrind-python.supp --log-file=pmt_smasher.valgrind.log /usr/bin/python2 pmt_smasher.py
However, when suppressing all the Python warnings, essentially all of the process's continued growth in memory usage is ignored as well.
Conclusion: Memory does get leaked, and it's happening in python.
Now, I can't reproduce the behaviour you're seeing, but my wild speculation at this point is:
- python's garbage collection works as it damn well wants to -- not really well at all. That's why your core dump gets a bit bigger every second.
- PySwig objects, most probably things like PMTs etc, which are regularly passing the C++/python frontier from gr-mac.packet_framer, at times are
deallocated by python, and at times are not.
My blind guess is that when getting PMT objects from PMT objects, you basically get two PyObjects whose deallocators destroy the same object. That
might interfere whenever python garbage collection hits your object before it's destructed by C++ land.
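The guess above can be illustrated without GNU Radio. What follows is a pure-Python sketch, not the actual SWIG or PMT code: `Pointee`, `Wrapper`, and `run` are hypothetical names that mimic an intrusively reference-counted C++ object and two wrapper objects whose deallocators each release it. If both wrappers were created from a single reference, the second release hits an object that is already destroyed.

```python
class Pointee(object):
    """Hypothetical stand-in for an intrusively ref-counted C++ object."""

    def __init__(self):
        self.refcount = 0
        self.destroyed = False

    def add_ref(self):
        self.refcount += 1

    def release(self):
        if self.destroyed:
            # In C++ this would be a use-after-free, not a clean exception.
            raise RuntimeError("release() on destroyed object")
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True


class Wrapper(object):
    """Hypothetical wrapper that releases its pointee when deallocated."""

    def __init__(self, pointee, take_ref):
        self.pointee = pointee
        if take_ref:
            pointee.add_ref()

    def dealloc(self):
        self.pointee.release()


def run(second_takes_ref):
    obj = Pointee()
    first = Wrapper(obj, take_ref=True)
    second = Wrapper(obj, take_ref=second_takes_ref)
    first.dealloc()
    try:
        second.dealloc()
    except RuntimeError:
        return "double release"
    return "ok"


print(run(second_takes_ref=True))   # each wrapper owns its own reference
print(run(second_takes_ref=False))  # both wrappers share a single reference
```

If every wrapper takes its own reference, having two wrappers around one object is harmless; the failure only appears in the shared-reference case, which is what the guess would require.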
Having had a look at packet_framer:80f
 80  def packetise(self, msg):
 81      data = pmt.cdr(msg)
 82      meta = pmt.car(msg)
 83      if not pmt.is_u8vector(data):
 84          #raise NameError("Data is no u8 vector")
 85          return "Message data is not u8vector"
 86
 87      buf = pmt.u8vector_elements(data)
 88      buf_str = "".join(map(chr, buf))
 89
 90      # FIXME: Max: 4096-header-crc
 91
 92      meta_dict = pmt.to_python(meta)
 93      if not (type(meta_dict) is dict):
 94          meta_dict = {}
 95
 96      pkt = ""
 97      pkt += self.preamble
 98      pkt += packet_utils.make_packet(
 99          buf_str,
100          0,#self._samples_per_symbol,
101          0,#self._bits_per_symbol,
102          #preamble=<default>
103          access_code=self.access_code,
104          pad_for_usrp=False,#pad_for_usrp,
105          whitener_offset=self.whitener_offset,
106          whitening=self.whiten
107          )
108      pkt += self.postamble
109      pkt = map(ord, list(pkt))
110      if self.rotate_whitener_offset:
111          self.whitener_offset = (self.whitener_offset + 1) % 16
112      meta = pmt.to_pmt(meta_dict)
113      data = pmt.init_u8vector(len(pkt), pkt)
114      self.message_port_pub(pmt.intern('out'), pmt.cons(meta, data))
Have a look at data: what happens in l. 81 is that Python calls a SWIG wrapper function that gives it a PyObject*, pointing to a PyObject that SWIG created around the C++ object; Python then sets the reference count of the PyObject to 1.
Skip to l. 113: data now points to another object, so the original PyObject's refcount is decreased to 0, and should thus be marked for garbage collection.
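In CPython specifically, an object whose reference count drops to 0 is deallocated right away, in the thread that dropped the last reference; the cyclic garbage collector is only needed for reference cycles. A minimal sketch of that behaviour, with the hypothetical class `Tracked` standing in for a SWIG wrapper that has a deallocator:

```python
import gc

events = []


class Tracked(object):
    """Hypothetical stand-in for a wrapper object with a deallocator."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append("dealloc " + self.name)


gc.disable()                # rule out the cyclic collector for this demo
data = Tracked("first")
data = Tracked("second")    # rebinding drops the last reference to "first"
events.append("after rebind")
gc.enable()

print(events)
```

On CPython the deallocation of "first" is recorded before "after rebind". That is consistent with frames #5 to #8 of the backtrace, where SwigPyObject_dealloc runs synchronously inside the Python call made from the block's message handler thread.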
Problem is: two PyObjects with distinct ref counters pointing to the same PMT object:
>>> import pmt
>>> car_in = pmt.intern("symbol")
>>> cdr_in = pmt.make_u8vector(16, 1)
>>> pair = pmt.cons(car_in, cdr_in)
>>> car_out = pmt.car(pair)
>>> cdr_out = pmt.cdr(pair)
>>> id(car_out)
140353492053264
>>> id(car_in)
140353491209232
>>> car_out2 = pmt.car(pair)
>>> id(car_out2) == id(car_out)
False
Uh-oh. car_out and car_out2 should really point to the same object, especially since it's a pmt_intern, which is a singleton for each value.
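One caveat when reading the session above: `id()` compares the Python wrapper objects, not the C++ objects they wrap, so distinct ids do not by themselves show that the underlying PMTs differ. The sketch below is pure Python with hypothetical names (`Wrapper`, `Pair`, `car`) and only mimics a binding that creates a fresh wrapper on every call. With the real bindings, `pmt.eq(car_out, car_out2)` would be the comparison to check, assuming it tests for the same underlying object as `pmt::eq` does in C++.

```python
class Wrapper(object):
    """Hypothetical wrapper; `ptr` plays the role of the wrapped C++ object."""

    def __init__(self, ptr):
        self.ptr = ptr


class Pair(object):
    """Hypothetical pair that holds one underlying object as its car."""

    def __init__(self, car_ptr):
        self._car_ptr = car_ptr

    def car(self):
        # A fresh wrapper is created on every call.
        return Wrapper(self._car_ptr)


symbol = object()       # the single underlying object
pair = Pair(symbol)
car_out = pair.car()
car_out2 = pair.car()

same_wrapper = car_out is car_out2            # what comparing id() checks
same_pointee = car_out.ptr is car_out2.ptr    # identity of the wrapped object
print("same wrapper: %s, same pointee: %s" % (same_wrapper, same_pointee))
```

In this sketch the wrappers differ while the pointee is shared, so distinct wrapper identities alone would not distinguish a healthy binding from a broken one.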
#4 - 03/22/2015 06:23 pm - Doug Geiger
Which version of Python is this happening with? I have noticed some differences in the python garbage collection with Python >= 2.7.8 in Debian jessie
(compared with e.g. 2.7.3 that's in Ubuntu 12.04). Since this looks like an OE-based build, I'm guessing it's the older version of Python that's in play, yes?
#5 - 03/22/2015 06:51 pm - Marcus Müller
I missed an obvious thing:
#1 0xb5438768 in ~intrusive_ptr (this=0xb2900748, __in_chrg=<optimized out>)
at /home/mfischer/local/oecore-x86_64-pmt-dizzy/sysroots/armv7ahf-vfp-neon-oe-linux-gnueabi/usr/include/boost/smart_ptr/intrusive_ptr.hpp:97
~intrusive_ptr really should just call our own releaser, compare boost's intrusive_ptr.hpp:
~intrusive_ptr()
{
if( px != 0 ) intrusive_ptr_release( px );
}
But we end up in an address #0 0x0000001c, which is WUT?!.
So, this might be a mixup somewhere in our own intrusive_ptr wrapping; Moritz, could you share your SWIG and your Boost version? Also, as stupid as that seems, /home/mfischer/src/gnuradio.git/build-pmt-dizzy/gnuradio-runtime/swig/pmt_swigPYTHON_wrap.cxx:30147 and surroundings might come in handy.
Doug: If I'm not mistaken, that's Python 2.7.3 running on the E310; Moritz, can you confirm?
#6 - 03/31/2015 12:06 am - Moritz Fischer
It's Python 2.7.3, swig-3.0.2-r0, boost-1.56
#7 - 03/31/2015 12:07 am - Moritz Fischer
(gdb) list
30142       SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "delete_swig_int_ptr" "', argument " "1"" of type '" "boost::intrusive_ptr< pmt::pmt_base > *""'");
30143     }
30144     arg1 = reinterpret_cast< boost::intrusive_ptr< pmt::pmt_base > * >(argp1);
30145     {
30146       try {
30147         delete arg1;
30148       }
30149       catch(std::exception &e) {
30150         SWIG_exception(SWIG_RuntimeError, e.what());
30151       }
(gdb)
#8 - 03/31/2015 12:39 am - Moritz Fischer
So far could not reproduce on Wandboard with Python 2.7.9. Building images to reproduce on E310.
#9 - 03/31/2015 01:10 pm - Marcus Müller
If that doesn't show improvement, we might actually have to take the path of the code instrumenter:
adding -fsanitize=thread will add valgrind-y memory access instrumentation to the code, to detect race conditions.
Files
pmt_smasher.py          2.73 KB    03/19/2015    Moritz Fischer
valgrind-python.supp    6.82 KB    03/22/2015    Marcus Müller