Erlang Factory Lite Paris
Lessons learned
How we use Erlang to analyze millions of messages per day
©Semiocast

A few words about us
What we do with Erlang

A few words about us
Semiocast processes social media conversations to provide analytics and market research insights.

[Slides: excerpts from two Semiocast studies. The first is "La crise de Guerlain sur le web temps réel" (The Guerlain crisis on the real-time web, a Semiocast analysis from October), with a chronology of events, number of messages per hour, themes by period, and the geographic distribution of boycott calls. The second is an English-language brand study showing share, sentiment and languages of messages on real-time web networks. The chart text did not survive extraction.]
Semiocast’s offer
- Quantitative studies (ad hoc): barometers, shares of social media conversations, sentiment analysis and clustering, topic identification, ad hoc quantitative indicators
- Qualitative studies (ad hoc): consumer insights, verbatim research, clustering of conversations, mapping of communities and influence analysis, enumeration of social media conversation spaces
- Monitoring: real-time alerts, daily/weekly/monthly reports, crisis monitoring
- Tools: social media monitoring platform (Semioboard), technology as a service (API)

Semioboard
[screenshots]

Live analysis of comments on TV debates
[screenshot]

How we ended up using Erlang
- Taught OCaml in 2004
- Discovered Erlang in 2007 when getting WiFi rabbits to talk to each other over XMPP (ejabberd)
- Three reasons why we chose Erlang:
  - hot code change and inspection
  - fault tolerance
  - happy to do functional programming (gave us a break from Java and C++)

How we use Erlang
1352 OTP releases
- 47 applications
- 100K lines of Erlang (without tests)
- 11K lines of C/C++ (mostly glue)

    {release, {"Semiocast OTP", "1352"}, {erts, "5.8.4"}, [
        % erts 5.8.4
        {kernel, "2.14.4"},
        {stdlib, "1.17.4"},
        {mnesia, "4.4.18"},
        {inets, "5.5.2"},
        {sasl, "2.1.9.3"},
        {crypto, "2.0.2.1"},
        {snmp, "4.19"},
        {otp_mibs, "1.0.6"},
        {ssl, "4.1.5"},
        {public_key, "0.12"},
        {xmerl, "1.2.8"},
        {compiler, "4.7.3"},
        {runtime_tools, "1.8.5"},
        {syntax_tools, "1.6.7"},
        % Other libs
        {erlsom, "1.2.1"},
        {mochiweb, "0.167.10"},
        {nitrogen, "2.0.20100531.14"},
        {nprocreg, "0.1"},
        {simple_bridge, "1.0.2"},
        % Semiocast
        {analyzer, "22", load},
        {alien_models, "50", load},
        {alien_uniform_streams, "7", load},
        {api_server, "109", load},
        {aspell, "4", load},
        {binlog, "26", load},
        {certificate_authority, "8", load},
        {chasen, "11", load},
        {chinese_segmenter, "1", load},
        {commonlib, "250", permanent},      % always
        {ctl, "29", permanent},             % always
        {dashboard_engine, "55", load},
        {dashboard_storage, "56", load},
        {dashboard_website, "226", load},
        {developer_website, "36", load},
        {engine, "455", load},
        {gate, "79", load},
        {geodb, "5", load},
        {geoip, "2", load},
        {image_magick, "8", load},
        {kdtree, "3", load},
        {kqueue, "1", load},
        {link_grammar_parser, "12", load},
        {memcached, "5", load},
        {mysql, "8", load},
        {nagios, "6", permanent},           % always
        {opennlp, "8", load},
        {pgsql, "2", load},
        {pubsubhubbub, "4", load},
        {qr_website, "1", load},
        {re2, "1", load},
        {s_http, "42", load},
        {setproctitle, "2", permanent},     % always
        {sink, "15", load},
        {sqlite, "9", load},
        {storage, "226", load},
        {svg, "12", load},
        {svm, "1", load},
        {text_ident, "27", load},
        {text_proc, "62", load},
        {titema_website, "7", load},
        {url_server, "8", load},
        {uuid, "6", load},
        {web_common, "14", load},
        {web_gate, "8", load},
        {web_storage, "5", load},
        {wikimedia, "6", load}
    ]}.

    start commonlib.
    start ctl.
    start nagios.
    start setproctitle.

- 1K lines of Java (glue)

~50 ungraduated applications for
- prototyping
- short-lived projects
- web-based/command-line tools that run on dev machines

A few things we wish we had known about Erlang

Mistake #1: Creating an Erlang process to do a lot of work
- Processes should spend most of their time waiting for messages (gen_server), or do some intensive work and exit quickly when done (spawn_link)
- When a worker process is required, benchmark it: message passing with the worker process can prove expensive

    Self = self(),
    spawn_link(fun() -> Self ! {language, analyze_language(Text, MD0, Mode)} end),
    spawn_link(fun() -> Self ! {location, analyze_location(MD0, Mode)} end),
    {NProcessedLang, Language} = receive {language, RespLa} -> RespLa end,
    {NProcessedLoc, Location} = receive {location, RespLoc} -> RespLoc end,

Faster:

    {NProcessedLang, Language} = analyze_language(Text, MD0, Mode),
    {NProcessedLoc, Location} = analyze_location(MD0, Mode),

Mistake #2: Creating a lot of processes for parallelized computing
- Having more worker processes than schedulers does not make sense
- It can actually hurt: processes waiting for a reply may not get it in time and will fail with a timeout

Thinking OTP

Mistake #3: Starting processes outside the supervision tree
- gen_server & co. should be used everywhere, except for very short-lived processes (that won’t be upgraded)
- Every gen_server should be started from a supervisor
- A real-world supervision design requires process_flag(trap_exit, true), monitor/2 and link/1, as well as some thinking

    %%---------------------------------------------------------------
    %% @doc supervisor init callback.
    %%
    -spec(init/1::(any()) -> sup_init()).
    init(_Args) ->
        UserServerSupSpec = {user_server_sup,           % id
            {user_server_sup, start_link, []},          % start function (MFA)
            permanent,                                  % restart children that crash
            ?SHUTDOWN_DELAY,
            supervisor,
            [user_server_sup]                           % modules
        },
        AlienUserServerSupSpec = {alien_user_server_sup, % id
            {alien_user_server_sup, start_link, []},    % start function (MFA)
            permanent,                                  % restart children that crash
            ?SHUTDOWN_DELAY,
            supervisor,
            [alien_user_server_sup]                     % modules
        },
        %% Slide note (see erl -man supervisor): as a rule of thumb, Modules
        %% should be a list with one element [Module], where Module is the
        %% callback module, if the child process is a supervisor, gen_server
        %% or gen_fsm.
        MailboxServerSupSpec = {mailbox_server_sup,     % id
            {mailbox_server_sup, start_link, []},       % start function (MFA)
            permanent,                                  % restart children that crash
            ?SHUTDOWN_DELAY,
            supervisor,
            [mailbox_server_sup]                        % modules
        },
        UserManagerStartChildSemaphore = {user_manager_start_child_sem ...
            {semaphore, start_link, [{local, user_manager_start_child_ ...
            transient,                                  % restart children that crash
            ?SHUTDOWN_DELAY,
            worker,
            [semaphore]                                 % modules
        },
        %% Slide note: gen_manager is a gen_server with a callback module
        %% handling code_change messages (here, user_manager).
        UserManagerSpec = {user_manager,                % id
            {user_manager, start_link, []},             % start function (MFA)
            transient,                                  % restart children that crash
            ?SHUTDOWN_DELAY,
            worker,
            [gen_manager, user_manager]                 % modules
        },
        RestartStrategy = {one_for_one, ?MAX_RESTARTS, ?MAX_RESTARTS_P ...
        % Setup snmp probe for semaphore.
        gen_snmp_agent:set_variable(?SNMP_USER_MANAGER_START_CHILD_SEM ...

Mistake #4: Thinking obscure comments in the documentation do not really apply
- When in doubt, the source code is handy, and helps figure out when we really need to deviate from the rule

Mistake #5: Putting everything in a single virtual machine (node) per server
- Virtual machines may crash
- Code changes can fail and take down the whole node
- It is better to isolate critical code in its own node, or even on its own server if a crashing node can take a huge amount of RAM and make other nodes swap

Interfacing with foreign code

Six ways to interface foreign code with Erlang:
- Linked-in drivers
- External drivers
- NIFs
- os:cmd/1
- C-based distributed node
- Java-based distributed node (jinterface)
We tried them all…
…and are looking forward to future extensions to the native interface (R15?)

Method:
- Linked-in drivers
- External drivers
- NIFs
- os:cmd/1
- C-based distributed node
- jinterface

We use/used for:
- aspell
- kqueue (FreeBSD/MacOS X kqueue binding)
- SQLite
- ImageMagick
- GeoIP
- WebKit
- uuid
- re2 (linear time bound replacement for re)
- bzip2
- OpenSSL
- Batik (SVG rasterizer in Java)
- ruby (we actually bound Rails websites with Erlang at some point)
- OpenNLP
[The slide was a two-column table; the row alignment between methods and libraries was lost in extraction.]

Mistake #6: Using linked-in drivers for open source code that could crash/abort
E.g.: ImageMagick will abort on bad input
- External drivers are more suitable when the external library is large, crash-prone or could leak (sometimes, the leak is in the glue…)
- Passing pointers is possible but requires some logic, typically binding a pointer to the port

Mistake #7: Using linked-in drivers for I/O-intensive code
E.g.: sqlite
- Linked-in driver code is executed within a scheduler thread.
  Running for too long starves other processes, which then time out waiting for messages
- Theoretically, we can use async threads (and we do with sqlite). However, enabling async threads (+A) has a huge impact on built-in I/O drivers
- Performance could be worse with external drivers

Erlang technologies we love (and hate)

dialyzer
- part of our compilation cycle
- found many bugs, typically inconsistencies between callers and callees
- we start with -spec when defining exported functions
We wish it would be fixed/improved:
- horribly slow
- sometimes blind
- hard to understand
- fails on code_change code
- useless warnings that cannot be disabled

snmp
- makes it really easy to integrate Erlang nodes within a monitoring solution (we use nagios and munin)

HiPE
OTP installed with --enable-native-libs and all our code compiled with +native
- helps with CPU-bound work (including dialyzer…)
- most patches we submitted were HiPE-related

We are also grateful to the authors of: erlsom, mochiweb, nitrogen, zotonic…

Thank you!
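
Appendix sketch for Mistake #2 (not from the slides): one way to keep the number of worker processes at or below the number of schedulers is a bounded parallel map. The module name bounded_pmap and its map/2 function are hypothetical names chosen for this illustration; only erlang:system_info(schedulers), spawn_link, make_ref and the lists functions are standard. It is a minimal sketch that assumes the work function does not crash and that the caller is not trapping exits.

```erlang
-module(bounded_pmap).
-export([map/2]).

%% Apply F to every element of List, with at most
%% erlang:system_info(schedulers) worker processes alive at any time.
%% Results are returned in the order of the input list.
%% (Hypothetical helper, written for illustration only.)
map(F, List) ->
    MaxWorkers = erlang:system_info(schedulers),
    Indexed = lists:zip(lists:seq(1, length(List)), List),
    Ref = make_ref(),  % tags replies so unrelated messages are ignored
    Results = run(F, Indexed, Ref, MaxWorkers, 0, []),
    [R || {_Index, R} <- lists:keysort(1, Results)].

%% No input left and no worker running: done.
run(_F, [], _Ref, _Max, 0, Acc) ->
    Acc;
%% A slot is free and input remains: start one more short-lived worker.
run(F, [{Index, X} | Rest], Ref, Max, Running, Acc) when Running < Max ->
    Parent = self(),
    %% spawn_link: if the worker crashes, the caller crashes too
    %% instead of waiting forever for a reply.
    spawn_link(fun() -> Parent ! {Ref, Index, F(X)} end),
    run(F, Rest, Ref, Max, Running + 1, Acc);
%% All slots busy, or input exhausted with workers still running:
%% wait for one worker to report back.
run(F, Pending, Ref, Max, Running, Acc) ->
    receive
        {Ref, Index, R} ->
            run(F, Pending, Ref, Max, Running - 1, [{Index, R} | Acc])
    end.
```

Usage: bounded_pmap:map(fun(X) -> X * X end, lists:seq(1, 100)). As Mistake #1 points out, each result is copied back to the caller by message passing, so this is only worth it when the work per element is large; benchmark against a plain lists:map/2 first.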