YaCy_CampusParty_20120825 print.key
Uncensorable, Untraceable Search Engines for Freedom of Information
Michael Christen, [email protected], http://yacy.net
Talk at Campus Party 2012 Berlin - http://www.campus-party.eu/2012/

Abstract

Search portals on the web are vital decision tools for people's knowledge and cultural values. Free content should be accessible with free search. Instead of going through a centralized server that acts as a gatekeeper, keeps logs of your searches and directs you to selected information, your own self-made search engine can deliver information with no censorship and no tracking. In this talk, search use cases such as project search, file search (with attached downloader), faceted search with user-defined categories, social search and peer-to-peer search are explained and demonstrated. You will become familiar with search engine technology in general and with different software modules that can be used to create amazing search portals with unusual but useful functions in just a few minutes.

Human Rights

• Knowledge is free
• Access must be free for everyone
• Privacy is a human right

Human Rights statement from United Nations

UN World Summit 2003 on the Information Society: CHARTER OF CIVIL RIGHTS FOR A SUSTAINABLE KNOWLEDGE SOCIETY

(a) Knowledge is the heritage and the property of humanity and is thus free.
(b) Access to knowledge must be free.
(c) Everyone has an unlimited right of access to the documents of public and publicly controlled bodies.
(d) The right to privacy is a human right and is essential for free and self-determined human development in the knowledge society.
from: http://www.worldsummit2003.de/en/web/375.htm

Centralized Search Portals

• can trace your behaviour
• danger of censoring, blocking, spamming
• they own your data

Access to Information

Search is the bridge between data and user: free information can only be truly free if it can be accessed with free search.

[Diagram: free Data -> Search -> User]
• free Data, among others: free Software, Data under Creative Commons License, Open Access Archive
• Search, as it is today: proprietary & centralized; it traces you, and data can be censored, blocked, removed, spammed
• User: needs proprietary and centralized software to discover free content

Access to Information: Ranking

[Diagram: free Data -> Search -> User, with the labels Ranking, Ordering, Relevancy and Community]
• In a specific community, people share the same relevancy criteria.
• Ranking influences standards and opinions within a community!
• Centralized search engines have a cultural impact on communities!

Your Own Search Engine

• Independence from centralized search portals: collect your own search index and search in a special way as needed for the content.
• Privacy: you are the search engine operator, so nobody can trace you!
• Freedom of information: no data access limits, no censoring, no filtering, no user observation, no content spamming, your ranking

Requirements for a "homebrew" search engine

Search Technology:
• Examples: for use cases and possibilities.
• Knowledge: learn how the search engine components work.
• Demo: a "Hello World" search engine is a good starting point to hack.

Software Modules:
• Easy: everyone must be able to install and operate the software.
• Available: the software must be free.
• Hackable: APIs and transparency.

Examples for use cases and possibilities

• your own search portal: projects + communities, share knowledge
• search for files (ftp/smb) ...with downloader?
• data protection & sanctuaries: persecuted content, torrents etc.
• topic-oriented (news) feeds: federated search, your intelligence service
• distributed search: share your search index
• social search: share your search experience

Knowledge: how search engine components work

[Diagram: search engine components. Top-level blocks: search server, crawler, api, search index, monitoring, network interfaces, document cache, parser, administration/steering. Detail labels: web interface; opensearch, gsa; robots, balancer, queues; solr, schema, facets, ranking, moderation; file, http, ftp, smb, oai-pmh; doc, pdf, xls, html, rss, zip, eml; I/O, requests, Disk/RAM.]

• Easy: 3-minute installation, just decompress and start
• Available: all parts are free software, see http://yacy.net and http://lucene.apache.org/solr/
• Hackable: lots of APIs, many standards

Demo (Solr):
• curl -OL "http://archive.apache.org/dist/lucene/solr/3.6.1/apache-solr-3.6.1.tgz"
• tar xfz apache-solr-3.6.1.tgz
• cd apache-solr-3.6.1/example/
• java -jar start.jar
• open http://localhost:8983/solr/admin/
• curl 'http://localhost:8983/solr/update/json?commit=true' -H 'Content-type:application/json' -d '{"add":{"doc":{"id":"data1", "title":"Hello World"}}}'
• curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @exampledocs/books.json -H 'Content-type:application/json'
• curl 'http://localhost:8983/solr/select/?q=*%3A*'
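
The curl calls of the Solr demo can also be scripted. The following is a minimal sketch in Python, assuming the example Solr server from the demo is running on localhost:8983. The helper names (build_add_payload, build_select_url, post_json) are made up for illustration and are not part of Solr.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the example server started with "java -jar start.jar" in the demo.
SOLR = "http://localhost:8983/solr"


def build_add_payload(doc_id, title):
    """Return the JSON body of the demo's 'add' request as a string."""
    return json.dumps({"add": {"doc": {"id": doc_id, "title": title}}})


def build_select_url(query="*:*", base=SOLR):
    """Return the select URL with the query percent-encoded."""
    return base + "/select/?" + urllib.parse.urlencode({"q": query})


def post_json(url, body):
    """POST a JSON body; this needs a running server and is not called below."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(build_add_payload("data1", "Hello World"))
    print(build_select_url())
```

With the server running, post_json(SOLR + "/update/json?commit=true", build_add_payload("data1", "Hello World")) would correspond to the first update call of the demo.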

Knowledge: how search engine components work

Demo (YaCy):
• curl -OL "http://yacy.net/release/yacy_v1.04_20120709_9000.tar.gz"
• tar xfz yacy_v1.04_20120709_9000.tar.gz
• cd yacy
• ./startYACY.sh
• open http://localhost:8090
• solr search interface is at http://localhost:8090/solr/select?q=*:*&start=0&rows=10
• start a web crawl at http://localhost:8090/CrawlStartSite_p.html

your own search portal: projects + communities, share knowledge

Demo:
• Make a federated search portal for: gnu.org, fsfe.org, campus-party.eu
• Add an FTP video archive from ftp://dewy.fem.tu-ilmenau.de/CCC/

[Diagram: a search engine at the center of project activities: Create and Share, Project Steering, Discussion, Produce Documents, Version Control, (micro)Blogging, Bugtracker]

search for files (ftp/smb) ...with downloader?

Demo:
• Choose "File Search" or http://localhost:8090/yacyinteractive.html
• After searching, click "create a download script"
• Copy-paste the result to your terminal

data protection & sanctuaries: persecuted content, torrents etc.

Demo:
• Index thepiratebay using the sitemap provided by their robots.txt
• Use http://localhost:8090/CrawlStartSite_p.html and check the "Sitemap URL" option.
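
The sitemap demo relies on the fact that a robots.txt file can announce sitemaps with "Sitemap:" lines. As a rough sketch of how to find the URL to paste into the crawl start page, the following Python snippet extracts those lines from a robots.txt text. The function name and the example content are hypothetical and only serve as illustration.

```python
def sitemap_urls(robots_txt):
    """Collect the values of all 'Sitemap:' lines in a robots.txt text."""
    urls = []
    for line in robots_txt.splitlines():
        # Split only at the first colon so the "http:" in the URL stays intact.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content, for illustration only.
EXAMPLE = """User-agent: *
Disallow: /private/
Sitemap: http://example.org/sitemap.xml
"""

print(sitemap_urls(EXAMPLE))
```

For a real site, the robots.txt text would first have to be downloaded, for example with urllib.request.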

topic-oriented (news) feeds: federated search, your intelligence service

Demo:
• Feed YaCy with RSS feeds at http://localhost:8090/Load_RSS_p.html
• Activate the scheduler to do this frequently
• Do a web search and add /date to the query to order by date
• Change the page to RSS format by replacing the html extension of the result page with rss
• Read the search result page with your RSS reader

distributed search: share your search index

YaCy has an integrated peer-to-peer protocol to connect to other YaCy users. But how can this scale? How are peers connected?

[Diagram: Peer-to-Peer Shared Search Index]

A Search Engine Cluster consists of independent search engines in the form of a search matrix.
• vertical scaling: more performance
• horizontal scaling: more documents

[Diagram: Search Engine Cluster, a matrix of Search Engine nodes]

We want to take the search matrix out of the data center to your home.
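
One way to picture the search matrix is as a grid in which the columns split the documents among them and the rows hold copies of the columns. The Python sketch below is a toy model under exactly that assumption; the slides do not spell out how the matrix is organized, and the class and method names are invented for illustration.

```python
import zlib


class SearchMatrix:
    """Toy model: columns partition the documents, rows replicate the columns."""

    def __init__(self, rows, columns):
        self.rows = rows
        self.columns = columns
        # cells[row][column] maps a document id to its text
        self.cells = [[{} for _ in range(columns)] for _ in range(rows)]

    def column_for(self, doc_id):
        # A stable hash decides which column stores a document.
        return zlib.crc32(doc_id.encode("utf-8")) % self.columns

    def add(self, doc_id, text):
        column = self.column_for(doc_id)
        for row in range(self.rows):  # every row holds a full copy
            self.cells[row][column][doc_id] = text

    def search(self, word, query_number=0):
        # One row serves the query (spreading load over rows), but every
        # column of that row is asked, since any column may hold a match.
        row = query_number % self.rows
        hits = []
        for column in range(self.columns):
            for doc_id, text in self.cells[row][column].items():
                if word.lower() in text.lower().split():
                    hits.append(doc_id)
        return sorted(hits)


matrix = SearchMatrix(rows=2, columns=3)
matrix.add("doc-a", "free search")
matrix.add("doc-b", "free software")
print(matrix.search("free"))
```

In this model, adding columns makes room for more documents, while adding rows lets more queries be answered at the same time.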
[Diagram: the Search Engine nodes of the matrix, moved out of the data center]

The distributed search matrix in your home is connected using a peer-to-peer protocol.

[Diagram: the same matrix, each node now a Search Engine Peer]

The YaCy Search Engine Cluster consists of independent search engines, but they are connected in an efficient way using a distributed hash table (DHT).
• DHT-Store: crawl the web, create a web index, distribute the index
• DHT-Read: search in a distributed hash table

[Diagram: Peers arranged around the Distributed Hash Table]

Everyone can join the network. Nobody can censor the search index.
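
The two DHT operations named above can be illustrated with a small in-memory model. This is a generic hash-ring sketch in Python, not YaCy's actual protocol: the peer names, the use of SHA-1 and the class ToyDHT are assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to a position on the hash ring."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class ToyDHT:
    def __init__(self, peer_names):
        # Each peer sits at the ring position given by the hash of its name.
        self.ring = sorted((ring_position(name), name) for name in peer_names)
        self.positions = [position for position, _ in self.ring]
        self.storage = {name: {} for name in peer_names}

    def peer_for(self, word):
        # The first peer at or after the word's position is responsible;
        # past the last peer the ring wraps around to the first one.
        index = bisect.bisect_left(self.positions, ring_position(word))
        return self.ring[index % len(self.ring)][1]

    def store(self, word, url):
        """DHT-Store: hand a word -> URL reference to the responsible peer."""
        self.storage[self.peer_for(word)].setdefault(word, set()).add(url)

    def read(self, word):
        """DHT-Read: ask only the responsible peer for the word's URLs."""
        return sorted(self.storage[self.peer_for(word)].get(word, set()))

    def index_document(self, url, text):
        # Distribute one reference per distinct word of the document.
        for word in set(text.lower().split()):
            self.store(word, url)


dht = ToyDHT(["peer-a", "peer-b", "peer-c"])
dht.index_document("http://example.org/1", "free search")
dht.index_document("http://example.org/2", "free software")
print(dht.read("free"))
```

The point of the design is that a search for one word only has to contact the peer responsible for that word, no matter how many peers have joined.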

social search: share your search experience

[Diagram: Peer-to-Peer Shared Search Experience on top of Peer-to-Peer Shared Search Index]

Knowledge: how search engine components work

Demo (Seeks):
• read http://seeks-project.info/wiki/index.php/Download#Download
• or just build seeks yourself:
  > git clone git://seeks.git.sourceforge.net/gitroot/seeks/seeks
  > cd seeks
  > ./autogen.sh
  > ./configure LDFLAGS="-Wl,--no-as-needed" --disable-opencv
  > make
  > cd src && ./seeks
• attach YaCy: use the opensearch interface from http://localhost:8090/yacysearch.rss?query=%query; in seeks/src/plugins/websearch/websearch-config add the line
  search-engine opensearch_rss http://localhost:8090/yacysearch.rss?query=%query yacy default
• set seeks as your web proxy at port 8250
• open your browser at http://s.s/websearch-hp

APIs in Search Interface - Opensearch, SRU

• Facets: File Types, Protocols, Domains, Authors, user-generated ontologies
• Every link is verified before it is displayed: the content is loaded, parsed and used to generate a search snippet
• Standards: SRU
• APIs: Opensearch (search results with RSS), JSON, AJAX tools
• Tools: search widget, ready-to-use code snippets to embed search everywhere

APIs in Search Interface - Opensearch

> curl "http://localhost:8080/yacysearch.rss?query=foaf&maximumRecords=10"

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type='text/xsl' href='/yacysearch.xsl' version='1.0'?>
<rss version="2.0" xmlns:yacy="http://www.yacy.net/"
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<!-- very short example -->
<item>
  <title>Friend of a Friend (FOAF) project</title>
  <link>http://www.foaf-project.org/</link>
  <pubDate>Fri, 23 May 2008 02:00:00 +0200</pubDate>
</item>
<item>
  <title>FOAF - Wikipedia</title>
  <link>http://de.wikipedia.org/wiki/FOAF</link>
  <pubDate>Tue, 08 Jan 2008 01:00:00 +0100</pubDate>
</item>
<item>
  <link>http://microformats.org/wiki/xfn-to-foaf</link>
  <pubDate>Fri, 09 May 2008 02:00:00 +0200</pubDate>
</item>
</rss>

How to get Opensearch/JSON Search Results:
• do a normal web search in YaCy
• replace the 'html' extension of the result page URL with 'rss'
• for json, replace the 'html' extension with 'json'

Opensearch Standard: http://www.opensearch.org
SRU Standard for Queries: http://www.loc.gov/standards/sru/specs/search-retrieve.html

Search Interface Integration

How to integrate a YaCy Search Portal: just copy-paste the code snippet into your web page source code.

Code Snippet #2 looks like: [Image: rendered search box]

The YaCy administration interface offers more code snippets. An example from /ConfigSearchBox.html looks like: [Image: example from /ConfigSearchBox.html]

Code Snippet Example #1: a search window in an iframe

<iframe name="target2"
        src="http://141.52.175.43:8080/yacysearch.html?display=2&resource=local"
        width="100%" height="180" frameborder="0" scrolling="auto" id="target2">
</iframe>

Code Snippet Example #2: a search box (points to new page)

<form method="get" accept-charset="UTF-8" action="http://141.52.175.43:8080/yacysearch.html">
  <div>
    <div>MySearch</div>
    <input type="text" name="query" value="" maxlength="80" />
    <input type="hidden" name="verify" value="true" />
    <input type="hidden" name="maximumRecords" value="10" />
    <input type="hidden" name="meanCount" value="5" />
    <input type="hidden" name="resource" value="local" />
    <input type="hidden" name="urlmaskfilter" value=".*" />
    <input type="hidden" name="prefermaskfilter" value="" />
    <input type="hidden" name="display" value="2" />
    <input type="hidden" name="nav" value="all" />
    <input type="submit" name="Enter" value="Search" />
  </div>
</form>

Your YaCy peer provides help pages with code snippets for easy integration!

APIs in Harvesting: Dublin Core Dump Import

Standards: YaCy can import standard Dublin Core Metadata XML files as input for indexing.

<?xml version="1.0" encoding="utf-8"?>
<!-- YaCy surrogate using dublin core notion -->
<surrogates xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:title><![CDATA[Alan Smithee]]></dc:title>
    <dc:identifier>http://de.wikipedia.org/wiki/Alan_Smithee</dc:identifier>
    <dc:description>
      <![CDATA['''Alan Smithee''' ist ein Anagramm von „The Alias Men“.]]>
    </dc:description>
    <dc:language>de</dc:language>
    <dc:date>2009-04-14T00:00:00Z</dc:date> <!-- date is in ISO 8601 -->
  </record>
</surrogates>

How to import Dublin Core files: just place the XML files into the hand-over directory at DATA/SURROGATES/in/
The Dublin Core XML File Standard: http://dublincore.org/documents/dc-xml-guidelines/
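
Surrogate files like the one shown above can be generated from your own data. The following Python sketch builds such a document with the standard library, assuming that the five Dublin Core fields from the slide are enough for a record. The function name build_surrogate is hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)  # serialize the namespace with the "dc" prefix


def build_surrogate(records):
    """Build a <surrogates> document from a list of field dictionaries."""
    root = ET.Element("surrogates")
    for fields in records:
        record = ET.SubElement(root, "record")
        for name in ("title", "identifier", "description", "language", "date"):
            if name in fields:
                element = ET.SubElement(record, "{%s}%s" % (DC, name))
                # ElementTree escapes special characters instead of using CDATA.
                element.text = fields[name]
    return ET.tostring(root, encoding="unicode")


xml_text = build_surrogate([
    {
        "title": "Alan Smithee",
        "identifier": "http://de.wikipedia.org/wiki/Alan_Smithee",
        "language": "de",
        "date": "2009-04-14T00:00:00Z",  # ISO 8601, as on the slide
    }
])
print(xml_text)
```

The resulting text could then be written to an .xml file in DATA/SURROGATES/in/; the file name is up to you, and an XML declaration like the one on the slide may be prepended.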

Summary

1. Access to knowledge and the right to privacy are human rights. Communities need their own ranking. Centralized search engines are not sufficient to provide these rights to everyone. We need decentralized systems.
2. We demonstrated search use cases that are unmatched by current search portal providers. Free content needs more appropriate search technology.
3. We explained how search technology works in general. This was just the tip of the iceberg. There is a lot more to know.
4. We demonstrated search tools which are easy, available and hackable: Solr, YaCy and Seeks. For each tool you will find a short tutorial in these slides.
5. Please support the idea of free search and the projects. Please help, test the software, ask questions, tell other people and help hacking!

Thank You for Listening

Dipl. Inf. Michael Christen, [email protected], http://yacy.net

Download: http://yacy.net, http://latest.yacy.net
Documentation: http://wiki.yacy.net, http://yacy-kochbuch.de
Discussion: http://forum.yacy.de
Bugs: http://bugs.yacy.net
News: http://twitter.com/#!/yacy_search, http://blog.yacy.de, http://blog.yacy-kochbuch.de
Development: https://gitorious.org/yacy

All images are CC0; many are from http://openclipart.org