A Comparison of Open Source Search Engines

Updated: sphinx setup wasn’t exactly ‘out of the box’. Sphinx searches the fastest now and its relevancy increased (charts updated below).

Motivation

Later this month we will be presenting a half day tutorial on Open Search at SIGIR. It’ll basically focus on how to use open source software and cloud services for building and quickly prototyping advanced search applications. Open Search isn’t just about building a Google-like search box on a free technology stack, but encouraging the community to extend and embrace search technology to improve the relevance of any application.

For example, one non-search application of BOSS leveraged the Spelling service to spell correct video comments before handing them off to their Spam filter. The Spelling correction process normalizes popular words that spammers intentionally misspell to get around spam models that rely on term statistics, and thus, can increase spam detection accuracy.

We have split up our upcoming talk into two sections:

  • Services: Open Search Web APIs (Yahoo! BOSS, Twitter, Bing, and Google AJAX Search), interesting mashup examples, ranking models and academic research that leverage or could benefit from such services.
  • Software: How to use popular open source packages for vertical indexing of your own data.

While researching for the Software section, I was quite surprised by the number of open source vertical search solutions I found:

And I was even more surprised by the lack of comparisons between these solutions. Many of these platforms advertise their performance benchmarks, but these are run in isolation, use different data sets, and seem more focused on speed than on, say, relevance.

The best paper I could find that compared performance and relevance of many open source search engines was Middleton+Baeza’07, but the paper is quite old now and didn’t make its source code and data sets publicly available.

So I developed a couple of fun, off-the-wall experiments to test some of the popular vertical indexing solutions. (These were for building code examples; this is just a simple, quick evaluation and not for SIGIR. Read the disclaimer in the Conclusion section.) Here’s a table of the platforms I selected to study, with some high level feature breakdowns:

High level feature comparison among the vertical search solutions I studied; The support rating and scale are based on information I collected from web sites and conversations. I tested each solution’s latest stable release as of this week (Indri is TODO).

One key design decision I made was not to change any numerical tuning parameters. I really wanted to test “Out of the Box” performance to simulate the common developer scenario. Plus, it takes forever to optimize parameters fairly across multiple platforms and different data sets esp. for an over-the-weekend benchmark (see disclaimer in the Conclusion section).

Also, I tried my best to write each experiment natively for each platform using the expected library routines or binary commands.

Twitter Experiment

For the first experiment, I wanted to see how well these platforms index Twitter data. Twitter is becoming very mainstream, and its real-time nature and brevity differ greatly from traditional web content (which these search platforms are overall more tailored for), so its data should make for some interesting experiments.

So I proceeded to crawl Twitter to generate a sample data set. After about a full day and night, I had downloaded ~1M tweets (~10/second).

But before indexing, I did some quick analysis of my acquired Twitter data set:

# of Tweets: 968,937

Indexable Text Size (user, name, text message): 92MB

Average Tweet Size: 12 words

Types of Tweets based on simple word filters:

Out of a 1M sample, what types of Tweets do we find? Unique Users means that there were ~600k users that authored all of the 1M tweets in this sample.

Very interesting stats here – especially the high percentage of tweets that seem to be asking questions. Could Twitter (or an application) better serve this need?
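
As a rough illustration of what "simple word filters" over tweets could look like, here is a minimal Python sketch. The category names and patterns below (question, retweet, link, reply) are hypothetical and chosen only for illustration; the post does not list the actual filters behind the stats above.

```python
import re
from collections import Counter

# Hypothetical filters: the original post does not list its actual rules.
FILTERS = {
    "question": re.compile(r"\?"),        # contains a question mark
    "retweet": re.compile(r"\bRT\b"),     # contains the token "RT"
    "link": re.compile(r"https?://"),     # contains a URL
    "reply": re.compile(r"^@\w+"),        # starts with an @mention
}

def classify(text):
    """Return the names of all filters that match a tweet's text."""
    return [name for name, pattern in FILTERS.items() if pattern.search(text)]

def tally(tweets):
    """Count how many tweets match each filter (a tweet may match several)."""
    counts = Counter()
    for text in tweets:
        counts.update(classify(text))
    return counts

sample = [
    "Anyone know a good open source search engine?",
    "RT @someone: indexing 1M tweets http://example.com",
    "@friend see you at SIGIR",
]
counts = tally(sample)
```

Filters this crude will over- and under-count (a question mark does not always mean a real question), so percentages produced this way are best read as rough signals.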

Here’s a table comparing the indexing performance over this Twitter data set across the select vertical search solutions:

Indexing 1M twitter messages on a variety of open source search solutions; measuring time and space for each.

Lucene was the only solution that produced an index smaller than the input data. Running it in optimize mode shaves off an additional 5 megabytes, at the cost of another ten seconds of indexing. sphinx and zettair index the fastest. Interestingly, I ran zettair in big-and-fast mode (which sucks up 300+ megabytes of RAM), but it ran slower by 3 seconds (maybe because of the nature of tweets).

Xapian ran 5x slower than sqlite (which stores the raw input data in addition to the index) and produced the largest index files. The default index_text method in Xapian stores positional information, which blew the index size up to 529 megabytes. One must use index_text_without_positions to make the size more reasonable. I checked my Xapian code against the examples and documentation to see if I was doing something wrong, but I couldn’t find any discrepancies.

I also included a column about development issues I encountered. zettair was by far the easiest to use (simple command line) but required transforming the input data into a new format. I had some text issues with sqlite (which also needs to be recompiled with FTS3 enabled) and sphinx, given their strict input constraints. sphinx also requires a conf file, and it took some searching to find full examples of one. Lucene, zettair, and Xapian were the most forgiving when it came to accepting text inputs (zero errors).

Measuring Relevancy: Medical Data Set

While this is a fun performance experiment for indexing short text, this test does not measure search performance and relevancy.

To measure relevancy, we need judgment data that tells us how relevant a document result is to a query. The best data set I could find that was publicly available for download (almost all of them require mailing in CDs) was from the TREC-9 Filtering track, which provides a collection of 196,403 medical journal references, totaling ~300MB of indexable text (titles, authors, abstracts, keywords) with an average of 215 tokens per record. More importantly, this data set provides judgment data for 63 query-like tasks in the form of “” (2 is very relevant, 1 is somewhat relevant, 0 is not rated). An example task is “37 yr old man with sickle cell disease.” To turn this into a search benchmark, I treat these tasks as OR’ed queries. To measure relevancy, I compute the Average DCG across the 63 queries for results in positions 1-10.
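
The Average DCG measure described above can be sketched in Python as follows. This assumes the common log2-discounted form of DCG, where the gain at rank i is divided by log2(i + 1); the post does not say which DCG variant was used. The function names and the toy judgment data are hypothetical.

```python
import math

def dcg_at_k(gains, k=10):
    """DCG over the first k results.

    gains[i] is the judged relevance (2, 1, or 0) of the result at rank i + 1.
    Assumes the log2 discount: gain / log2(rank + 1).
    """
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))

def average_dcg(results, judgments, k=10):
    """Mean DCG@k over all queries.

    results:   query id -> ranked list of doc ids returned by an engine
    judgments: query id -> {doc id: relevance}; unjudged docs count as 0
    """
    scores = []
    for qid, ranked in results.items():
        rels = judgments.get(qid, {})
        scores.append(dcg_at_k([rels.get(doc, 0) for doc in ranked], k))
    return sum(scores) / len(scores) if scores else 0.0

# Toy data (hypothetical, not from the TREC-9 collection).
judgments = {"q1": {"d1": 2, "d3": 1}, "q2": {"d7": 1}}
results = {"q1": ["d1", "d2", "d3"], "q2": ["d9", "d7"]}
avg = average_dcg(results, judgments)
```

Because no normalization by an ideal ranking is applied here, scores are comparable across engines on the same query set but not across different query sets.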

Performance and Relevancy marks on the TREC OHSUMED Data Set; Lucene is the smallest, most relevant and fastest to search; Xapian is very close to Lucene on the search side but 3x slower on indexing and 4x bigger in index space; zettair is the fastest indexer.

Performance and Relevancy marks on the TREC-9 data set across select vertical search solutions.

With this larger data set (3x larger than the Twitter one), we see zettair’s indexing performance improve (makes sense as it’s more designed for larger corpora); zettair’s search speed should probably be a bit faster because its search command line utility prints some unnecessary stats. For multi-searching in sphinx, I developed a Java client (with the hopes of making it competitive with Lucene, the one to beat) which connects to the sphinx searchd server via a socket (that’s their API model in the examples). sphinx returned searches the fastest, ~3x faster than Lucene. Its indexing time was also on par with zettair.

Lucene obtained the highest relevance and smallest index size. Its index time could probably be improved by fiddling with its merge parameters, but I wanted to avoid numerical adjustments in this evaluation. Xapian has very similar search performance to Lucene but with significant indexing costs (both time and space > 3x). sqlite has the worst relevance because it doesn’t sort by relevance and doesn’t seem to provide an ORDER BY function to do so.

Conclusion & Downloads

Based on these preliminary results and anecdotal information I’ve collected from the web and people in the field (with more emphasis on the latter), I would probably recommend Lucene for many vertical search indexing applications, especially if you need something that runs decently well out of the box (as that’s what I’m mainly evaluating here) and has community support. Note that Lucene is an IR library; use a wrapper platform like Solr w/ Nutch if you need all the search dressings like snippets, crawlers, and servlets.

Keep in mind that these experiments are still very early (done on a weekend budget) and can/should be improved greatly with bigger and better data sets, tuned implementations, and community support (I’d be the first one to say these are far from perfect, so I open sourced my code below). It’s pretty hard to make a benchmark that everybody likes (especially in this space where there haven’t really been many … and I’m starting to see why :)). That’s not necessarily because there are always winners/losers and biases in benchmarks, but because there are so many different types of data sets, platform APIs, and tuning parameters (at least databases support SQL!). This is just a start. I see this as a very evolutionary project that requires community support to get it right. Take the results here for what they’re worth and still run your own tuned benchmarks.

To encourage further search development and benchmarks, I’ve open sourced all the code here:

http://github.com/zooie/opensearch/tree/master

Happy to post any new and interesting results.

Source

Open Source Search Engine Software


SeekQuarry

Yioop Releases

The two most recent versions of Yioop are:

Payment Processing Script

Yioop software supports keyword advertising and charging credits to access groups. Both can be enabled under the root account using the Server Settings activity. To create ads, advertisers purchase ad credits, and then use those credits to bid on keywords. When a user creates a group, they can choose to charge a certain number of credits to join the group. By default, Yioop does not come with any payment processing mechanism for the purchase of credits, and so credits are essentially free. A script to enable purchasing credits for dollars on a credit card in Yioop 3.1.0 or higher using Stripe.com is available from Seekquarry LLC for a fee of $30 at the link below:

Support Yioop

Seekquarry, LLC is a company owned by Chris Pollett, the principal developer of Yioop. If you like Yioop and would like to show support for this project, please consider making a contribution via either Paypal or Bitcoin:

Bitcoin Address: 1AHFYcbQX91FVf1D1ZTJ8At29Bdhz9jNV1

Installation

The Install Guides explain how to get Yioop to work in some common settings. The documentation page has information about the requirements of and installation procedure for Yioop.

Upgrading

Before upgrading, make sure to back up your data. Then download the latest version of Yioop and unzip it to the location you would like. Set the Search Engine Work Directory by the same method and to the same value as in your old Yioop installation. See the Installation section above for links to instructions on this if you have forgotten how you did it. Knowing the old Work Directory location should allow Yioop to complete the upgrade process.

Git Repository / Contributing Code

The Yioop git repository allows anonymous read-only access. If you would like to contribute to Yioop, just do a clone of the most recent code, make your changes, do a pull, and make a patch. For example, to clone the repository, assuming you have the git version control software installed, just type:

git clone https://seekquarry.com/git/yioop.git

The Yioop Coding Guidelines explain the form your code should be in when making a patch as well as how to create patches. You can create or update an issue in the Yioop issue tracker describing what your patch solves and upload your patch there. To contribute localizations, you can use the GUI in your own copy of Yioop to enter your localizations. Next, in the locale folder of your Yioop work directory, locate the folder named with the locale tag of the language you added translations for. Within this folder is a configure.ini file. Make an issue in the issue tracker and upload this file there.

(c) 2020 Seekquarry, LLC – Open Source Search Engine Software.

Source

Best Educational Search Engines For Academic Researchers

Conducting academic research is a critical process. You cannot rely solely on the information you get on the web because some search results are irrelevant or unrelated to your topic. To ensure that you gather only genuine facts and credible data for your academic papers, stick to the most trusted and useful resources for your research.

Here’s a list of the best free academic search engines that can help you in your research journey.

Google Scholar

Google Scholar is a customized search engine specifically designed for students, educators and anyone involved in academics. It allows users to find credible information, search journals, and save sources to their personal library. If you need help with your essays or citations for your thesis and other research, this easy-to-use resource can quickly find citation-worthy materials for your academic writing.

iSEEK – Education

iSeek Education is a go-to search engine for students, scholars and educators. It is one of the most widely used search tools for academic research online. iSeek offers safe, smart, and reliable resources for your paper writing. Using this tool will help you save time and effort in getting your written work done quickly.

Educational Resources Information Center – ERIC

ERIC is a comprehensive online digital library funded by the Institute of Education Sciences of the U.S. Department of Education. It provides a database of education research and information for students, educators, librarians and the public. ERIC contains around 1.3 million articles, and users can search for anything education-related such as journals, books, research papers, various reports, dissertations, policy papers, and other academic materials.

Virtual Learning Resources Center – VLRC

If you’re looking for high quality educational sites to explore, you must check out VLRC. This learning resource center is the best place to go when you’re in search of useful research materials and accurate information for your academic requirements. It has a collection of more than 10,000 indexed webpages for all subject areas.

Internet Archive

Internet Archive, a non-profit digital library, gives users free access to cultural artifacts and historical collections in digital format. It contains millions of free books, music, software, texts, audio, and moving images. This search engine makes it easier to capture, manage and search different kinds of content without any technical expertise or hosting facilities.

Infotopia

Infotopia is a safe-search alternative to Google that provides information and reference sites on subjects including art, social sciences, history, languages, literature, science and technology, and many more.

Can you recommend other search engines that can help researchers and scholars in their academic writing?

Source

Comet: An open‐source MS/MS sequence database search tool

Technical Brief

First published: 12 November 2012

https://doi.org/10.1002/pmic.201200439

Citations: 438

Abstract

Proteomics research routinely involves identifying peptides and proteins via MS/MS sequence database search. Thus the database search engine is an integral tool in many proteomics research groups. Here, we introduce the Comet search engine to the existing landscape of commercial and open‐source database search tools. Comet is open source, freely available, and based on one of the original sequence database search tools that has been widely used for many years.


    Crossref

  • Maiwen Caudron-Herger, Scott F. Rusin, Mark E. Adamo, Jeanette Seiler, Vera K. Schmid, Elsa Barreau, Arminja N. Kettenbach, Sven Diederichs, R-DeeP: Proteome-wide and Quantitative Identification of RNA-Dependent Proteins by Density Gradient Ultracentrifugation, Molecular Cell, 10.1016/j.molcel.2019.04.018, (2019).

    Crossref

  • Wael L. Demian, Avinash Persaud, Chong Jiang, Étienne Coyaud, Shixuan Liu, Andras Kapus, Ran Kafri, Brian Raught, Daniela Rotin, The Ion Transporter NKCC1 Links Cell Volume to Cell Mass Regulation by Suppressing mTORC1, Cell Reports, 10.1016/j.celrep.2019.04.034, 27, 6, (1886-1896.e6), (2019).

    Crossref

  • Samuel J. Carpentier, Minjian Ni, Jeffrey M. Duggan, Richard G. James, Brad T. Cookson, Jessica A. Hamerman, The signaling adaptor BCAP inhibits NLRP3 and NLRC4 inflammasome activation in macrophages through interactions with Flightless-1, Science Signaling, 10.1126/scisignal.aau0615, 12, 581, (eaau0615), (2019).

    Crossref

  • André Zelanis, Débora A. Silva, Eduardo S. Kitano, Tarcísio Liberato, Isabella Fukushima, Solange M.T. Serrano, Alexandre K. Tashima, A first step towards building spectral libraries as complementary tools for snake venom proteome/peptidome studies, Comparative Biochemistry and Physiology Part D: Genomics and Proteomics, 10.1016/j.cbd.2019.100599, (100599), (2019).

    Crossref

  • Helisa Helena Wippel, Juliane Soldi Malgarin, Alexandre Haruo Inoue, Felipe da Veiga Leprevost, Paulo Costa Carvalho, Samuel Goldenberg, Lysangela Ronalte Alves, Unveiling the partners of the DRBD2-mRNP complex, an RBP in Trypanosoma cruzi and ortholog to the yeast SR-protein Gbp2, BMC Microbiology, 10.1186/s12866-019-1505-8, 19, 1, (2019).

    Crossref

  • Wanping Xu, Kristin Beebe, Juan D. Chavez, Marta Boysen, YinYing Lu, Abbey D. Zuehlke, Dimitra Keramisanou, Jane B. Trepel, Christosomos Prodromou, Matthias P. Mayer, James E. Bruce, Ioannis Gelis, Len Neckers, Hsp90 middle domain phosphorylation initiates a complex conformational program to recruit the ATPase-stimulating cochaperone Aha1, Nature Communications, 10.1038/s41467-019-10463-y, 10, 1, (2019).

    Crossref

  • Paul A. Stewart, Eric A. Welsh, Robbert J. C. Slebos, Bin Fang, Victoria Izumi, Matthew Chambers, Guolin Zhang, Ling Cen, Fredrik Pettersson, Yonghong Zhang, Zhihua Chen, Chia-Ho Cheng, Ram Thapa, Zachary Thompson, Katherine M. Fellows, Jewel M. Francis, James J. Saller, Tania Mesa, Chaomei Zhang, Sean Yoder, Gina M. DeNicola, Amer A. Beg, Theresa A. Boyle, Jamie K. Teer, Yian Ann Chen, John M. Koomen, Steven A. Eschrich, Eric B. Haura, Proteogenomic landscape of squamous cell lung cancer, Nature Communications, 10.1038/s41467-019-11452-x, 10, 1, (2019).

    Crossref

  • Zach Rolfs, Markus Müller, Michael R. Shortreed, Lloyd M. Smith, Michal Bassani-Sternberg, Comment on “A subset of HLA-I peptides are not genomically templated: Evidence for cis- and trans-spliced peptide ligands”, Science Immunology, 10.1126/sciimmunol.aaw1622, 4, 38, (eaaw1622), (2019).

    Crossref

  • Rosane Oliveira Nunes, Giselli Abrahão Domiciano, Wilber Sousa Alves, Ana Claudia Amaral Melo, Fábio Cesar Sousa Nogueira, Luciano Pasqualoto Canellas, Fábio Lopes Olivares, Russolina Benedeta Zingali, Márcia Regina Soares, Evaluation of the effects of humic acids on maize root architecture by label-free proteomics analysis, Scientific Reports, 10.1038/s41598-019-48509-2, 9, 1, (2019).

    Crossref

  • Aivett Bilbao, Proteomics Mass Spectrometry Data Analysis Tools, Encyclopedia of Bioinformatics and Computational Biology, 10.1016/B978-0-12-809633-8.20274-4, (84-95), (2019).

    Crossref

  • Bradley Smith, Daniel Martins-de-Souza, Mariana Fioramonte, Using Co-immunoprecipitation and Shotgun Mass Spectrometry for Protein-Protein Interaction Identification in Cultured Human Oligodendrocytes, Co-Immunoprecipitation Methods for Brain Tissue, 10.1007/978-1-4939-8985-0_4, (37-47), (2019).

    Crossref

  • Juan D. Chavez, Andrew Keller, Bo Zhou, Rong Tian, James E. Bruce, Cellular Interactome Dynamics during Paclitaxel Treatment, Cell Reports, 10.1016/j.celrep.2019.10.063, 29, 8, (2371-2383.e5), (2019).

    Crossref

  • Hannah E. Opalko, Isha Nasa, Arminja N. Kettenbach, James B. Moseley, A mechanism for how Cdr1/Nim1 kinase promotes mitotic entry by inhibiting Wee1, Molecular Biology of the Cell, 10.1091/mbc.E19-08-0430, 30, 25, (3015-3023), (2019).

    Crossref

  • Malgorzata Krajewska, Ruben Dries, Andrew V. Grassetti, Sofia Dust, Yang Gao, Hao Huang, Bandana Sharma, Daniel S. Day, Nicholas Kwiatkowski, Monica Pomaville, Oliver Dodd, Edmond Chipumuro, Tinghu Zhang, Arno L. Greenleaf, Guo-Cheng Yuan, Nathanael S. Gray, Richard A. Young, Matthias Geyer, Scott A. Gerber, Rani E. George, CDK12 loss in cancer cells affects DNA damage response genes through premature cleavage and polyadenylation, Nature Communications, 10.1038/s41467-019-09703-y, 10, 1, (2019).

    Crossref

  • Piotr Grabowski, Sebastian Hesse, Sebastian Hollizeck, Meino Rohlfs, Uta Behrends, Roya Sherkat, Hannah Tamary, Ekrem Ünal, Raz Somech, Türkan Patıroğlu, Stefan Canzar, Jutte van der Werff Ten Bosch, Christoph Klein, Juri Rappsilber, Proteome Analysis of Human Neutrophil Granulocytes From Patients With Monogenic Disease Using Data-independent Acquisition, Molecular & Cellular Proteomics, 10.1074/mcp.RA118.001141, 18, 4, (760-772), (2019).

    Crossref

  • Henning Schiebenhoefer, Tim Van Den Bossche, Stephan Fuchs, Bernhard Y. Renard, Thilo Muth, Lennart Martens, Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data analysis, Expert Review of Proteomics, 10.1080/14789450.2019.1609944, 16, 5, (375-390), (2019).

    Crossref

  • Dylan C. Mitchell, Arya Menon, Amanda L. Garner, Chemoproteomic Profiling Uncovers CDK4-Mediated Phosphorylation of the Translational Suppressor 4E-BP1, Cell Chemical Biology, 10.1016/j.chembiol.2019.03.012, (2019).

    Crossref

  • Derek F. Ceccarelli, Sofiia Ivantsiv, Amber Anne Mullin, Etienne Coyaud, Noah Manczyk, Pierre Maisonneuve, Igor Kurinov, Liang Zhao, Chris Go, Anne-Claude Gingras, Brian Raught, Sabine Cordes, Frank Sicheri, FAM105A/OTULINL Is a Pseudodebuiquitinase of the OTU-Class that Localizes to the ER Membrane, Structure, 10.1016/j.str.2019.03.022, (2019).

    Crossref

  • Birgit Schilling, Jesse G. Meyer, Lei Wei, Melanie Ott, Eric Verdin, Mark Helle, High-Resolution Mass Spectrometry to Identify and Quantify Acetylation Protein Targets, Psychotherapie, 10.1007/978-1-4939-9434-2_1, (3-16), (2019).

    Crossref

  • Wenguang Shao, Tiannan Guo, Nora C. Toussaint, Peng Xue, Ulrich Wagner, Li Li, Konstantina Charmpi, Yi Zhu, Jianmin Wu, Marija Buljan, Rui Sun, Dorothea Rutishauser, Thomas Hermanns, Christian Daniel Fankhauser, Cedric Poyet, Jelena Ljubicic, Niels Rupp, Jan H. Rüschoff, Qing Zhong, Andreas Beyer, Jiafu Ji, Ben C. Collins, Yansheng Liu, Gunnar Rätsch, Peter J. Wild, Ruedi Aebersold, Comparative analysis of mRNA and protein degradation in prostate tissues indicates high stability of proteins, Nature Communications, 10.1038/s41467-019-10513-5, 10, 1, (2019).

    Crossref

  • Sabine Amon, Fabienne Meier-Abt, Ludovic C. Gillet, Slavica Dimitrieva, Alexandre P. A. Theocharides, Markus G. Manz, Ruedi Aebersold, Sensitive Quantitative Proteomics of Human Hematopoietic Stem and Progenitor Cells by Data-independent Acquisition Mass Spectrometry, Molecular & Cellular Proteomics, 10.1074/mcp.TIR119.001431, 18, 7, (1454-1467), (2019).

    Crossref

  • Jorge Ruiz-Orera, M Mar Albà, Conserved regions in long non-coding RNAs contain abundant translation and protein–RNA interaction signatures, NAR Genomics and Bioinformatics, 10.1093/nargab/lqz002, 1, 1, (e2-e2), (2019).

    Crossref

  • Hyunwoo Kim, Sangjeong Lee, Heejin Park, Target-small decoy search strategy for false discovery rate estimation, BMC Bioinformatics, 10.1186/s12859-019-3034-8, 20, 1, (2019).

    Crossref

  • Magdalena Zasada, Maciej Suski, Renata Bokiniec, Monika Szwarc-Duma, Maria Katarzyna Borszewska-Kornacka, Józef Madej, Beata Bujak-Giżycka, Anna Madetko-Talowska, Cecilie Revhaug, Lars O. Baumbusch, Ola D. Saugstad, Jacek Józef Pietrzyk, Przemko Kwinta, Comparative two time-point proteome analysis of the plasma from preterm infants with and without bronchopulmonary dysplasia, Italian Journal of Pediatrics, 10.1186/s13052-019-0676-0, 45, 1, (2019).

    Crossref

  • Chi Zhou, Chenyu Zhu, Qi Liu, Toward in silico Identification of Tumor Neoantigens in Immunotherapy, Trends in Molecular Medicine, 10.1016/j.molmed.2019.08.001, (2019).

    Crossref

  • Yumi Ueki, Thomas Kruse, Melanie Bianca Weisser, Gustav N. Sundell, Marie Sofie Yoo Larsen, Blanca Lopez Mendez, Nicole P. Jenkins, Dimitriya H. Garvanska, Lauren Cressey, Gang Zhang, Norman Davey, Guillermo Montoya, Ylva Ivarsson, Arminja N. Kettenbach, Jakob Nilsson, A Consensus Binding Motif for the PP4 Protein Phosphatase, Molecular Cell, 10.1016/j.molcel.2019.08.029, (2019).

    Crossref

  • Scott Frendo-Cumbo, Javier R. Jaldin-Fincati, Etienne Coyaud, Estelle M. N. Laurent, Logan K. Townsend, Joel M. J. Tan, Ramnik J. Xavier, Nicolas J. Pillon, Brian Raught, David C. Wright, John Hunter Brumell, Amira Klip, Deficiency of the autophagy gene ATG16L1 induces insulin resistance through KLHL9/KLHL13/CUL3-mediated IRS1 degradation, Journal of Biological Chemistry, 10.1074/jbc.RA119.009110, 294, 44, (16172-16185), (2019).

    Crossref

  • Matthew D. Berg, Yanrui Zhu, Julie Genereaux, Bianca Y. Ruiz, Ricard A. Rodriguez-Mias, Tyler Allan, Alexander Bahcheli, Judit Villén, Christopher J. Brandl, Modulating Mistranslation Potential of tRNA Ser in Saccharomyces cerevisiae , Genetics, 10.1534/genetics.119.302525, 213, 3, (849-863), (2019).

    Crossref

  • Colin D. Gottlieb, Airlia C. S. Thompson, Alban Ordureau, J. Wade Harper, Ron R. Kopito, Acute unfolding of a single protein immediately stimulates recruitment of ubiquitin protein ligase E3C (UBE3C) to 26S proteasomes, Journal of Biological Chemistry, 10.1074/jbc.RA119.009654, 294, 45, (16511-16524), (2019).

    Crossref

  • Lovaine Silva Duarte, Laísa Quadros Barsé, Pedro Ferrari Dalberto, William Tadeu Santos da Silva, Rafael Costa Rodrigues, Pablo Machado, Luiz Augusto Basso, Cristiano Valim Bizarro, Marco Antônio Záchia Ayub, Cloning and expression of the Bacillus amyloliquefaciens transglutaminase gene in E. coli using a bicistronic vector construction, Enzyme and Microbial Technology, 10.1016/j.enzmictec.2019.109468, (109468), (2019).

    Crossref

  • Mónica Carrera, Jesús Mateos, José M. Gallardo, Data Treatment in Food Proteomics, Reference Module in Food Science, 10.1016/B978-0-08-100596-5.22907-7, (2019).

    Crossref

  • Alexander Rabe, Manuela Gesell Salazar, Stephan Michalik, Stephan Fuchs, Alexander Welk, Thomas Kocher, Uwe Völker, Metaproteomics analysis of microbial diversity of human saliva and tongue dorsum in young healthy individuals, Journal of Oral Microbiology, 10.1080/20002297.2019.1654786, 11, 1, (1654786), (2019).

    Crossref

  • Monica Losada-Barragán, Adriana Umaña-Pérez, Andrés Rodriguez-Vega, Sergio Cuervo-Escobar, Renata Azevedo, Fernanda N. Morgado, Vinicius de Frias Carvalho, Priscila Aquino, Paulo C. Carvalho, Renato Porrozzi, Myriam Sánchez-Gómez, Gabriel Padron, Patricia Cuervo, Proteomic profiling of splenic interstitial fluid of malnourished mice infected with Leishmania infantum reveals defects on cell proliferation and pro-inflammatory response, Journal of Proteomics, 10.1016/j.jprot.2019.103492, 208, (103492), (2019).

    Crossref

  • Amol Prakash, Swetaketu Majumder, Shadab Ahmad, Manu Varkey, T.A. Anish, Conor Jenkins, Megan Rigby, Benjamin Orsburn, Detection and verification of 2.3 million cancer mutations in NCI60 cancer cell lines with a cloud search engine, Journal of Proteomics, 10.1016/j.jprot.2019.103488, 209, (103488), (2019).

    Crossref

  • Nishant Pappireddi, Lance Martin, Martin Wühr, A Review on Quantitative Multiplexed Proteomics, ChemBioChem, 10.1002/cbic.201800650, 20, 10, (1210-1224), (2019).

    Wiley Online Library

  • Adam J. Rabalski, Jon D. Williams, Ryan A. McClure, Anil Vasudevan, Aleksandra Baranczak, A Dual‐Purpose Bromocoumarin Tag Enables Deep Profiling of the Cellular Cysteinome, PROTEOMICS, 10.1002/pmic.201800433, 19, 11, (2019).

    Wiley Online Library

  • Rachel L. Spietz, Rachel A. Lundeen, Xiaowei Zhao, Daniela Nicastro, Anitra E. Ingalls, Robert M. Morris, Heterotrophic carbon metabolism and energy acquisition in Candidatus Thioglobus singularis strain PS1, a member of the SUP05 clade of marine Gammaproteobacteria, Environmental Microbiology, 10.1111/1462-2920.14623, 21, 7, (2391-2401), (2019).

    Wiley Online Library

  • Mario Leutert, Ricard A Rodríguez‐Mias, Noelle K Fukuda, Judit Villén, R2‐P2 rapid‐robotic phosphoproteomics enables multidimensional cell signaling studies, Molecular Systems Biology, 10.15252/msb.20199021, 15, 12, (2019).

    Wiley Online Library

  • Yan Lu, Yuping Zheng, Étienne Coyaud, Chao Zhang, Apiraam Selvabaskaran, Yuyun Yu, Zizhen Xu, Xialian Weng, Ji Shun Chen, Ying Meng, Neil Warner, Xiawei Cheng, Yangyang Liu, Bingpeng Yao, Hu Hu, Zonping Xia, Aleixo M. Muise, Amira Klip, John H. Brumell, Stephen E. Girardin, Songmin Ying, Gregory D. Fairn, Brian Raught, Qiming Sun, Dante Neculai, Palmitoylation of NOD1 and NOD2 is required for bacterial sensing, Science, 10.1126/science.aau6391, 366, 6464, (460-467), (2019).

    Crossref

  • Jo Ishizawa, Sarah F. Zarabi, R. Eric Davis, Ondrej Halgas, Takenobu Nii, Yulia Jitkova, Ran Zhao, Jonathan St-Germain, Lauren E. Heese, Grace Egan, Vivian R. Ruvolo, Samir H. Barghout, Yuki Nishida, Rose Hurren, Wencai Ma, Marcela Gronda, Todd Link, Keith Wong, Mark Mabanglo, Kensuke Kojima, Gautam Borthakur, Neil MacLean, Man Chun John Ma, Andrew B. Leber, Mark D. Minden, Walid Houry, Hagop Kantarjian, Martin Stogniew, Brian Raught, Emil F. Pai, Aaron D. Schimmer, Michael Andreeff, Mitochondrial ClpP-Mediated Proteolysis Induces Selective Cancer Cell Lethality, Cancer Cell, 10.1016/j.ccell.2019.03.014, (2019).

    Crossref

  • Hong Peng, Ru Chen, Teresa A. Brentnall, Jimmy K. Eng, Vincent J. Picozzi, Sheng Pan, Predictive proteomic signatures for response of pancreatic cancer patients receiving chemotherapy, Clinical Proteomics, 10.1186/s12014-019-9251-3, 16, 1, (2019).

    Crossref

  • Aitor Alvarez-Fernandez, Kirill Borziak, Grant C. McDonald, Steve Dorus, Tommaso Pizzari, Female novelty and male status dynamically modulate ejaculate expenditure and seminal fluid proteome over successive matings in red junglefowl, Scientific Reports, 10.1038/s41598-019-41336-5, 9, 1, (2019).

    Crossref

  • Scott E. Lindner, Kristian E. Swearingen, Melanie J. Shears, Michael P. Walker, Erin N. Vrana, Kevin J. Hart, Allen M. Minns, Photini Sinnis, Robert L. Moritz, Stefan H. I. Kappe, Transcriptomics and proteomics reveal two waves of translational repression during the maturation of malaria parasite sporozoites, Nature Communications, 10.1038/s41467-019-12936-6, 10, 1, (2019).

    Crossref

  • Dasha Krayushkina, Emma Timmins-Schiffman, Jessica Faux, Damon H. May, Michael Riffle, H. Rodger Harvey, Brook L. Nunn, Growth phase proteomics of the heterotrophic marine bacterium Ruegeria pomeroyi, Scientific Data, 10.1038/s41597-019-0308-y, 6, 1, (2019).

    Crossref

  • Mak A. Saito, Erin M. Bertrand, Megan E. Duffy, David A. Gaylord, Noelle A. Held, William Judson Hervey, Robert L. Hettich, Pratik D. Jagtap, Michael G. Janech, Danie B. Kinkade, Dagmar H. Leary, Matthew R. McIlvin, Eli K. Moore, Robert M. Morris, Benjamin A. Neely, Brook L. Nunn, Jaclyn K. Saunders, Adam I. Shepherd, Nicholas I. Symmonds, David A. Walsh, Progress and Challenges in Ocean Metaproteomics and Proposed Best Practices for Data Sharing, Journal of Proteome Research, 10.1021/acs.jproteome.8b00761, (2019).

    Crossref

  • Mu A, Tak Shun Fung, Arminja N. Kettenbach, Rajarshi Chakrabarti, Henry N. Higgs, A complex containing lysine-acetylated actin inhibits the formin INF2, Nature Cell Biology, 10.1038/s41556-019-0307-4, (2019).

    Crossref

  • Kai-Ting Fan, Kuo-Hsin Wang, Wei-Hung Chang, Jhih-Ci Yang, Ching-Fang Yeh, Kai-Tan Cheng, Sheng-Chi Hung, Yet-Ran Chen, Application of Data-Independent Acquisition Approach to Study the Proteome Change from Early to Later Phases of Tomato Pathogenesis Responses, International Journal of Molecular Sciences, 10.3390/ijms20040863, 20, 4, (863), (2019).

    Crossref

  • Justyna Fert-Bober, Vidya Venkatraman, Christie L Hunter, Ruining Liu, Erin L Crowgey, Rakhi Pandey, Ronald Joseph Holewinski, Aleksandr Stotland, Benjamin P Berman, Jennifer E. Van Eyk, Mapping citrullinated sites in multiple organs of mice using hyper-citrullinated library, Journal of Proteome Research, 10.1021/acs.jproteome.9b00118, (2019).

    Crossref

  • Yasuhiro Saito, Lewyn Li, Etienne Coyaud, Augustin Luna, Chris Sander, Brian Raught, John M. Asara, Myles Brown, Senthil K. Muthuswamy, LLGL2 rescues nutrient stress by promoting leucine uptake in ER+ breast cancer, Nature, 10.1038/s41586-019-1126-2, (2019).

    Crossref

  • See more

Close Figure Viewer

Previous FigureNext Figure

Caption

Source

Open Source Search Engine Battle: Solr vs Elasticsearch

We have entered an era of massive growth in data and cloud computing, which makes this comparison timely. Applications now generate petabytes of data and beyond, and are still expected to stay fast and responsive.

On top of that, as data piles up, searching through it quickly becomes a substantial back-end challenge.

In this post, we discuss the distinguishing features of Solr and Elasticsearch, two open source search engines that are gaining popularity.

By the end of this article, the reader should understand what each search engine offers and how each behaves, in enough detail to decide which one to go for (choosing between them is not an easy task).

Origin and Building Mechanism

Elasticsearch and Solr are both built on the same open source Java library, Apache Lucene, which is why their core features and behavior are so similar. Apache Lucene is a dependable, widely deployed search library distributed as a set of JAR files. It was first written in 1999 and became an Apache open source project in 2001.

Apache Solr

Apache Solr is an enterprise search platform that exposes the search capabilities of Lucene in a user-friendly manner. Solr was created in 2004 and donated to the Apache Software Foundation in 2006, and its committers initially focused on building new search features. Gradually, distributed search became a highly desired feature.

In October 2012, the SolrCloud feature was introduced to simplify distributed search. Solr is now in high demand and is supported by an Apache community of about 100 developers and code committers.

Elasticsearch

Elasticsearch, on the other hand, is not an Apache Software Foundation project like Solr and Lucene. It was launched in 2010, a few years after Solr, and its code is hosted on GitHub, a commercial software hosting service. It is nevertheless an open source, distributed, RESTful search engine, licensed under Apache 2.0 at the time of writing.

Its standout features are multitenancy and full-text search exposed through an HTTP web interface, with documents stored as schema-free JSON.
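To make the "HTTP interface plus schema-free JSON" idea concrete, here is a minimal sketch that only builds (and does not send) an indexing request. The host `http://localhost:9200`, the index name `articles`, and the helper `build_index_request` are assumptions for illustration. The `/<index>/_doc/<id>` path follows recent Elasticsearch versions, and older releases used a type name in place of `_doc`.

```python
import json
import urllib.request


def build_index_request(base_url: str, index: str, doc_id: str, document: dict) -> urllib.request.Request:
    """Build an HTTP PUT request that would index one JSON document (hypothetical helper)."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # No schema is declared up front: the document is just a JSON object.
    doc = {"title": "Solr vs Elasticsearch", "tags": ["search", "lucene"], "views": 42}
    request = build_index_request("http://localhost:9200", "articles", "1", doc)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # Sending it would be urllib.request.urlopen(request), which needs a running cluster.
```

Nothing here contacts a server, so the sketch can be run without a cluster to inspect what would go over the wire.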

An Elasticsearch index can be split into shards, and each shard can have multiple replicas. Each Elasticsearch node can hold one or more shards, and a node can also act as a coordinator that routes an operation to the correct shard(s).
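The routing idea can be sketched in a few lines: hash the document ID, take it modulo the number of primary shards, and the result names the shard whose primary and replica copies hold that document. This is an illustration under stated assumptions, not the engine's implementation. CRC32 is a stand-in hash, and `pick_shard` and `copies_of` are hypothetical helper names.

```python
import zlib
from typing import List


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a primary shard number."""
    # CRC32 is only a stand-in; a real engine uses its own hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def copies_of(shard: int, num_replicas: int) -> List[str]:
    """List the primary and replica copies that exist for one shard."""
    replicas = [f"shard-{shard}-replica-{r}" for r in range(1, num_replicas + 1)]
    return [f"shard-{shard}-primary"] + replicas


if __name__ == "__main__":
    for doc_id in ("user-1", "user-2", "user-3"):
        shard = pick_shard(doc_id, num_primary_shards=3)
        print(doc_id, "->", copies_of(shard, num_replicas=1))
```

Because the shard is derived from the ID and the shard count, the same document always lands on the same shard, which is also why the number of primary shards is hard to change after the fact.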

Elasticsearch wins this round because a major goal behind its design was to close the gaps in Solr’s distributed features. As a result, a user may find it easier to start up an Elasticsearch cluster than a Solr one.

Major differences in features: Solr vs Elasticsearch

Apache Solr                                | Elasticsearch
Full-text search                           | Distributed search
Highlighting                               | Multi-tenancy
Faceted search                             | Analyzer chain
Real-time indexing                         | Analytical search
Dynamic clustering                         | Grouping and aggregation
Database integration                       |
NoSQL features and rich document handling  |

Let us look at several areas to understand the Solr vs Elasticsearch battle in more depth:

1.) Coordination

Elasticsearch handles cluster coordination through a built-in mechanism, whereas Solr relies on ZooKeeper. This means that to work with SolrCloud, the user needs a ZooKeeper quorum. People already using components of the Hadoop ecosystem are unlikely to have a problem, as they most likely have a ZooKeeper quorum set up already.
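As a rough guide to what "a ZooKeeper quorum" implies for sizing, the arithmetic below assumes the usual majority rule: the ensemble stays available only while more than half of its servers are up. The function names are hypothetical.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
```

Under this rule a four-server ensemble tolerates no more failures than a three-server one, which is why odd ensemble sizes are the common choice.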

2.) Shard Splitting and Rebalancing

Elasticsearch and Solr both organize an index into shards. A shard is simply the partitioning unit of a Lucene index. Users can distribute an index by placing its shards on different machines in a cluster. Since April 2013, Solr has supported shard splitting, which lets the user create more shards from existing ones. Elasticsearch did not have this feature when Solr introduced it.

However, to prepare an Elasticsearch deployment for sharding across additional machines, the user needs to create multiple shards up front, even on a single machine, choosing the shard count based on the number of machines expected in the future.

The advantage is that every machine starts out holding multiple shards, so when new machines are added, Elasticsearch automatically rebalances the load by relocating shards to the new nodes in the cluster. Solr has no automatic shard rebalancing feature.
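The benefit of creating several shards up front can be illustrated with a toy allocator. It spreads shards round-robin across the available nodes, which is an assumption made for illustration and not the allocation algorithm either engine uses. The names `balance`, `node-a`, and so on are hypothetical.

```python
from typing import Dict, List


def balance(shards: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Spread shards across nodes round-robin (toy allocator)."""
    layout: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, shard in enumerate(shards):
        layout[nodes[i % len(nodes)]].append(shard)
    return layout


if __name__ == "__main__":
    shards = [f"shard-{i}" for i in range(6)]

    # Two machines today: each holds three shards.
    print(balance(shards, ["node-a", "node-b"]))

    # A third machine joins: recomputing the layout leaves two shards per node.
    print(balance(shards, ["node-a", "node-b", "node-c"]))

    # With only two shards, the third machine would have nothing to take over.
    print(balance(["shard-0", "shard-1"], ["node-a", "node-b", "node-c"]))
```

The last call shows the failure mode the text warns about: with too few shards, an added machine cannot share the load without splitting or reindexing.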

3.) Community

Solr has a broad open source community and therefore comes out ahead here. Anyone who wants to contribute to Solr can do so without hassle, and new Solr developers or code committers are elected on merit alone.

Elasticsearch can be called an open source platform, but not completely. Everyone has access to the source code, and users can make changes and contribute them. But final changes are approved and merged by employees of Elastic (the company behind Elasticsearch).

In other words, Elasticsearch is driven more by a single company than by a whole community.

4.) Documentation

Solr stands unrivaled in this category. It is a thoroughly documented product, with clear context and examples of API use cases. Elasticsearch’s documentation is well organized, but it falls short on good examples and clear configuration instructions.

Solr vs Elasticsearch – Which is the most popular engine these days?

(The graph shows the popularity that Elasticsearch has been gaining over time.)

Which one should I go for among Solr vs Elasticsearch?

To be honest, Elasticsearch is an ideal pick for newer developers because it is easy to use. But if you are already using Solr, you are better off sticking with it, because there is no big advantage in switching to Elasticsearch.

If you need analytical queries in addition to text search, go for Elasticsearch, as Solr offers a search mechanism only.

Elasticsearch is generally the better choice for cloud and distributed environments that need fine-grained scalability and good performance. Hence, if distributed indexing is what you need, go for Elasticsearch.

Both search engines have their own features and functionality, and it should now be easier to say which of the two suits your needs.

Source

Home | krugle – software development productivity



Copyright © . Aragon Consulting Group, Inc.


Source

Open Source Search Engine Upgrades



Cambridge, Massachusetts – (Website Hosting Directory) – May 29, 2008 – Multilingual natural language processing technologies firm, and SAS subsidiary, Teragram, has integrated its linguistic suite into the open source search platform, Apache Lucene.

Lucene is a high-performance, full-featured text search engine library which powers sites such as Wikipedia, CNET Reviews and Eventful.com. The integration enables Lucene users to add taxonomies and faceted search to their web sites, as well as to correct the spelling of queries and search in multiple languages. Using Teragram’s solutions, enterprises deploying Lucene can provide site searchers with a functionality that parallels popular enterprise search platforms on the market today.

Dr. Yves Schabes, President and co-founder of Teragram, noted, “Lucene is expanding its user base to high-profile corporate and consumer-facing websites around the world, and quickly becoming the open source alternative to traditional enterprise search. We’re happy to provide Lucene users with language processing enhancements so they can meet the high-performance standards of traditional enterprise search engines, while still enjoying the freedom of the open source experience.”

Teragram’s full suite of products, including TK240 taxonomy management, automatic categorization and automatic metadata generation, enhances Lucene’s basic search function. As a result, adopters of the open source search engine (for embedded search on both internal and external-facing sites) can provide consumers with a more comprehensive experience that includes a searchable index. The faceted index is built using Teragram’s taxonomy tools that classify words into relevant topics and sub-topics.

Teragram’s multilingual natural language processing tools provide linguistic modules such as morphological stemming, spelling correction, parts-of-speech tagging, and related queries, among other tools. Teragram also offers dictionaries in a variety of Eastern and Western European, Asian and Middle Eastern languages.

Teragram, a SAS company, provides mobile and multilingual natural language processing technologies that use the meaning of text to distill relevant information from vast amounts of data. Founded in 1997 by innovators in the field of computational linguistics, Teragram alone offers the speed, accuracy and global language support that customers and partners demand to retrieve and organize growing volumes of digital information. Teragram helps customers perform more efficient searches and better organize information in more than 30 languages, enabling them to reach new markets and make better decisions. Teragram serves customers across the publishing, pharmaceutical, telecommunications and financial industries, including Ariba, Ask.com, Associated Press, CNN, Factiva, EBSCO Publishing, Ebay, FAST Search and Transfer, Forbes.com, InfoSpace, NYTimes Digital, OneSource, Reed Business Information, Ricoh, Sony, Verity, WashingtonPost.com, Wolters Kluwer, the World Bank and Yahoo!

Since 1976, SAS has provided business intelligence and analytical software and services. Customers at more than 44,000 sites use SAS software to improve performance through insight from data, resulting in faster, more accurate business decisions; more profitable relationships with customers and suppliers; compliance with governmental regulations; research breakthroughs; and better products and processes. Only SAS offers leading data integration, storage, analytics and business intelligence applications within a comprehensive enterprise intelligence platform.

To learn more, please visit: www.teragram.com/info.


Source

Open Source Blog – Page 2 of 15

11 October 2019 – Nutch 1.16 Release

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.16. We advise all current users and developers of the 1.X series to upgrade to this release.

An account of the CHANGES in this release can be seen in the release report. Breaking changes are listed in the changelog.

As usual in the 1.X series, release artifacts are made available as both source and binary and also available within Maven Central as a Maven dependency. The release is available from our downloads page.

11 October 2019 – Nutch 2.4 Release

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v2.4. We advise all current users and developers of the 2.X series to upgrade to this release.

This release contains 81 issues addressed. For a complete overview of these issues please see the release report.

As usual in the 2.X series, release artifacts are made available as only source and also available within Maven Central as a Maven dependency. The release is available from our downloads page.

We expect that v2.4 is the last release on the 2.X series. We’ve decided to freeze the development on the 2.X branch for now, as no committer is actively working on it.

26 July 2019 – Nutch Wiki Migrated

The Apache Nutch wiki has been migrated from MoinMoin to Confluence.

9 August 2018 – Nutch 1.15 Release

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.15, we advise all current users and developers of the 1.X series to upgrade to this release.

An account of the CHANGES in this release can be seen in the release report.

As usual in the 1.X series, release artifacts are made available as both source and binary and also available within Maven Central as a Maven dependency. The release is available from our DOWNLOADS PAGE.

23 December 2017 – Nutch 1.14 Release

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.14, we advise all current users and developers of the 1.X series to upgrade to this release.

An account of the CHANGES in this release can be seen in the release report.

As usual in the 1.X series, release artifacts are made available as both source and binary and also available within Maven Central as a Maven dependency. The release is available from our DOWNLOADS PAGE.

02 April 2017 – Nutch 1.13 Release

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.13, we advise all current users and developers of the 1.X series to upgrade to this release.

An account of the CHANGES in this release can be seen in the release report.

As usual in the 1.X series, release artifacts are made available as both source and binary and also available within Maven Central as a Maven dependency. The release is available from our DOWNLOADS PAGE.

18 June 2016 – Nutch 1.12 Release

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.12, we advise all current users and developers of the 1.X series to upgrade to this release.

This release is the result of many months of work and over 40 issues addressed. For a complete overview of these issues please see the release report.

As usual in the 1.X series, release artifacts are made available as both source and binary and also available within Maven Central as a Maven dependency. The release is available from our DOWNLOADS PAGE.

21 January 2016 – Nutch 2.3.1 Release

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v2.3.1, we advise all current users and developers of the 2.X series to upgrade to this release.

This bug fix release contains around 40 issues addressed. For a complete overview of these issues please see the release report.

As usual in the 2.X series, release artifacts are made available as only source and also available within Maven Central as a Maven dependency. The release is available from our DOWNLOADS PAGE.

The recommended Gora backends for this Nutch release are:

  • Apache Avro 1.7.6
  • Apache Hadoop 1.2.1 and 2.5.2
  • Apache HBase 0.98.8-hadoop2 (although also tested with 1.X)
  • Apache Cassandra 2.0.2
  • Apache Solr 4.10.3
  • MongoDB 2.6.X
  • Apache Accumulo 1.5.1
  • Apache Spark 1.4.1

Thank you to everyone that contributed towards this release.

07 December 2015 – Nutch 1.11 Release

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.11, we advise all current users and developers of the 1.X series to upgrade to this release.

This release is the result of many months of work and around 100 issues addressed. For a complete overview of these issues please see the release report.

As usual in the 1.X series, release artifacts are made available as both source and binary and also available within Maven Central as a Maven dependency. The release is available from our DOWNLOADS PAGE.

06 May 2015 – Nutch 1.10 Release

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.10, we advise all current users and developers of the 1.X series to upgrade to this release.

This release is the result of many months of work and well over 100 issues addressed. For a complete overview of these issues please see the release report.

As usual in the 1.X series, release artifacts are made available as both source and binary and also available within Maven Central as a Maven dependency. The release is available from our DOWNLOADS PAGE.

23 April 2015 – Apache Nutch Reaches 2000th Jira Issue

22 January 2015 – Nutch 2.3 Release

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v2.3, we advise all current users and developers of the 2.X series to upgrade to this release. After successful completion of the first Nutch Google Summer of Code project we are pleased to announce that Nutch 2.3 release now comes packaged with a self contained Apache Wicket-based Web Application.

This release is the result of many months of work and 143 issues addressed. For a complete overview of these issues please see the release report.

As usual in the 2.x series, this release is made available only as source, but is also available within Maven Central as a Maven dependency. The release is available from our DOWNLOADS PAGE.

The supported Apache Gora v0.5 backends are;

Please note that the SQL backend for Gora has been deprecated.

22 September 2014 – Wicket WebApp now part of Nutch 2.x Codebase


After successful completion of the first Nutch Google Summer of Code project we are pleased to announce that Nutch 2.X branch now comes packaged with a self contained Apache Wicket-based Web Application.

This not only greatly lowers the barrier for direct interaction with the Nutch 2.X REST API but also provides a stepping stone from which we intend to backport this work to the Nutch 1.X (trunk) series.

Some of the Web Application features include:

  • Functionality to dynamically load seed URLs in order to bootstrap Nutch crawls
  • Browsable and dynamic editing of Configuration overrides
  • Complete REST API documentation and UML model describing REST API calls, Administration and Job and Configuration Management.

The new Web Application feature will be present within the upcoming Nutch 2.3 Release.

16 August 2014 – Apache Nutch v1.9 Released

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.9, we advise all current users and developers of the 1.X series to upgrade to this release. This release addressed no fewer than 55 issues in total. Please see the list of changes for a full breakdown, or see the release report. As usual in the 1.X series, this release is made available both as source and binary. Additionally developers can find Maven artifacts within Maven Central. The release is available here.

31 July 2014 – Nutch tutorial at upcoming ApacheCon Europe in Budapest


The upcoming ApacheCon Europe in Budapest, November 17 – 21, 2014, will offer a one-day Nutch tutorial. Topics will span from Nutch installation and configuration up to plugin development. Both Nutch 1.x and 2.x are covered. The conference is a good opportunity to bring together both users and committers of Nutch and related projects.

01 May 2014 – Apache Nutch Participates in Google Summer of Code


For the first time in Nutch project history, we are participating as part of Apache’s mentoring efforts in the ever popular Google Summer of Code program. This year’s project involves the creation of an Apache Wicket-based Web Application for the Nutch 2.X branch.

Keep your eyes peeled and check here for updates as the project progresses throughout the summer.

07-09 April 2014 – Nutch at ApacheCon 2014, Denver Colorado


Nutch received lots of talk and loads of exposure at ApacheCon NA 2014 in the beautiful city of Denver, CO. This year one presentation focused on Building your Big Data Search Stack with Apache Nutch 2.x. You can see presentation slides below and follow the audio (sorry no video) here.

17 March 2014 – Apache Nutch v1.8 Released

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.8. We advise all current users and developers of the 1.X series to upgrade to this release. Although this release includes library upgrades to Crawler Commons 0.3 and Apache Tika 1.5, it also provides over 30 bug fixes as well as 18 improvements. Please see the list of changes for a full breakdown, or see the release report. As usual in the 1.X series, this release is made available both as source and binary. Additionally developers can find Maven artifacts within Maven Central. The release is available here.

02 July 2013 – Apache Nutch v2.2.1 Released

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v2.2.1, we advise all current users and developers of the 2.X series to upgrade to this release ASAP. Although this release includes library upgrades to Apache Hadoop 1.2.0 and Apache Tika 1.3, it is predominantly a bug fix for NUTCH-1591 – Incorrect conversion of ByteBuffer to String. Please see the list of changes for a full breakdown, or see the release report. As usual in the 2.x series, this release is made available only as source, but is also available within Maven Central. The release is available here.

24th June 2013 – Apache Nutch v1.7 Released

The Apache Nutch PMC are extremely pleased to announce the immediate release of Apache Nutch v1.7. This release includes over 20 bug fixes, as many improvements; most noticeably featuring a new pluggable indexing architecture which currently supports Apache Solr and Elastic Search. Shadowing the recent Nutch 2.2 release, parsing of Robots.txt is now delegated to Crawler-Commons. Key library upgrades have been made to Apache Hadoop 1.2.0 and Apache Tika 1.3. Please see the list of changes or the release report made in this version for a full breakdown. As usual in the 1.x series, the release is made available as binary and source (zip + tar.gz) and is also available within Maven Central. The release is available here.

08 June 2013 – Apache Nutch v2.2 Released

The Apache Nutch PMC are extremely pleased to announce the immediate release of Apache Nutch v2.2. This release includes over 30 bug fixes and over 25 improvements representing the third release of increasingly popular 2.x Nutch series. This release features inclusion of Crawler-Commons which Nutch now utilizes for improved robots.txt parsing, library upgrades to Apache Hadoop 1.1.1, Apache Gora 0.3, Apache Tika 1.2 and Automaton 1.11-8. Please see the list of changes or the release report made in this version for a full breakdown. As usual in the 2.x series, this release is made available only as source, but is also available within Maven Central. The release is available here.

06 December 2012 – Apache Nutch v1.6 Released

The Apache Nutch PMC are extremely pleased to announce the release of Apache Nutch v1.6. This release includes over 20 bug fixes, as many improvements, and new functionality including a new HostNormalizer, the ability to dynamically set fetchInterval by MIME-type and functional enhancements to the Indexer API including the normalization of URLs and the deletion of robots noIndex documents. Other notable improvements include the upgrade of key dependencies to Tika 1.2 and Automaton 1.11-8. Please see the list of changes or the release report made in this version for a full breakdown. The release is available here.

05 October 2012 – Apache Nutch v2.1 Released

The Apache Nutch PMC are very pleased to announce the release of Apache Nutch v2.1. This release continues to provide Nutch users with a simplified Nutch distribution building on the 2.x development drive which is growing in popularity amongst the community. As well as addressing ~20 bugs this release also offers improved properties for better Solr configuration, upgrades to various Gora dependencies and the introduction of the option to build indexes in elastic search. Please see the list of changes made in this version for a full breakdown. The release is available here.

10 August 2012 – Happy 10th Birthday Apache Nutch!!

It’s official, Apache Nutch is now a decade old! The project has come a long, long way since inception, through acceptance into the Apache Incubator way back in January 2005, to the Top Level Project it became on 21st April 2010. Happy birthday Nutch and thanks to all contributors past and present! See Doug Cutting’s tweet.

10 July 2012 – Apache Nutch v1.5.1 Released

The Apache Nutch PMC are very pleased to announce the release of Apache Nutch v1.5.1. This release is a maintenance release of the popular 1.5.X mainstream version of Nutch which has been widely adopted within the community. Please see the list of changes made in this version for a full breakdown. The release is available here.

07 July 2012 – Apache Nutch v2.0 Released

The Apache Nutch PMC are very pleased to announce the release of Apache Nutch v2.0. This release offers users an edition focused on large scale crawling which builds on storage abstraction (via Apache Gora™) for big data stores such as Apache Accumulo™, Apache Avro™, Apache Cassandra™, Apache HBase™, HDFS™, an in memory data store and various high profile SQL stores. After some two years of development Nutch v2.0 also offers all of the mainstream Nutch functionality and it builds on Apache Solr™ adding web-specifics, such as a crawler, a link-graph database and parsing support handled by Apache Tika™ for HTML and an array of other document formats. Nutch v2.0 shadows the latest stable mainstream release (v1.5.X) based on Apache Hadoop™ and covers many use cases from small crawls on a single machine to large scale deployments on Hadoop clusters. Please see the list of changes made in this version for a full breakdown. The release is available here.

07 June 2012 – Apache Nutch 1.5 Released

The 1.5 release of Nutch is now available. This release includes upgrades of several major components including Tika 1.1 and Hadoop 1.0.0, improvements to LinkRank and WebGraph elements as well as a number of new plugins covering blacklisting, filtering and parsing to name a few. Please see the list of changes made in this version for a full breakdown of the 50 odd improvements the release boasts. The release is available here.

26 November 2011 – Apache Nutch 1.4 Released

The 1.4 release of Nutch is now available. This release includes several improvements including allowing Parsers to declare support for multiple MIME types, configurable Fetcher Queue depth, Fetcher speed improvements, tighter Tika integration, and support for HTTP auth in Solr indexing. Please see the list of changes made in this version. The release is available here.

23 September 2011 – Apache Nutch focuses on 1.x series for main development

After some discussion and a vote about the issue, the Nutch development community decided to focus their efforts on maintaining and releasing the 1.x series of Nutch, and to branch the now former Nutch trunk based on Gora, allowing others to try and improve it, while the mainline development goes on.

7 June 2011 – Apache Nutch 1.3 Released

The 1.3 release of Nutch is now available. This release includes several improvements (improved RSS parsing support, tighter integration with Apache Tika, external parsing support, improved language identification and an order of magnitude smaller source release tarball — only about 2MB!). Please see the list of changes made in this version. The release is available here.

24 September 2010 – Apache Nutch 1.2 Released

The 1.2 release of Nutch is now available. This release includes several improvements (addition of parse-html as a selectable parser again, configurable per-field indexing), new features (including adding timing information to all Tool classes, and implementation of parser timeouts), and bug fixes (fixing an NPE in distributed search, fixing of XML formatting issues per Document fields). Please see the list of changes made in this version. The release is available here.

06 June 2010 – Apache Nutch 1.1 Released

The 1.1 release of Nutch is now available. This release includes several major upgrades of existing libraries (Hadoop, Solr, Tika, etc.) on which Nutch depends. Various bug fixes, and speedups (e.g., to Fetcher2) have also been included. See list of changes made in this version. The release is available here.

21 April 2010 – Apache Nutch graduates to TLP

Passed by unanimous approval of the Apache Board, Nutch graduated to TLP status. We are in the process of updating the website, and moving things around, so if you notice anything out of place, please let us know.

14 August 2009 – Lucene at US ApacheCon

ApacheCon US is once again in the Bay Area and Lucene is coming along for the ride! The Lucene community has planned two full days of talks, plus a meetup and the usual bevy of training. With a well-balanced mix of first time and veteran ApacheCon speakers, the Lucene track at ApacheCon US promises to have something for everyone. Be sure not to miss:

Training:

Thursday, Nov. 5th

Friday, Nov. 6th

23 March 2009 – Apache Nutch 1.0 Released

The 1.0 release of Nutch is now available. This release includes several major feature improvements such as new indexing framework, new scoring framework, Apache Solr integration just to mention a few. See list of changes made in this version. The release is available here.

09 February 2009 – Lucene at ApacheCon Europe 2009 in Amsterdam

Lucene will be extremely well represented at ApacheCon EU 2009 in Amsterdam, Netherlands this March 23-27, 2009:

2 April 2007: Nutch 0.9 Released

The 0.9 release of Nutch is now available. This is the second release of Nutch based entirely on the underlying Hadoop platform. This release includes several critical bug fixes, as well as key speedups described in more detail at Sami Siren’s blog. See list of changes made in this version. The release is available here.

24 September 2006: Nutch 0.8.1 Released

The 0.8.1 release of Nutch is now available. This is a maintenance release to the 0.8 branch fixing many serious bugs found in version 0.8. See list of changes made in this version. The release is available here.

25 July 2006: Nutch 0.8 Released

The 0.8 release of Nutch is now available. This is the first release of Nutch based on the Hadoop architecture. See CHANGES.txt for list of changes made in this version. The release is available here.

31 March 2006: Nutch 0.7.2 Released

The 0.7.2 release of Nutch is now available. This is a bug fix release for 0.7 branch. See CHANGES.txt for details. The release is available here.

1 October 2005: Nutch 0.7.1 Released

The 0.7.1 release of Nutch is now available. This is a bug fix release. See CHANGES.txt for details. The release is available here.

17 August 2005: Nutch 0.7 Released

This is the first Nutch release as an Apache Lucene sub-project. See CHANGES.txt for details. The release is available here.

June 2005: Nutch graduates from Incubator

Nutch has now graduated from the Apache incubator, and is now a Subproject of Lucene.

January 2005: Nutch Joins Apache Incubator

Nutch is a two-year-old open source project, previously hosted at Sourceforge and backed by its own non-profit organization. The non-profit was founded in order to assign copyright, so that we could retain the right to change the license. We have now determined that the Apache license is the appropriate license for Nutch and no longer require the overhead of an independent non-profit organization. Nutch’s board of directors and its developers were both polled and supported the move to the Apache foundation.

September 2004: Creative Commons launches Nutch-based Search

Creative Commons unveiled a beta version of its search engine, which scours the web for text, images, audio, and video free to re-use on certain terms, a search refinement offered by no other company or organization.

See the Creative Commons Press Release for more details.

September 2004: Oregon State University switches to Nutch

Oregon State University is converting its searching infrastructure from Google™ to the open source project Nutch. The effort to replace Google™ will realize significant cost savings for Oregon State University, while promoting both the Nutch Search Engine and transparency in search engine use and management.

For more details see the announcement by OSU’s Open Source Lab.

Photo Attributions

The Apache Nutch site was constructed using several photos fetched from Flickr using Nutch. These photos are licensed under the Creative Commons Attribution-ShareAlike 2.0 Generic license.

The photos are as follows

Source


Google Alternative: 12 Best Search Engines to Use in 2019

Google has established itself as the undisputed leader in the online industry, and at least some part of our daily online activity depends on Google’s services, whether that’s Google Chrome, Google Search, YouTube or anything else. Probably the most used Google service on the planet is Google Search. It has captured more than 92% of the market share, which means billions of people use it on a daily basis. That gives Google too much control and power, which it exploits by capturing an enormous amount of data on its users. If you don’t want any part of it, you should use a Google Search alternative. With that in mind, we have created a list of the 12 best search engines that you can use as a Google alternative.

Why the need for Google Search Alternative?

Although Google is the de facto industry standard for web search, and its name is even used as a verb for searching the web, it has some kinks. The major concern is privacy: being an ad company at its core, Google continuously collects data on its users. Google has also faced numerous accusations of manipulating search results in its own favor.

Yelp even went on to hire Tim Wu, the father of net neutrality, to prove that Google’s search results are biased. Google has also been filling a major portion of its search results with ads. The death of white space and an ad-infused Google are not what many online users would wish for. If you value your online privacy and enjoy an ad-free experience, it would do you good to look at some of the best Google alternatives that you can use.

One thing to note before we get into our list is that the alternatives mentioned below mainly focus on general search engines. If you are looking for specific search engines such as people search engines and reverse image search engines, you should click on the links to check them out.

Best Search Engines to Use as Google Alternative in 2019

So, whatever your case might be, if you are looking for a better alternative to the ubiquitous Google Search, here are the 12 Best Google Alternatives.

1. Bing

Although Bing is nowhere near as big as it once was, it still remains one of the best Google alternatives on the market right now. Not only does it bring tons of features, it also looks good. Bing’s homepage has an ever-changing background consisting of places, animals, people, sports, etc. Some of the key capabilities of Bing include calculations, quick sports scores, flight tracking, product shopping, translation, conversions, spell check and more.


Bing does feature Bing Ads, Bing Events, Bing Finance and more according to the task at hand. However, you can set preferences for most of them and they are not as annoying as Google Search ads. Bing also integrates easily with Facebook and into Apple and Windows-based devices. Also featuring its own standalone mobile applications, Bing is easily a viable Google alternative.

Visit Website

2. DuckDuckGo

DuckDuckGo is one of the fastest growing web search engines, and it has gained popularity particularly because of its stance on maintaining user privacy. It does not keep track of your searches, and it aggregates results from over 100 different sources, including DuckDuckBot (its own web crawler), various crowd-sourced sites, Bing, Yandex and more. It then displays them privately to the end user. This is entirely open-source and the code is even available on GitHub.

DuckDuckGo follows a strict one-ad-per-page revenue model. Its proxy-based search engine means that the user’s search queries are left untracked. It also features a Voice Search. All in all, DuckDuckGo quickly gained attention from users who were not willing to sacrifice their privacy on the web. Recently, Mozilla Firefox added DuckDuckGo as a search option for the user.

Visit Website

3. Search Encrypt

Search Encrypt takes online tracking prevention to the next level by not only blocking online trackers but also using local encryption to secure your searches. It uses industry-standard AES-256 encryption together with Secure Sockets Layer encryption for total protection. That means your searches and other web activities are not only secure from online snoopers but also unavailable to local users who have access to your computer. Whether you share your computer with someone or it just gets stolen, you can be sure that your internet activities will be accessible to no one.

Search Encrypt also offers privacy-protected maps and video searches. The company uses Open Street Maps as its maps provider, so your details are not shared with Google. It also allows you to watch videos in an enhanced privacy mode which blocks pre-roll ads while protecting your privacy. Of course, with all these privacy standards, you will not get search results which are as good as Google Search. Still, it’s a viable Google alternative for anyone who is looking for extreme privacy.

Visit Website

4. Qwant

Qwant is yet another privacy-focused search engine which promises never to save your search data or harvest your personal data for ad targeting. Despite being privacy-friendly, Qwant is quite rich in features. One of my favorite features of Qwant is its “quick search shortcuts” feature, which lets you quickly search for products and content on specific websites like YouTube, Amazon, and more.

For example, I can type the keyword “&a” and type my search query after that. When I hit enter, Qwant will use my search query to directly search Amazon’s catalog. I also love Qwant’s Panoramic search feature, which basically means that Qwant delivers all its results on a single web page, whether they are websites, social networks, pictures, videos, shopping, music, or more. Overall, I quite like Qwant and its search results are also quite accurate. You should definitely check it out.
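To make the shortcut idea concrete, here is a minimal Python sketch of how a prefix such as “&a” could be routed to a site-specific search. It is not Qwant’s implementation: only the “&a” keyword for Amazon comes from the text above, while the “&yt” keyword, the URL templates and the `route_query` helper are assumptions made for illustration.

```python
from urllib.parse import quote_plus

# Hypothetical shortcut table. Only "&a" (Amazon) is taken from the article;
# "&yt" and all URL templates are illustrative assumptions.
SHORTCUTS = {
    "&a": "https://www.amazon.com/s?k={query}",
    "&yt": "https://www.youtube.com/results?search_query={query}",
}

# Fallback used when the query carries no known shortcut (assumed URL shape).
DEFAULT = "https://www.qwant.com/?q={query}"


def route_query(raw: str) -> str:
    """Return the URL that a raw search-box entry should be sent to."""
    text = raw.strip()
    # Split off the first word; it is a shortcut only if it is in the table.
    prefix, _, rest = text.partition(" ")
    template = SHORTCUTS.get(prefix.lower())
    rest = rest.strip()
    if template and rest:
        return template.format(query=quote_plus(rest))
    # No shortcut, or a shortcut with nothing after it: run a general search.
    return DEFAULT.format(query=quote_plus(text))


if __name__ == "__main__":
    print(route_query("&a usb cable"))
    print(route_query("open source search"))
```

Keeping the shortcuts in a plain lookup table means a new site can be supported by adding one entry, without touching the routing logic.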

5. Yahoo! Search

After trials with different search engines to power its own web search, Yahoo! has now partnered with Microsoft to use Bing search results for its web engine. Now powered entirely by Bing, Yahoo! Search provides access in up to 38 international languages.

It doesn’t make sense for normal users to hand over all of their online data to Google, and Yahoo! Search does feel like a good Google alternative. Yahoo Answers and Yahoo Finance are a wealth of information on niche topics, and the purchase and integration of Flickr put Yahoo! on par with Google on the image front. Yahoo! still offers better privacy to its users, and Yahoo Local and Yahoo Weather are other frequently used services.

Visit Website

6. Wolfram Alpha

If you are under the impression that Wolfram Alpha is just for math geeks, think again. Although it is primarily a computational engine, it is also a powerful search engine. Wolfram Alpha mainly curates its data instead of just caching web pages. It draws on many reputable and trustworthy college publications and libraries, Crunchbase, FAA, Best Buy and many other sources.

Wolfram Alpha comes up with results which are computational facts. Its home screen shows examples of the kinds of searches Wolfram Alpha can assist with. If you look up a university on Wolfram Alpha, it gathers all of the key information, such as enrollment numbers, tuition fees and location, and presents the essential data in a single spot. While I understand that this will not serve as a viable Google Search alternative for many users, it is in fact better than Google if you meet its user criteria.

Visit Website

7. StartPage

IxQuick was one of the few search engines on the market which showed its own results on the page and didn’t send the query to another search engine. Later, IxQuick launched a second search engine called StartPage which used to include Google’s search results but didn’t allow Google to track its users. Finally, both these search engines merged into one and now operate under the same name. With this merger, users are supposedly getting the best of both worlds.

On one hand, you are not being tracked, while on the other you are receiving accurate search results as they are being pulled directly from Google Search. None of this is illegal, as StartPage is paying Google to access its search results and removing the trackers. StartPage neither stores users’ data nor lets websites track them. It brings a feature called “Anonymous View” which protects users from websites when they click on any search result. If you want privacy but don’t want to deal with sub-par search results, you should be using StartPage.

Visit Website

8. Yandex

Yandex is a Russia-based company providing search and other such services on the web. With over 150 million search queries handled per day, it is one of the largest web search engines in the world and the leading search engine in Russia. While Yandex collects users’ data, the company is quite transparent and lets users know what kind of data it is collecting and how it is using it. The company assures that your data is not accessible by individuals and that all your private data, like passwords, is encrypted.

Yandex provides its users with lots of services like Images, Videos, Mail, Maps, Metrica (Equivalent of Google Analytics) and Yandex browser; in addition to its Mobile apps, Yandex Disk (Cloud storage), Translate, Market, Money and more. These full-fledged services offered by Yandex make it easily one of the best alternatives to Google. If you want to remove Google completely from your life, you can surely look at Yandex as an alternative.

Visit Website

9. Dogpile

Dogpile is one of the oldest web search engines to curate information, links, images and videos from other search engines. Dogpile curates results for your search terms by fetching links from Google, Yahoo, Yandex and other such services. Although it initially fetched links from AskJeeves (now Ask.com) and Bing, it has since gone on to add more web engines to fetch links, videos and images from.

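As an illustration of how a metasearch engine might combine the ranked lists it fetches from several engines, here is a small Python sketch that uses reciprocal rank fusion. This is a generic merging technique chosen for the example, not Dogpile’s actual algorithm, which the article does not describe; the `fuse_results` helper, the constant `k=60` and the sample URLs are all assumptions.

```python
from collections import defaultdict


def fuse_results(result_lists, k=60):
    """Merge several ranked URL lists into one ranking.

    Each URL earns 1 / (k + rank) from every list it appears in, so a URL
    that several engines rank highly rises to the top of the merged list.
    """
    scores = defaultdict(float)
    for results in result_lists:
        seen = set()  # count each URL at most once per engine
        for rank, url in enumerate(results, start=1):
            if url in seen:
                continue
            seen.add(url)
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    # Made-up result lists standing in for two upstream engines.
    engine_a = ["a.example", "b.example", "c.example"]
    engine_b = ["b.example", "d.example", "a.example"]
    print(fuse_results([engine_a, engine_b]))
```

Rank-based fusion is convenient here because it needs no relevance scores from the upstream engines, only the order in which each one returned its links.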

Some of the key features offered by Dogpile include Categories, White pages, Preferences, Search filters, Recent searches, favorite fetches and more. Dogpile also has its own toolbar for Internet Explorer and Mozilla Firefox, which provides users an alternative search from their web browser.

Visit Website

10. Gibiru

While all of the above Google Alternative services tackled the issue of Privacy, Gibiru takes on Censored content. Do you happen to know that most of the content you look up online is presented after the removal of censored content? Gibiru pulls up all search results, including the ones which are censored for the general audience. While doing so, the issue of privacy is also well-tackled through its anonymous proxy search engine.


Gibiru crawls mainstream media for your search query and presents the uncensored results to the end user. Providing complete privacy and uncensored content, Gibiru is by far the best Google Alternative as far as Internet activists are concerned. Gibiru also has a Mozilla Firefox extension, to make the search for Uncensored content painless.

11. Ask.com

Ask.com is still more of a question-and-answer community than a full-blown search engine, but you can find answers to a wide variety of search queries here. While Ask.com closed its doors on web search in 2009 to focus completely on its original mission of providing a question-and-answer community, it seems to be including search results again.

You can find answers on topics ranging from Art & Literature, Geography, Education and Politics to Technology, Science and Business. Ask.com is a great alternative to Google for finding human-edited content that is strictly to the point and better organized.

12. Internet Archive

Although technically not a web search engine, Internet Archive does let users search for past iterations of a website. You can check how a website looked on a date of your choice. Apart from browsing older iterations of websites, the Internet Archive is also a great source for millions of public books, images, software, movies, videos and much more.

You get unprecedented access to all of these resources for free on Internet Archive, including some classic movies and novels. This non-profit digital library is a member of the International Internet Preservation Consortium, a network whose members crawl the web and archive valuable pieces of information. I see this one as a Google alternative for users who want access to data that is hard to find on Google itself.

BONUS

1. YouTube for Video Search

While we have talked a lot about web search engines, where you enter text manually, Google's YouTube is by far the most popular website for searching videos on the web. The second largest search engine on the web, behind only Google, YouTube has the biggest collection of videos. You may also look into other video search alternatives like Yahoo View, Facebook Watch, Vimeo, and more.

2. TinEye for Search by Images

For reverse image search, that is, searching for content by uploading an image, TinEye is one of the best services. You can also find some alternatives to TinEye in our article on the best reverse image search engines and mobile applications.

3. Search for Mobile

Many of the services listed above offer their own standalone mobile applications, which you can use to search directly from your mobile device. You can find both Android and iOS applications for DuckDuckGo, Wolfram Alpha, Yahoo! Search, Bing, Yandex, Ask Mobile, Dogpile and more.

Are You Happy with These Google Alternative Search Engines?

Now that we’ve come to the end of this article on the best Google alternatives, how many of these search engines have caught your attention? Do check them out and let us know which Google alternative search engine you are going to use. Also, if we missed any search engine that should be on the list, let us know in the comments section down below.
