Elasticsearch vs. Solr – Choosing Your Open Source Search Engine

Update: In 2018, our search expert revisited this popular “Elasticsearch vs. Solr” topic and offered new observations incorporating cloud, analytics, and cognitive search. Check out our post here.

Why are we here? What is the purpose of my existence?  Should I exercise or rest and save my energy? Wake up early for work or start late and work through the night? Should I eat my french fries with ketchup or mayonnaise? 

These are all age-old questions that may or may not have answers. Some of them are very hard or terribly subjective. But let me put a bit of effort into trying to answer one of them: Should I use Elasticsearch or Solr?

Here is the scenario. Your organization is looking to implement your first search engine, switch to another search engine – calling out to all the Google Search Appliance (GSA) users looking for a replacement! – or try to save money by moving to open source. You, as a proficient and capable developer, have been called to solve a difficult problem. Your problem has many business requirements, but at the core, it is a “big data and search” problem. 

You need to extract a lot of content from multiple data sources and get insights from that data to help your company grow and achieve its objectives for this year.

 

One Shot

There is a lot at stake here. You can’t miss and you have only one shot. You need the right search engine for the job, you are thinking open source, and you have two popular choices: Elasticsearch or Solr, both of which are steadily ranked in the top two spots among open source and commercial search engines, according to DB-Engines.

 

Which Open Source Search Engine Would You Pick?

This is not a coin toss or an easy pick. Both search engines are great and there is no one “right” choice. It all depends on your requirements. 

So the first step is to understand what application you have to build. Then, the next step is to see what each search engine has to offer. And by the way, if you’re still at the intersection of open source vs. commercial solutions, get our free e-book for a deep-dive into the 10 key criteria to consider when selecting a search engine.

 

Feature Rundown

A couple of years back, we wrote a high-level overview blog on Elasticsearch vs. Solr, which discussed overall trends and non-technical insights. Now, as both Elasticsearch and Solr have evolved and become dominant players in the open source search engine market, let’s take another fresh look at each and see where it takes us.

 

Age & maturity

In this case, we can say that Solr has a longer history, as it was created in 2004 by Yonik Seeley at CNET Networks, which then contributed it to Apache in 2006. It finally graduated to a top-level project in 2007. On the other hand, we have Elasticsearch, which was officially created in 2010, although its founder Shay Banon really started it earlier under the name Compass. Since then, the creators of Kibana, Logstash, and Beats have joined Elasticsearch to create the Elastic Stack product family, which has emerged as a powerful player in the search and log analytics space. With that said, Solr has the advantage of having been visible in the market earlier.

 

Community & open source

Both have very active communities. If you check GitHub, you can see that they are very popular open source projects with plenty of releases.

[Screenshot: Apache Lucene/Solr repository on GitHub]

[Screenshot: Elasticsearch repository on GitHub]

 

A very important detail is that while both are released under the Apache license and both are open source, they work a little differently. Solr is truly open source – anyone can help and contribute. With Elasticsearch, while anyone can still offer contributions, only employees of Elastic (the company behind Elasticsearch and the Elastic Stack) can accept them.

Is this good or bad? It depends on how you look at it. This means that if there is a feature you need and you contribute it to the community, with adequate quality, it can be accepted into Solr. With Elasticsearch, it’s up to Elastic to decide whether a contribution would be accepted. So there may be more feature options on Solr. On the other hand, contributions to Elasticsearch, which go through more levels of quality checks, may offer higher consistency and quality.   

 

Documentation

Both Elasticsearch and Solr have very well-documented reference guides. Elasticsearch maintains its documentation on GitHub, while Solr uses Atlassian Confluence.

 

Core technology

Let’s get a little bit more technical. Elasticsearch and Solr are two different search engines. But underneath, they both use Lucene, which means both are built on “the shoulders of giants.”

For those of you who wonder why I consider Lucene a “giant”: it is the actual information retrieval library under the hood of many search engines. It is extremely fast, stable, and hard to improve upon. Lucene was created in 1999 by Doug Cutting – one of the creators of Hadoop. So there you go: Lucene is a perfect choice to sit at the heart of a search engine.

 

Java APIs and REST

Elasticsearch has a more “Web 2.0” REST API, but Solr has a much better Java API with SolrJ – or SolrNet if you use Microsoft technologies (Elasticsearch has NEST and Elasticsearch.Net). Solr’s REST API may feel less flexible, but it works wonderfully for what you need: indexing and querying. Elasticsearch speaks JSON, so if you use JSON all around, it is a good choice. Solr supports JSON as well, but it was added at a later stage, as Solr was originally aimed at XML.
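
To make the difference concrete, here is a minimal sketch (not from the original article) of the same keyword search sent to both engines using Python’s requests library. It assumes local default installs (Elasticsearch on port 9200, Solr on port 8983) and a hypothetical index/collection named "articles":

import requests

# Elasticsearch: the query is a JSON body over HTTP.
es_resp = requests.get(
    "http://localhost:9200/articles/_search",
    json={"query": {"match": {"title": "open source search"}}},
)
print(es_resp.json()["hits"]["total"])

# Solr: a classic /select request driven by URL parameters.
solr_resp = requests.get(
    "http://localhost:8983/solr/articles/select",
    params={"q": 'title:"open source search"', "wt": "json"},
)
print(solr_resp.json()["response"]["numFound"])

Either call returns the matching documents as JSON; the difference is mostly in how the request is expressed.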

 

Content processing

Because they both expose an API, it is simple to index content from your custom application or from existing, configurable applications. For example, our Aspire content processing framework is able to connect to multiple data sources and post to either Elasticsearch or Solr. 

Solr also has a feature for extracting text from binary files using Apache Tika. So you can upload a PDF via the ExtractingRequestHandler and Solr will know what to do with it. 
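
As a rough sketch of how that looks in practice (the collection name, document id, and file are hypothetical, and a local Solr on the default port is assumed), you can post the file to the extract handler and let Tika pull out the text:

import requests

with open("report.pdf", "rb") as pdf:
    resp = requests.post(
        "http://localhost:8983/solr/docs/update/extract",
        params={"literal.id": "report-1", "commit": "true"},
        files={"file": ("report.pdf", pdf, "application/pdf")},
    )
resp.raise_for_status()  # the Tika-extracted text is now indexed under id "report-1"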

On the other hand, Elasticsearch works nicely with Logstash, which can process data from any source and index it.

 

Scalability

Scaling is a key consideration. Here, Elasticsearch was winning the game while Solr was still constrained to master-slave replication. However, SolrCloud has since come into the game, and with the help of ZooKeeper it is now possible to scale a Solr cluster in a much easier and faster way than with the older master-slave versions of Solr. It still needs a lot of improvement, but the future looks bright in terms of the size of datasets that can be ingested and searched in Solr.
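
To give a feel for what that looks like, here is a hedged sketch of creating a sharded, replicated collection through the SolrCloud Collections API (the collection name, shard and replica counts, and configset are hypothetical; a recent Solr with the bundled _default configset is assumed):

import requests

resp = requests.get(
    "http://localhost:8983/solr/admin/collections",
    params={
        "action": "CREATE",
        "name": "logs",
        "numShards": 2,          # the index is split across two shards
        "replicationFactor": 2,  # each shard is kept on two nodes
        "collection.configName": "_default",
    },
)
print(resp.json())  # ZooKeeper tracks which nodes host which shards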

 

Vendor support

There are several companies that got to a point where they had to decide which product worked best for them. For example, Cloudera selected Solr as their search engine to integrate into the open source CDH (Cloudera Distribution Including Hadoop). On the other hand, there are other vendors who have selected Elasticsearch as the search engine for their solutions. We at Search Technologies help with the consulting, deployment, and support of both search engines. 

 

Vision & ecosystem

Solr has been more oriented towards text search. Elasticsearch quickly carved out its niche, aiming for log analytics by creating the Elastic Stack (formerly known as the ELK Stack): Elasticsearch, Logstash, Kibana, and Beats. Both have a clear vision and they are making great strides in their respective directions.

One thing worth reiterating is how both search engines are being used as the foundation of many leading search and big data platforms. For example, Elasticsearch is part of Microsoft’s Azure Search while Solr has been integrated into Cloudera Search.

 

Performance

When it comes to performance, based on the experiences I have heard from many developers, we can say that both engines are solid performers. For the majority of use cases, whether it is an internal or external search application, performance won’t be much of an issue as long as the system is designed and configured properly.

 

Web administration

Solr comes with web administration bundled in, while Elasticsearch relies on additional plugins – several of them premium – for security, alerting, and monitoring, which are part of Elastic’s broader product family.

 

Visualization

There are many ways to visualize the data in Elasticsearch and Solr – you can build your custom visualization dashboard or use the search engine’s standard visualization features, perhaps with some tweaks. But there is one difference worth mentioning.  

Solr has focused primarily on text search. It does a great job at this and has become what seems to be the standard for search applications. Elasticsearch, however, has moved in a different direction, going beyond search to tackle log analytics and visualization with the Elastic Stack. Below are some visualizations you can build with Kibana 5. 

 

[Screenshot: Kibana 5 dashboard]
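
Behind a time-series panel like the one above there is usually just an Elasticsearch aggregation query. A minimal sketch (not from the original post): count log events per day for one service, assuming a local node and a hypothetical "logs" index with @timestamp and service fields (the "interval" parameter matches the Elasticsearch 5.x API):

import requests

body = {
    "size": 0,  # we only want the buckets, not the matching documents
    "query": {"term": {"service": "checkout"}},
    "aggs": {
        "events_per_day": {
            "date_histogram": {"field": "@timestamp", "interval": "day"}
        }
    },
}
resp = requests.get("http://localhost:9200/logs/_search", json=body)
for bucket in resp.json()["aggregations"]["events_per_day"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])

Kibana essentially builds queries like this for you and renders the buckets as charts.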

 

This does not mean one is better than another. It just indicates that each search engine has its own strengths in different use cases and needs, and your selection will greatly depend on what your organization wants to accomplish.

 

So long story short, both Elasticsearch and Solr are excellent open source choices that will help you get more out of your data. It all depends on your requirements, your budget, your timing, and the complexity of your project.

Helpful Resources

  • This e-book details the key criteria for choosing a search engine. It can help guide you through your decision-making process.
  • If you are looking for expert help to evaluate search engine and implementation options, contact us to learn more about our assessment. 

– Xavier

Source

Open Source Intelligence (OSINT) Tools & Resources


Search Engines

General Search

National Search Engines

Privacy-oriented search engines

  • DuckDuckGo: Online investigators usually use it to search the surface web while using the Tor Browser.
  • Startpage: Fetches results from Google without tracking its users.
  • Peekier: A privacy-oriented search engine that fetches results using its own search algorithm.
  • Qwant: Based in France.
  • Oscobo: Based in the UK.
  • Swisscows: Privacy-safe web search based in Switzerland.
  • Gigablast: Open source search engine.
  • Gibiru: Uncensored and anonymous search engine.

Meta search engines

  • Excite
  • Search
  • MetaGer
  • Zapmeta
  • etools: Compiles its results from major international search engines and keeps user privacy by not collecting or sharing personal information about its users. This search engine is very fast and shows a summary for each search query – on the right side – detailing the sources of its results.
  • All the Internet: Queries major search engines, including shopping sites like Amazon and eBay.
  • izito: Aggregates data from multiple sources (Yahoo, Bing, Wikipedia, YouTube, and others) to generate optimal results, which include images, videos, news, and articles.
  • Metacrawler: Aggregates results from Google and Yahoo!.
  • My all search: Aggregates results from Bing, DuckDuckGo, AOL Search, Ask, Oscobo, Mojeek, ZapMeta, and MetaCrawler.
  • Carrot2: An open source search results clustering engine that aggregates results from the Google API, Bing API, eTools Meta Search, Lucene, Solr, and more.
  • elocalfinder: Fetches results from Google, Yahoo!, Ask, and Bing.
  • All-in-One
  • Searx

FTP search engines

Files Search Engines

Image Search Engines

Images shared across social media sites can be found in the following locations:

There are specialized sites that hold images that appeared in the press and news media; to search for this type of image, go to:

Reverse image search

Video Search Engines

Blog Search

Custom Search Engines

Internet Of Things (IoT) devices search engines

Exploits search engines

Dark Web Search Engines

You need to download the Tor Browser before you can access sites hosted on the Tor network.

News/Newspaper Search Engines

Fake News Detection

  • Snopes: Debunks false news stories and urban legends, and researches/validates rumors to see whether they are true.
  • Hoaxy: Checks the spread of false claims (like a hoax, rumor, satire, or news report) across social media sites. The site derives its results from reputable fact-checking organizations to return the most accurate results.
  • FactCheck: This site is partnered with Facebook to help identify and label fake news reported by its users. It also monitors different media for false information covering a wide range of topics like health, science, and hoaxes spread through spam emails.
  • ReviewMeta: Analyzes Amazon user reviews.
  • Reporter Lab: Gives a map of global fact-checking sites.
  • Truth or Fiction: Uncovers fake news on topics like politics, nature, health, space, crime, police, and terrorism.
  • Hoax-Slayer: Focuses on email scams and social media hoaxes.
  • Verification Handbook: A definitive guide to verifying digital content for emergency coverage, available in different languages.
  • Verification Junkie: A directory of tools for verifying, fact-checking, and assessing the validity of eyewitness reports and user self-published content online.
  • Citizen Evidence: Tools and lessons to teach people how to authenticate user-generated online content. Managed by Amnesty International.

Specialized Search Engines

Niche Search Engines

Patent Search Engines

Web Directories

Translation services

Business Search

Business Annual Records

Business Profiles

Grey literature

Grey information includes the following, and more: academic papers, preprints, proceedings, conference and discussion papers, research reports, marketing reports, technical specifications and standards, dissertations, theses, trade publications, memoranda, government reports and documents not published commercially, translations, newsletters, market surveys, draft versions of books, and articles.

Most important Grey literature (academic and scholarly resources) websites can be found in the following list:

Data Leak Websites

Pastebin sites

  • Pastebin
  • PasteLert: A Pastebin alerting service dedicated to the Pastebin.com website.
  • Dump Monitor: This is a Twitter account that monitors multiple paste sites for password dumps and other sensitive information.

Source

What is an open-source search engine?


Firstly, I doubt whether you would have used an open source search engine unless you use Linux or are in the research field.

None of the large web search engines are open source. To repeat what John Linn said,

Open source simply means that the source code (programming) is available to anyone to use and modify as they desire.

Search engines like Lucene[1], Nutch[2], Terrier[3], Xapian[4] and others are examples of Open Source search engines. They all allow you to change the code of the retrieval and ranking process.

[1] lucene.apache.org/core/
[2] http://nutch.apache.org/
[3] http://te…

Source

Choosing an Open Source search engine: Solr or Elasticsearch?

There has never been a better time to be a search and open source enthusiast than 2017. Far behind are the old days of information retrieval being a field only available for academia and experts.

Now we have plenty of search engines that allow us not only to search, but also to navigate and discover our information. We are going to focus on two of the leading search engines that happen to be open source projects: Elasticsearch and Solr.

Organisations and Support

Solr is an Apache sub-project developed in parallel with Lucene. Thanks to this, it has synchronized releases and benefits directly from any new Lucene feature.

Lucidworks (previously Lucid Imagination) is the main company supporting Solr. It provides development resources for Solr, commercial support, consulting services, technical training, and commercial software around Solr. Lucidworks is based in San Francisco and offers its services in the USA and the rest of the world through strategic partners. Lucidworks has historically employed around a third of the most active Solr and Lucene committers, contributing most of the Solr code base and organizing the Lucene/Solr Revolution conference every year.

Elasticsearch is an open-source product driven by the company Elastic (formerly known as Elasticsearch). This approach creates a good balance between the open-source community contributing to the product and the company making long term plans for future functionality as well as ensuring transparency and quality.

Elastic comprises not only Elasticsearch but a set of open-source products called the Elastic Stack: Elasticsearch, Kibana, Logstash, and Beats. The company offers support over the whole Elastic Stack plus a set of commercial products called X-Pack, all included in different subscription tiers. They offer trainings around the world every second week and organize the ElasticON user conferences.

Ecosystem

Solr is an Apache project and, as such, benefits from a large variety of Apache projects that can be used along with it. The first and foremost example is its Lucene core (http://lucene.apache.org/core/), which is released on the same schedule and from which Solr receives all its main functionality. The other main project is ZooKeeper, which handles configuration and distribution for SolrCloud clusters.

On the information gathering side there are Apache Nutch, a web crawler, and Flume, a distributed log collector.

When it comes to processing information, there is no end of Apache projects; the ones most commonly used alongside Solr are Mahout for machine learning, Tika for document text and metadata extraction, and Spark for data processing.

The big advantage lies in big data management and storage, with the highly popular Hadoop library as well as the Hive, HBase, and Cassandra databases. Solr can store its index in the Hadoop Distributed File System (HDFS) for high resilience.

Elasticsearch is owned by Elastic, the company that drives and develops all the products in its ecosystem, which makes them very easy to use together.

The main open-source products of the Elastic Stack alongside Elasticsearch are Beats, Logstash, and Kibana. Beats is a modular platform for building different lightweight data collectors. Logstash is a data processing pipeline. Kibana is a visualization platform where you can build your own data visualizations, and it already has many built-in tools to create dashboards over your Elasticsearch data.

Elastic also develops a set of products that are available under subscription: X-Pack. Right now, X-Pack includes five products: Security, Alerting, Monitoring, Reporting, and Graph. They all deliver a layer of functionality over the Elastic Stack that is described by their names. Most of them are included as a part of Elasticsearch and Kibana.

Strengths

Solr

  • Many interfaces, many clients, many languages.
  • A query is as simple as solr/select?q=query.
  • Easy to preconfigure.
  • The base product will always be complete in functionality; commercial offerings are add-ons.

Elasticsearch

  • Everything can be done with a JSON HTTP request.
  • Optimized for time-based information.
  • Tightly coupled ecosystem.

The base product contains the essentials and is expandable; commercial offerings are additional features.

[Image: Solr vs. Elasticsearch comparison]

If you are already using one of them and do not explicitly need a feature exclusive to the other, there is no big incentive to migrate.

In any case, as the common answer when it comes to hardware sizing recommendations for any of them: “It depends.” It depends on the amount of data, the expected growth, the type of data, the available software ecosystem around each, and mostly the features that your requirements and ambitions demand; just to name a few.

 

At Findwise we can help you with a platform evaluation study to find the perfect match for your organization and your information.

 

Written by: Daniel Gómez Villanueva – Findability and Search Expert

Source

Your own search engine for documents, images, tables, files, intranet & news


Integrated research tools for easier searching, monitoring, analytics, discovery & text mining of heterogeneous & large document sets & news, with free software on your own server

Search engine
(Fulltext search)

Easy full text search in multiple data sources and many different file formats: Just enter a search query (which can include powerful search operators) and navigate through the results.

Interactive filters
(Faceted search)

Easy navigation through many results with interactive filters (faceted search), which aggregate an overview of, and interactive filters for, (meta)data like authors, organizations, persons, places, dates, products, tags, or document types.
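
Under the hood, this kind of filter navigation typically maps to a plain facet query against the underlying search engine. A hedged sketch against a Solr backend (the core name and field names are hypothetical, not taken from Open Semantic Search itself):

import requests

resp = requests.get(
    "http://localhost:8983/solr/documents/select",
    params={
        "q": "contract",
        "wt": "json",
        "facet": "true",
        "facet.field": ["author_ss", "organisation_ss", "content_type"],
        "fq": 'content_type:"application/pdf"',  # a filter the user clicked
    },
)
print(resp.json()["facet_counts"]["facet_fields"])  # value/count pairs per field

Each facet field comes back as a flat list of value/count pairs, which the user interface turns into the clickable filters described above.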

Exploration, browsing & preview
(Exploratory search)

Explore your data or search results with an overview of aggregated search results by different facets with named entities (e.g. file paths, tags, persons, locations, organisations, or products), while browsing with comfortable navigation through search results or document sets.
View previews (e.g. PDF, extracted text, table rows, or images).
Analyze or review document sets by preview, extracted text, or word lists for text mining.

Collaborative annotation & tagging (Social search & collaborative filtering)

Tag your documents with keywords, categories, names, or text notes that are not included in the original content, so you can find them more easily later (document management & knowledge management) or in other research or search contexts, and so you can filter annotated or tagged documents with interactive filters (faceted search).

Or evaluate, rate, assess, or filter documents (e.g. for validation or collaborative filtering).

Monitoring: Alerts & Watchlists (Newsfeeds)

Stay informed via watchlists for news alerts from media monitoring or activity streams of new or changed documents on file shares: subscribe to searches and filters as RSS newsfeeds and get notifications when there are changed or new documents, news, or search results for your keywords, search context, or filter.

Supports different file formats

No matter whether it is structured data like databases, tables, or spreadsheets, or unstructured data like text documents, e-mails, or even scanned legacy documents: search across many different formats and content types (text files, Word and other Microsoft Office documents or OpenOffice documents, Excel or LibreOffice Calc tables, PDF, e-mail, CSV, doc, images, photos, pictures, JPG, TIFF, videos, and many other file formats).

Supports multiple data sources

Find all your data at one place: Search in many different data sources like files and folders, file server, file shares, databases, websites, Content Management Systems, RSS-Feeds and many more.

The connectors and importers of the Extract-Transform-Load (ETL) framework for data integration connect and combine multiple data sources, and, as an integrated document analysis and data enrichment framework, it enhances the data with the analysis results of diverse analytics tools.

Open-Source enterprise search and information retrieval technology based on interoperable open standards

Open Semantic Search can be used not only from every desktop (Linux, Windows, or Mac) web browser: thanks to its responsive design and open standards like HTML5, it is also possible to search from tablets, smartphones, and other mobile devices.

Structure your research, investigations, navigation, document sets, collections, metadata forms, or notes in a Semantic Wiki, Drupal, or another content management system (CMS), or with an innovative annotation framework with taxonomies and custom fields for tagging documents, annotations, linking relationships, mapping, and structured notes. This way you integrate powerful and flexible metadata management or annotation tools using interoperable open standards like the Resource Description Framework (RDF) and the Simple Knowledge Organization System (SKOS).

Using file monitoring, new or changed files are indexed within seconds without frequent recrawls (which is often not feasible when there are many files).
Colleagues are able to find new data immediately without (often forgotten) uploads to a data or document management system (DMS), or filling out a data registration form for each new or changed document or dataset in a data management system, data registry, or digital asset management (DAM) system.

Empowering users and independent organizations

Free software based on Apache Solr or Elasticsearch Open Source Enterprise Search, Django & Python. So you’re allowed to read and to change the source code.

Security & privacy for sensitive documents: running on your own computer, laptop, or server, the search engine will send neither indexed documents or data nor search queries to prying cloud services.

Working with open standards for Semantic Web and Linked Data like HTTP, HTML, CSS, RSS, RDF, SKOS, Dublin Core and a REST-API makes the open search platform flexible, extendable and interoperable with standard software and open for own developments.

So you can easily connect to, enrich with and integrate data from other software, applications or web services using interoperable and open web standards and use Open Data and Linked Open Data like the open encyclopedia Wikipedia, the open database Wikidata or the open dictionary Wiktionary for enhancement of search and analytics.

The software architecture is platform-independent (Java & Python), modular (keep it simple), and interoperable.

So most modules are just integrations of powerful standard tools and free software components or apps based on the open-source search engines Apache Solr or Elasticsearch, powerful Linux tools, and standard web frameworks (Drupal, Semantic MediaWiki, Django), Apache Tika for content extraction, the Hypothesis collaborative web annotation tools, the spaCy natural language processing and machine learning framework for named entity recognition, the Neo4j graph database for visual graph analysis, and an open source web crawling framework.

Since we use standard open source software components, you can use and integrate our search engine components and research tools with your existing standard software environment and customize many options of powerful standard tools and frameworks.

If you need to, you can scale up to big data and very large document sets with vast amounts of documents:

The main open source components like Apache Lucene, Apache Solr, and Elasticsearch are scalable to a search cluster for very large amounts of data, heavy load, and high availability.

But even cheap, old standard hardware is enough for a search server handling gigabytes or terabytes of data or millions of documents.

The search engine even works offline or unhosted on a single laptop, without the need for an intranet or internet connection or a server.

How to get started

Learn how to set up your own search engine in just a few steps

Getting started




Source

Sphinx | Open Source Search Engine


Suddenly, here goes an overdue status update. Short version: Sphinx still goes on in 2017. (And I myself, i.e. Andrew, i.e. that weird guy who created Sphinx, am still quite alive, in case anyone’s curious.) We have downsized, we have been through a rather rough patch, and Sphinx is currently in semi-stealth mode, again. However, the work still goes on, mostly focused on a 3.0 uber-update these days. Read on for details.


Source

Open Source Search Engine and Search API



OpenSearchServer v2 (NEW)

The Alpha release is now available!

Discover

OpenSearchServer 2.0 helps you build a state-of-the-art search experience.
You manage the index, the records, and the web templates.
We take care of hosting your search service.

GitHub integration · Full-text search · Faceted search · Snippets & highlighting · Fully scalable · Free offer

OpenSearchServer v1

The open-source, enterprise-class search engine software

    Test online

  • A full set of search functions
  • Build your own indexing strategy
  • A fully integrated solution
  • Parsers extract full-text data
  • The crawlers can index everything
  • 17 language options
  • Special analysis for each language
  • Numerous filters: n-gram, lemmatization, shingle, elision, diacritic stripping, etc.
  • Automatic language detection
  • Named entity recognition
  • Synonyms (word and multi-terms)
  • Automatic classifications
  • REST API and SOAP Web Service
  • Monitoring module
  • Index replication
  • Scheduling for periodic tasks
  • Scripting feature powered by Selenium®
  • Multiple client implementations: PHP, Ruby, Perl, C#, Etc.
  • Office® documents (Word®, Excel®, Powerpoint®, Visio®, Publisher®)
  • OpenOffice® documents
  • Adobe PDF® (with OCR)
  • Web pages (HTML), RTF, plain text
  • Audio files metadata
  • Images (metadata and OCR)
  • MAPI® messages
  • Etc.
  • The web crawler includes inclusion or exclusion filters with wildcards, HTTP authentication, screenshot, sitemap, Etc.
  • The REST Crawler indexes Web Services data
  • The file system crawler browses SMB/CIFS, FTP(S), SWIFT
  • The database crawler supports all databases (JDBC)

  Downloads & documentation v1


v1.5.14

binary packages

Nightly builds: North America | Europe

Documentation

Read, comment, contribute.

Forges

Source code, issues tracker, forums.


  Hosting services v1

Immediate access to an OpenSearchServer instance hosted on our Cloud infrastructure.

Plans                    DISCOVER   STARTER    PREMIUM     PRO          ENTERPRISE
Number of documents      50,000     250,000    1,000,000   10,000,000   Unlimited
Index number             10         100        Unlimited   Unlimited    Unlimited
Storage                  5 GB       10 GB      20 GB       40 GB        On demand
Transfer capacity (4)    500 GB     1 TB       2 TB        3 TB         On demand
Price per month (5)      $19        $49        $99         $199         Contact us

Start your 14-day free trial (6)

  • (4)Transfer: capacity on a monthly basis. Unlimited number of operations.
  • (5)No commitment: pay on a monthly basis. Stop at anytime.
  • (6)14-day free trial: no credit card required, no commitment.

  Support services v1

Full access to the developers who built OpenSearchServer. Our team is dedicated to making your project successful.

Plans                                     DEVELOPER        SILVER            GOLD             PLATINUM
Unlimited support
E-mail and Web tickets
Time slot                                 Business hours   Business hours    24×7             24×7
Response time on production issues (1)    –                1 business day    8 hours          4 hours
Response time on development issues (2)   1 business day   2 business days   1 business day   1 business day
Number of contacts
Price per month (3)                       $59              $79               $139             $399

Get started now

  • (1) Production support includes maintenance releases, bug fixes, patches, and updates. Your team will have access to best practices for building strong and robust environments for any production issue (installation, performance).
  • (2)Development support helps your developers integrate OpenSearchServer: code review, architecture, schema design, query construction.
  • (3)No commitment: pay on a monthly basis. Stop at anytime.

Source

18 Advanced Alternative Search Engines of 2020

Google tends to be the giant gorilla in the room during any SEO discussion. The reason behind this is its dominating market share – according to NetMarketShare, Google holds more than 90% of the mobile and tablet, and around 80% of the desktop, global search engine market share.

However, it isn’t the only option. There are literally tons of search engines on the web. Some of them focus on tech news or research papers, while some provide a single-line answer instead of listing millions of pages.

We would like to present some of the most advanced alternatives to Google that will help you find what Google might not. We are not saying they are better than Google, but some of them are good at performing specific searches. Because our aim is to uncover things you might not be aware of, we haven’t included some big players like Bing, Baidu, and Yahoo search.

18. StartPage


StartPage was the first search engine to allow users to search privately. None of your details are recorded and no cookies are used, unless you allow it to remember your preferences. It also provides a proxy for those who want to not just search, but browse the internet with full privacy.

In 2014, the company released a privacy-protecting email service called StartMail. As of 2015, the search engine had reached a record of 5.7 million daily direct queries (28-day average).

17. BoardReader

BoardReader is a very useful resource for any type of community research, as it searches forums and message boards. Users can either look for content on the forums or for forums related to the specific topic.

The front end looks quite simple, exactly what a forum search engine should look like, but on the back end they run a robust data business by selling off users’ data to advertising companies.

16. Yippy

Founded in 2009, Yippy is a metasearch engine that offers clustered results. Its search technology is used in IBM Watson Explorer (a cognitive exploration and content analysis platform).

With Yippy, you can search different types of content, including news, images, blogs, government data, etc., and filter the results by category or flag any inappropriate content. Like Google, it lets you view cached webpages and filter results by source or tag cloud. Also, there is a preview link on each result that shows what the content looks like, on the same page.

15. FindSounds

FindSounds is the perfect search engine for finding sound effects for personal or commercial use. Just filter the results before you begin, using the suitable checkboxes. You can search anything by category, from animal to vehicle sound effects, and the search engine will return detailed results, along with file format, length, and bit-rate information.

Overall, searching for sound effects using Google is always an option, but FindSounds is the perfect sound engine to speed up your search and get the specific element you are looking for.

14. SearchCode

SearchCode is a free source code and documentation search engine that finds code snippets from open source repositories. It has indexed more than 20 billion lines of code from projects on Google Code, GitHub, SourceForge, GitLab, Bitbucket, CodePlex, and more.

Most web crawlers face difficulties when searching for the special characters used in code. SearchCode overcomes this issue and lets you search for code by method name, variable name, operation, usage, security flaw, and special characters much faster than other code search engines.

13. GigaBlast

GigaBlast is an open source search engine written in the C and C++ programming languages. As of 2015, it had indexed more than 12 billion webpages and received billions of queries per month. It provides search results to other companies like Zuula, Blingo, Clusty, and Snap.

GigaBlast allows you to search with certain customizations and optional parameters, for instance, searching by exact phrase, terms, filetypes, languages and much more.

12. KidRex and Kiddle

KidRex and Kiddle are both child-safe search engines that keep out age-inappropriate content unfit for consumption by children. Although they are powered by Google Custom Search (utilizing Google SafeSearch), they maintain their own databases of inappropriate keywords and websites.

The interface of KidRex features a hand-drawn crayon and colored-marker design, whereas Kiddle is rendered in the characteristic colorful Google style, with a red droid alien at the top waiting to answer your queries.

Also, you will find the search results are slightly modified. For instance, if you search for Narendra Modi, the search engine returns webpages from sites like famousbirthdays.com and britannica.com instead of Wikipedia and news websites. The aim is to provide simple and easy-to-read content that kids can understand without putting in a lot of effort.

11. MetaGer

MetaGer is a German metasearch engine built on 24 small-scale web crawlers. It focuses on user privacy and makes searches untraceable by leaving no footprint behind. It also integrates a proxy server so that users can open any link from the search results anonymously while keeping their IP address hidden from the destination server. This eliminates the chance for advertisers to target you with ads.

The results are obtained from 50 different search engines. Before the final results of a query are presented, they are filtered, compiled, and sorted.

10. Libraries.io

This is an open source search engine for finding software development projects, including new frameworks, libraries, and tools. It monitors more than 2.5 million open source libraries across 34 different package managers.

In order to collect library information, the website uses the dominant package manager for each supported programming language. It then organizes the libraries by package manager, programming language, license (MIT or GPL), and keyword.

9. Creative Commons Search

This search engine is extremely useful for bloggers and authors who need content that can be reused in a blog post or commercial application. It allows users to search for images and content released under Creative Commons licenses.

The website provides social features, allowing users to build and share lists, as well as add tags to objects in the commons and save their searches. It also offers some useful filters, such as finding images that can be used for commercial purposes, or images that can be modified and reused, or searching within tags, titles, and creators.

8. IxQuick

IxQuick is a metasearch engine that provides the top 10 results from different search engines. In order to rank the results, it uses a ‘star system’ that awards one star to each result returned by a search engine. Therefore, results returned by the most search engines end up at the top.

IxQuick doesn’t store your private details – no history and no queries are collected. However, it uses a single cookie, known as ‘preferences’, to remember your search preferences for future searches, which is automatically deleted if you don’t visit IxQuick for 90 days. Moreover, with around 5.7 million searches per day, the network is growing very fast, and it currently supports 17 languages.

7. Dogpile

Yet another metasearch engine that gets results from multiple search engines (including Google, Bing and Yahoo) and directories and then presents them combined to the user. There is an advanced search option that lets you narrow down searches by exact phrase, date, language, and adult content. Also, you can set your own preference and customize default search settings.

In addition to that, Dogpile recommends related content based on the original search term, keeps track of the 15 most recent searches, and shows recent popular searches from the other users.

6. Internet Archive

It’s a nonprofit digital library that aims to provide universal access to all knowledge. Internet Archive consists of websites, music, images, videos, software applications and games, and around 3 million books that fall under public domain.

As of 2016, the Internet Archive held 15 petabytes of data, advocating for a free and open Internet. Its web archive, known as the Wayback Machine, allows users to search for past iterations of a website. It contains more than 308 billion web captures, making it one of the world’s largest digitization projects.

5. Yandex

Yandex is the largest search engine in Russia, with nearly 65% of the Russian market share. According to comScore, it is the fourth largest search engine in the world, with over 150 million searches per day as of 2012.

Yandex features a parallel search that shows results from the main web index as well as specialized information resources, including blogs, news, image and video webpages, and eCommerce sites. In addition, the search engine provides supplementary information (like sports results), and includes spell checking, autocomplete functionality, and an antivirus that detects malicious content on webpages.

4. WolframAlpha

WolframAlpha is a computational knowledge engine that answers factual questions from externally sourced, curated data. It does not provide a list of webpages or documents that might contain the specific answer you are looking for. Instead, you get a one-word or one-line, to-the-point answer.

It is written in Wolfram programming language (contains over 15 million lines of code) and runs on more than 10,000 CPUs. It is based on a computational platform known as Wolfram Mathematica that encompasses numerical computation, computer algebra, statistics and visualization capabilities.

3. Ask.com

Launched in 1996, Ask.com is a question-answering-focused web search engine. Despite its age, Ask is still very active. They have coupled their search system with a robust question-and-answer system covering billions of pieces of online content.

As of 2014, the website had 180 million global users per month (with a larger user base in the US), and to date, its mobile app has been downloaded over 40 million times. They acquired a social networking site, Ask.fm, where people can ask questions with the option of anonymity. ASKfm handles around 20,000 questions every minute.


2. Ecosia

Ecosia donates 80% of its profit to plant trees and supports full financial transparency. As of October 2017, the website has reached the milestone of 15 million trees planted. In 2015, the company was shortlisted for the European Tech Startups Awards under the ‘Best European Startup Aimed at Improving Society’ category.

Ecosia’s search results are powered by Bing and Ecosia’s own search algorithms. The company claims that it takes around 45 searches to fund the planting of a single tree, and it assures users that its algorithms can easily detect fake clicks and invalidate them. Currently, it’s the default search engine of the Vivaldi, Waterfox, and Polarity web browsers.

1. DuckDuckGo

DuckDuckGo is the best alternative option available out there. The search engine doesn’t collect any of your personal information or store your history. They don’t follow you around with ads because they have nothing to sell to advertisers.


DuckDuckGo doesn’t provide personalized results – all users see the same results for a given search query. Rather than returning thousands of results, it emphasizes returning the best results, which it extracts from more than 400 sources. It’s a smart search engine (using semantic search techniques like Google) that depends on a highly evolved contextual library for intuiting the user’s intent.


Source
