Choosing an Open Source search engine: Solr or Elasticsearch?

There has never been a better time to be a search and open source enthusiast than 2017. Long gone are the days when information retrieval was a field open only to academics and experts.

Now we have plenty of search engines that allow us not only to search, but also to navigate and discover our information. We are going to focus on two of the leading search engines that happen to be open source projects: Elasticsearch and Solr.

Organisations and Support

Solr is an Apache sub-project developed in parallel with Lucene. As a result, the two have synchronized releases, and Solr benefits directly from any new Lucene feature.

Lucidworks (previously Lucid Imagination) is the main company supporting Solr. It provides development resources for Solr, commercial support, consulting services, technical training and commercial software around Solr. Lucidworks is based in San Francisco and offers its services in the USA and, through strategic partners, in the rest of the world. It has historically employed around a third of the most active Solr and Lucene committers, contributing most of the Solr code base and organizing the Lucene/Solr Revolution conference every year.

Elasticsearch is an open-source product driven by the company Elastic (formerly known as Elasticsearch). This approach creates a good balance between the open-source community contributing to the product and the company making long-term plans for future functionality as well as ensuring transparency and quality.

Elastic covers not only Elasticsearch but a set of open-source products called the Elastic Stack: Elasticsearch, Kibana, Logstash, and Beats. The company offers support for the whole Elastic Stack and a set of commercial products called X-Pack, all included in different subscription tiers. It offers training every second week around the world and organizes the ElasticON user conferences.

Ecosystem

Solr is an Apache project and as such benefits from the large variety of Apache projects that can be used alongside it. The first and foremost example is its Lucene core (http://lucene.apache.org/core/), which is released on the same schedule and from which it receives all its main functionality. The other main project is ZooKeeper, which handles configuration and distribution for SolrCloud clusters.

On the information-gathering side there are Apache Nutch, a web crawler, and Flume, a distributed log collector.

When it comes to processing information, there is no end of Apache projects. The most commonly used alongside Solr are Mahout for machine learning, Tika for document text and metadata extraction, and Spark for data processing.

The big advantage lies in big data management and storage, with the highly popular Hadoop library as well as the Hive, HBase, and Cassandra databases. Solr can store its index in the Hadoop Distributed File System (HDFS) for high resilience.

Elasticsearch is owned by Elastic, the company that drives and develops all the products in its ecosystem, which makes them very easy to use together.

The main open-source products of the Elastic Stack alongside Elasticsearch are Beats, Logstash and Kibana. Beats is a modular platform for building different lightweight data collectors. Logstash is a data processing pipeline. Kibana is a visualization platform where you can build your own data visualizations, and it already has many built-in tools for creating dashboards over your Elasticsearch data.

Elastic also develops a set of products that are available under subscription: X-Pack. Right now, X-Pack includes five products: Security, Alerting, Monitoring, Reporting, and Graph. Each delivers the layer of functionality over the Elastic Stack that its name describes. Most of them are included as a part of Elasticsearch and Kibana.

Strengths

Solr

  • Many interfaces, many clients, many languages.
  • A query is as simple as solr/select?q=query.
  • Easy to preconfigure.
  • The base product will always be functionally complete; commercial offerings are add-ons.

Elasticsearch

  • Everything can be done with a JSON HTTP request.
  • Optimized for time-based information.
  • Tightly coupled ecosystem.
  • The base product contains the core and is expandable; commercial offerings add features.
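To make the two query styles in the lists above concrete, here is a minimal sketch that only builds the requests and sends nothing over the network. The host names, ports and the `articles` core/index name are assumptions made for illustration, and `query_string` is just one of several Elasticsearch query types that could be used here.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoints: host, port and the "articles" name are assumptions.
SOLR_BASE = "http://localhost:8983/solr/articles"
ES_BASE = "http://localhost:9200/articles"


def solr_select_url(query):
    """Build a Solr query URL of the form .../select?q=<query>."""
    return SOLR_BASE + "/select?" + urlencode({"q": query})


def elasticsearch_search_request(query):
    """Return the URL and the JSON body of an Elasticsearch search request."""
    body = {"query": {"query_string": {"query": query}}}
    return ES_BASE + "/_search", json.dumps(body)


if __name__ == "__main__":
    print(solr_select_url("open source"))
    url, body = elasticsearch_search_request("open source")
    print(url)
    print(body)
```

With Solr the whole query fits in the URL, while with Elasticsearch the same search is expressed as a JSON document sent to the `_search` endpoint.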

If you are already using one of them and do not explicitly need a feature exclusive to the other, there is little incentive to migrate.

In any case, the answer is the same as the usual answer to hardware sizing questions for either of them: “It depends.” It depends on the amount of data, the expected growth, the type of data, the available software ecosystem around each and, above all, the features that your requirements and ambitions demand, to name just a few.

At Findwise we can help you make a Platform evaluation study to find the perfect match for your organization and your information.

Written by: Daniel Gómez Villanueva – Findability and Search Expert

Source

Your own search engine for documents, images, tables, files, intranet & news


Integrated research tools for easier searching, monitoring, analytics, discovery & text mining of heterogeneous & large document sets & news with free software on your own server

Search engine
(Fulltext search)

Easy full text search in multiple data sources and many different file formats: Just enter a search query (which can include powerful search operators) and navigate through the results.

Interactive filters
(Faceted search)

Easy navigation through many results with interactive filters (faceted search), which aggregate an overview of, and offer interactive filters for, (meta)data such as authors, organizations, persons, places, dates, products, tags or document types.
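As a rough illustration of what a faceted overview involves, the sketch below counts how often each value of a metadata field occurs in a result list and then narrows the results by one value. The documents and field names are made up for the example, and a real search engine computes these counts inside its index rather than in application code.

```python
from collections import Counter

# Hypothetical search results with made-up metadata.
results = [
    {"title": "Budget 2017", "author": "Alice", "type": "pdf"},
    {"title": "Meeting notes", "author": "Bob", "type": "docx"},
    {"title": "Budget 2016", "author": "Alice", "type": "pdf"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_filter(docs, field, value):
    """Keep only the documents whose field equals the selected facet value."""
    return [doc for doc in docs if doc.get(field) == value]


if __name__ == "__main__":
    print(facet_counts(results, "author"))
    print(apply_filter(results, "type", "pdf"))
```

The counts are what the user sees next to each filter, and selecting a value corresponds to the `apply_filter` step.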

Exploration, browsing & preview
(Exploratory search)

Explore your data or search results with an overview of search results aggregated by different facets with named entities (e.g. file paths, tags, persons, locations, organisations or products), while browsing with comfortable navigation through search results or document sets.
View previews (e.g. PDF, extracted text, table rows or images).
Analyze or review document sets by preview, extracted text or word lists for text mining.

Collaborative annotation & tagging (Social search & collaborative filtering)

Tag your documents with keywords, categories, names or text notes that are not included in the original content, so that you can find them more easily later (document management & knowledge management) or in other research or search contexts, or filter annotated or tagged documents with interactive filters (faceted search).

Or evaluate, rate, assess or filter documents (e.g. for validation or collaborative filtering).

Monitoring: Alerts & Watchlists (Newsfeeds)

Stay informed via watchlists for news alerts from media monitoring or activity streams of new or changed documents on file shares: subscribe to searches and filters as an RSS newsfeed and get notifications when there are new or changed documents, news or search results for your keywords, search context or filter.

Supports different file formats

No matter if structured data like databases, tables or spreadsheets or unstructured data like text documents, E-Mails or even scanned legacy documents: Search in many different formats and content types (text files, Word and other Microsoft Office documents or OpenOffice documents, Excel or LibreOffice Calc tables, PDF, E-Mail, CSV, doc, images, photos, pictures, JPG, TIFF, videos and many other file formats).

Supports multiple data sources

Find all your data in one place: Search in many different data sources like files and folders, file servers, file shares, databases, websites, Content Management Systems, RSS feeds and many more.

The connectors and importers of the Extract Transform Load (ETL) framework for data integration connect and combine multiple data sources. As an integrated document analysis and data enrichment framework, it also enhances the data with the analysis results of diverse analytics tools.

Open-Source enterprise search and information retrieval technology based on interoperable open standards

Open Semantic Search can be used not only with every desktop (Linux, Windows or Mac) and web browser: thanks to its responsive design and open standards like HTML5, you can also search from tablets, smartphones and other mobile devices.

Structure your research, investigation, navigation, document sets, collections, metadata forms or notes in a Semantic Wiki, Drupal or another content management system (CMS) or with an innovative annotation framework with taxonomies and custom fields for tagging documents, annotations, linking relationships, mapping and structured notes. So you integrate powerful and flexible metadata management or annotation tools using interoperable open standards like Resource Description Framework (RDF) and Simple Knowledge Organization System (SKOS).

With file monitoring, new or changed files are indexed within seconds without frequent recrawls (which are often not feasible when there are many files).
Colleagues are able to find new data immediately without (often forgotten) uploads to a data or document management system (DMS) or filling out a data registration form for each new or changed document or dataset in a data management system, data registry or digital asset management (DAM) system.

Empowering users and independent organizations

Free software based on Apache Solr or Elasticsearch Open Source Enterprise Search, Django & Python. So you’re allowed to read and to change the source code.

Security & privacy for sensitive documents: Running on your own computer, laptop or server, the search engine sends neither indexed documents or data nor search queries to spying cloud services.

Working with open standards for Semantic Web and Linked Data like HTTP, HTML, CSS, RSS, RDF, SKOS, Dublin Core and a REST-API makes the open search platform flexible, extendable and interoperable with standard software and open for own developments.

So you can easily connect to, enrich with and integrate data from other software, applications or web services using interoperable and open web standards and use Open Data and Linked Open Data like the open encyclopedia Wikipedia, the open database Wikidata or the open dictionary Wiktionary for enhancement of search and analytics.

The software architecture is platform independent (Java & Python) and modular (keep it simple) and interoperable.

So most modules are just integrations of powerful standard tools and free software components or apps based on the open-source search engines Apache Solr or Elasticsearch, powerful Linux tools and standard web frameworks (Drupal, Semantic MediaWiki, Django), Apache Tika for content extraction, Hypothesis collaborative web annotation tools, the spaCy natural language processing and machine learning framework for Named Entity Recognition, the Neo4j graph database for visual graph analysis and an open source web crawling framework.

Just as we use standard open source software components, you can use and integrate our search engine components and research tools with your existing standard software environment and customize many options of powerful standard tools and frameworks.

If you need to, you can scale up to Big Data and very large document sets:

The main open source components like Apache Lucene, Apache Solr and Elasticsearch scale to a search cluster for very large amounts of data, heavy load and high availability.

But even cheap old standard hardware is enough for a search server for gigabytes or terabytes of data or millions of documents.

The search engine even works offline or unhosted on a single laptop, without needing an intranet or internet connection or a server.

Source

Sphinx | Open Source Search Engine


Suddenly, here goes an overdue status update. Short version: Sphinx still goes on in 2017. (And I myself, i.e. Andrew, i.e. that weird guy who created Sphinx, am still quite alive, in case anyone’s curious.) We have downsized, we have been through a rather rough patch, and Sphinx is currently in a semi-stealth mode, again. However, the work still goes on, mostly focused on a 3.0 uber-update these days. Read on for details.

Source

OpenSearchServer | Open Source Search Engine and Search API

OpenSearchServer v2 (new)

The alpha release is now available!

OpenSearchServer 2.0 helps you build a state-of-the-art search experience.
You manage the index, the records and the web templates.
We take care of hosting your search service.

  • GitHub integration
  • Full-text search
  • Faceted search
  • Snippets & highlighting
  • Fully scalable
  • Free offer

OpenSearchServer v1

The open-source enterprise class
search engine software


  • A full set of search functions
  • Build your own indexing strategy
  • A fully integrated solution
  • Parsers extract full-text data
  • The crawlers can index everything
  • 17 language options
  • Special analysis for each language
  • Numerous filters: n-gram, lemmatization, shingle, elisions, stripping diacritic, Etc.
  • Automatic language detection
  • Named entity recognition
  • Synonyms (word and multi-terms)
  • Automatic classifications
  • REST API and SOAP Web Service
  • Monitoring module
  • Index replication
  • Scheduling for periodic tasks
  • Scripting feature powered by Selenium®
  • Multiple client implementations: PHP, Ruby, Perl, C#, Etc.
  • Office® documents (Word®, Excel®, Powerpoint®, Visio®, Publisher®)
  • OpenOffice® documents
  • Adobe PDF® (with OCR)
  • Web pages (HTML), RTF, plain text
  • Audio files metadata
  • Images (metadata and OCR)
  • MAPI® messages
  • Etc.
  • The web crawler includes inclusion or exclusion filters with wildcards, HTTP authentication, screenshot, sitemap, Etc.
  • The REST Crawler indexes Web Services data
  • The file system crawler browses SMB/CIFS, FTP(S), SWIFT
  • The database crawler supports all databases (JDBC)
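Two of the filters named in the feature list above, n-gram and shingle, can be sketched in a few lines. This is a generic illustration of the two techniques under the usual definitions (character n-grams within a token, word shingles across adjacent tokens), not OpenSearchServer's own implementation; the function names are hypothetical.

```python
def char_ngrams(token, n):
    """Return all contiguous character sequences of length n in a token."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def shingles(tokens, size):
    """Return all runs of `size` adjacent tokens, joined with a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


if __name__ == "__main__":
    print(char_ngrams("search", 3))
    print(shingles(["open", "source", "search"], 2))
```

Character n-grams help with partial-word matching, while shingles let an index match short phrases as single terms.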

  Downloads & documentation v1

v1.5.14

binary packages

Nightly builds: North America, Europe

Documentation

Read, comment, contribute.

Forges

Source code, issues tracker, forums.

  Hosting services v1

Immediate access to an OpenSearchServer instance hosted on our Cloud infrastructure.

Plans                  DISCOVER  STARTER  PREMIUM    PRO         ENTERPRISE
Number of documents    50,000    250,000  1,000,000  10,000,000  Unlimited
Number of indexes      10        100      Unlimited  Unlimited   Unlimited
Storage                5 GB      10 GB    20 GB      40 GB       On demand
Transfer capacity (4)  500 GB    1 TB     2 TB       3 TB        On demand
Price per month (5)    $19       $49      $99        $199        Contact us

Start your 14-day free trial (6)

  • (4) Transfer: capacity on a monthly basis. Unlimited number of operations.
  • (5) No commitment: pay on a monthly basis. Stop at any time.
  • (6) 14-day free trial: no credit card required, no commitment.

  Support services v1

Full access to the developers who built OpenSearchServer. Our team is dedicated to making your project successful.

Plans: DEVELOPER, SILVER, GOLD, PLATINUM
Unlimited support: e-mail and web tickets
Time slot: Business hours, Business hours, 24×7, 24×7
Response time on production issues (1): 1 business day, 8 hours, 4 hours
Response time on development issues (2): 1 business day, 2 business days, 1 business day, 1 business day
Number of contacts
Price per month (3): $59, $79, $139, $399

Get started now

  • (1) Production support includes maintenance releases, bug fixes, patches and updates. Your team will have access to best practices for building strong and robust environments and help with any production issue (installation, performance).
  • (2) Development support helps your developers integrate OpenSearchServer: code review, architecture, schema design, query construction.
  • (3) No commitment: pay on a monthly basis. Stop at any time.

Source

18 Advanced Alternative Search Engines of 2020

Google tends to be the giant gorilla in the room during any SEO discussion. The reason is its dominant market share: according to NetMarketShare, Google holds more than 90% of the mobile and tablet search market and around 80% of the desktop search market worldwide.

However, it isn’t the only option. There are plenty of search engines on the web. Some of them focus on tech news or research papers, while others provide a single-line answer instead of listing millions of pages.

We would like to present some of the most advanced alternatives to Google, which can help you find what Google might not. We are not saying they are better than Google, but some of them are good at specific kinds of search. Because our aim is to uncover things you might not be aware of, we haven’t included big players like Bing, Baidu and Yahoo Search.

18. StartPage

StartPage was the first search engine to allow users to search privately. None of your details are recorded and no cookies are used, unless you allow it to remember your preferences. It also provides a proxy for those who want to not just search, but browse the internet with full privacy.

In 2014, the company released a privacy protecting email service, called StartMail. As of 2015, the search engine reached its record daily direct queries of 5.7 million (28-day average).

17. BoardReader

BoardReader is a very useful resource for any type of community research, as it searches forums and message boards. Users can either look for content on the forums or for forums related to the specific topic.

The front end looks quite simple, exactly as a forum search engine should, but on the back end they run a robust data business, selling users’ data to advertising companies.

16. Yippy

Founded in 2009, Yippy is a metasearch engine that groups results into clusters. Its search technology is used in IBM Watson Explorer (a cognitive exploration and content analysis platform).

With Yippy, you can search different types of content, including news, images, blogs, government data, etc., and filter the results by category or flag any inappropriate content. Like Google, it lets you view cached webpages and filter results by sources or tag clouds. Also, each result has a preview link that shows what the content looks like without leaving the page.

15. FindSounds

FindSounds is the perfect search engine for finding sound effects for personal or commercial use. Just filter the results before you begin, using the suitable checkboxes. You can search by category, from animal to vehicle sound effects, and the search engine will return detailed results, along with file format, length and bit-rate information.

Searching for sound effects with Google is always an option, but FindSounds is a dedicated sound search engine that speeds up your search and gets you the specific element you are looking for.

14. SearchCode

SearchCode is a free source code and documentation search engine that finds code snippets from open source repositories. It has indexed more than 20 billion lines of code, from projects on Google code, Github, Sourceforge, GitLab, Bitbucket, Codeplex and more.

Most web crawlers face difficulties while searching special characters used in the code. SearchCode overcomes this issue and lets you search for code by method name, variable name, operations, usage, security flaws and by special characters much faster than other code search engines.

13. GigaBlast

GigaBlast is an open source search engine written in C and C++. As of 2015, it had indexed more than 12 billion webpages and received billions of queries per month. It provides search results to other companies such as Zuula, Blingo, Clusty and Snap.

GigaBlast allows you to search with certain customizations and optional parameters, for instance, searching by exact phrase, terms, filetypes, languages and much more.

12. KidRex and Kiddle

KidRex and Kiddle are both child-safe search engines that keep out content unsuitable for children. Although they are powered by Google Custom Search (and use Google SafeSearch), they maintain their own databases of inappropriate keywords and websites.

The interface of KidRex features a hand-drawn crayon and colored-marker design, whereas Kiddle is styled in the characteristic colorful Google fashion, with a red droid alien at the top waiting to answer your queries.

Also, you will find that search results are slightly modified. For instance, if you search for Narendra Modi, the search engine returns webpages from sites like famousbirthdays.com and britannica.com instead of Wikipedia and news websites. The aim is to provide simple, easy-to-read content that kids can understand without much effort.

11. MetaGer

MetaGer is a Germany-based metasearch engine built on 24 small-scale web crawlers. It focuses on users’ privacy and makes searches untraceable by leaving no footprint behind. It also integrates a proxy server so that users can open any link from the search results anonymously, keeping their IP address hidden from the destination server. This reduces the chances of advertisers targeting you with ads.

The results are obtained from 50 different search engines. Before the final results of a query are presented, they are filtered, compiled and sorted.

10. Libraries.io

This is an open source search engine for finding software development projects, including new frameworks, libraries and tools. It monitors more than 2.5 million open source libraries across 34 different package managers.

To collect the library information, the website uses the dominant package manager for each supported programming language. It then organizes the libraries by package manager, programming language, license (such as MIT or GPL), and keyword.

9. Creative Commons Search

This search engine is extremely useful for bloggers and authors who need content that can be reused in a blog post or commercial application. It allows users to search for images and other content released under Creative Commons licenses.

The website provides social features, allowing users to build and share lists, add tags to the objects in the commons and save their searches. It also offers some useful filters, such as finding images that can be used for commercial purposes or images that can be modified and reused, or searching within tags, titles and creators.

8. IxQuick

IxQuick is a metasearch engine that provides the top 10 results from different search engines. To rank the results, it uses a ‘star system’ that awards a result one star for each search engine that returned it. Results returned by the most search engines therefore appear at the top.
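The star system described above can be sketched as follows. This is only an illustration of the idea as the text states it, not IxQuick's actual code: the engine names and URLs are made up, and breaking ties alphabetically is an assumption added to keep the output deterministic.

```python
from collections import Counter


def star_rank(results_by_engine, top_n=10):
    """Rank results by the number of engines that returned them in their top N."""
    stars = Counter()
    for urls in results_by_engine.values():
        # set() ensures one engine awards at most one star to a given result.
        for url in set(urls[:top_n]):
            stars[url] += 1
    # Most stars first; ties broken alphabetically (an assumption, see above).
    return sorted(stars.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical engines and result lists.
    sample = {
        "engine_a": ["x.example", "y.example"],
        "engine_b": ["y.example", "z.example"],
        "engine_c": ["y.example", "x.example"],
    }
    for url, count in star_rank(sample):
        print(url, "*" * count)
```

A result that every engine returns collects the most stars and so rises to the top, regardless of its position within any single engine's list.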

IxQuick doesn’t store your private details: no history or queries are collected. It uses only one cookie, known as ‘preference’, to remember your search preferences for future searches, and that cookie is deleted automatically if you don’t visit IxQuick for 90 days. Moreover, with around 5.7 million searches per day, the network is growing very fast, and it currently supports 17 languages.

7. Dogpile

Yet another metasearch engine that gets results from multiple search engines (including Google, Bing and Yahoo) and directories and then presents them combined to the user. There is an advanced search option that lets you narrow down searches by exact phrase, date, language, and adult content. Also, you can set your own preference and customize default search settings.

In addition to that, Dogpile recommends related content based on the original search term, keeps track of the 15 most recent searches, and shows recent popular searches from the other users.

6. Internet Archive

It’s a nonprofit digital library that aims to provide universal access to all knowledge. The Internet Archive consists of websites, music, images, videos, software applications and games, and around 3 million books that are in the public domain.

As of 2016, the Internet Archive held 15 petabytes of data, and it advocates for a free and open Internet. Its web archive, known as the Wayback Machine, allows users to search past versions of a website. It contains more than 308 billion web captures, making it one of the world’s largest digitization projects.

5. Yandex

Yandex is the largest search engine in Russia, with nearly 65% of the Russian market. According to Comscore, it was the fourth-largest search engine in the world as of 2012, with over 150 million searches per day.

Yandex features a parallel search that shows results from main web index as well as specialized information resources, including blogs, news, image and video webpages, and eCommerce sites. In addition, the search engine provides supplementary information (like sports results), and contains spell checkers, autocomplete functionality and antivirus that detects malicious content on webpages.

4. WolframAlpha

WolframAlpha is a computational knowledge engine that answers factual questions from externally sourced curated data. It does not provide a list of webpages or documents that might contain the specific answer you are looking for. Instead, you get a to-the-point answer of one word or one line.

It is written in Wolfram programming language (contains over 15 million lines of code) and runs on more than 10,000 CPUs. It is based on a computational platform known as Wolfram Mathematica that encompasses numerical computation, computer algebra, statistics and visualization capabilities.

3. Ask.com

Launched in 1996, Ask.com is a web search engine focused on question answering. Despite its age, Ask is still very active. It couples its search system with a robust question-and-answer system covering billions of pieces of online content.

As of 2014, the website had 180 million global users per month (with a larger user base in the US), and to date, its mobile app has been downloaded over 40 million times. They acquired a social networking site, Ask.fm, where people can ask questions with the option of anonymity. ASKfm handles around 20,000 questions every minute.

2. Ecosia

Ecosia donates 80% of its profit to plant trees and supports full financial transparency. As of October 2017, the website has reached the milestone of 15 million trees planted. In 2015, the company was shortlisted for the European Tech Startups Awards under the ‘Best European Startup Aimed at Improving Society’ category.

The search results of Ecosia are powered by Bing and Ecosia’s own search algorithms. The company claims that it takes 45 searches to fund the planting of a single tree, and it says its algorithms can easily detect fake clicks and invalidate them. Currently, it’s the default search engine of the Vivaldi, Waterfox and Polarity web browsers.

1. DuckDuckGo

DuckDuckGo is the best alternative option out there. The search engine doesn’t collect any of your personal information or store your history. They don’t follow you around with ads because they have nothing to sell to advertisers.

DuckDuckGo doesn’t provide personalized results: all users see the same results for a given search query. Rather than returning thousands of results, it emphasizes returning the best ones, which it draws from more than 400 sources. It’s a smart search engine (it uses semantic search techniques, like Google) that depends on a highly evolved contextual library for intuiting the user’s intent.

Source

What is an open-source search engine?


Firstly, I doubt whether you would have used an open source search engine unless you use Linux or are in the research field.

None of the large web search engines are open source. To repeat what John Linn said,

Open source simply means that the source code (programming) is available to anyone to use and modify as they desire.

Search engines like Lucene[1], Nutch[2], Terrier[3], Xapian[4] and others are examples of Open Source search engines. They all allow you to change the code of the retrieval and ranking process.

[1] lucene.apache.org/core/
[2] http://nutch.apache.org/
[3] http://te…

Source

Choosing an Open Source search engine: Solr or Elasticsearch?

There has never been a better time to be a search and open source enthusiast than 2017. Far behind are the old days of information retrieval being a field only available for academia and experts.

Now we have plenty of search engines that allow us not only to search, but also navigate and discover our information. We are going to be focusing on two of the leading search engines that happed to be open source projects: Elasticsearch and Solr.

Organisations and Support

Solr is an Apache sub-project developed parallelly along Lucene. Thanks to this it has synchronized releases and benefits directly from any new Lucene feature.

Lucidworks (previously Lucid Imagination) is the main company supporting Solr. They provide development resources for Solr, commercial support, consulting services, technical training and commercial software around Solr. Lucidworks is based in San Francisco and offer their services in the USA and all rest of the world through strategic partners. Lucidworks have historically employed around a third of the most active Solr and Lucene committers, contributing most of the Solr code base and organizing the Lucene/Revolution conference every year.

Elasticsearch is an open-source product driven by the company Elastic (formerly known as Elasticsearch). This approach creates a good balance between the open-source community contributing to the product and the company making long term plans for future functionality as well as ensuring transparency and quality.

Elastic comprehends not only Elasticsearch but a set of open-source products called the Elastic stack: Elassticsearch, Kibana, Logstash, and Beats. The company offers support over the whole Elastic stack and a set of commercial products called X-Pack, all included, in different tiers of subscriptions. They offer trainings every second week around the world and organize the ElasticON user conferences.

Ecosystem

Solr is an Apache project and by being so it benefits from a large variety of apache projects that can be used along with it. The first and foremost example is its Lucene core (http://lucene.apache.org/core/) that is released on the same schedule and from which it receives all its main functionalities. The other main project is Zookeper that handles SolrCloud clusters configuration and distribution.

On the information gathering side there is Apache Nutch, a web crawler, and Flume , a distributed log collector.

When it comes to process information, there are no end to Apache projects, the most commonly used alongside Solr are Mahout for machine learning, Tika for document text and metadata extraction and Spark for data processing.

The big advantage lies in the big data management and storage, with the highly popular Hadoop  library as well as Hive, HBase, and Cassandra databases. Solr has support to store the index in a Hadoop Highly Distributed File System for high resilience.

Elasticsearch is owned by the Elastic company that drives and develops all the products on its ecosystem, which makes it very easy to use together.

The main open-source products of the Elastic stack along Elasticsearch are Beats, Logstash and Kibana. Beats is a modular platform to build different lightweight data collectors. Logstash is a data processing pipeline. Kibana is a visualization platform where you can build your own data visualization, but already has many build-in tools to create dashboards over your Elasticsearch data.

Elastic also develops a set of products that are available under subscription: X-Pack. Right now, X-Pack includes five products: Security, Alerting, Monitoring, Reporting, and Graph. Each delivers the layer of functionality over the Elastic Stack that its name describes. Most of them are included as part of Elasticsearch and Kibana.

Strengths

Solr

  • Many interfaces, many clients, many languages.
  • A query is as simple as solr/select?q=query.
  • Easy to preconfigure.
  • The base product is always functionally complete; commercial offerings are add-ons.
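
To illustrate how simple such a query is, here is a minimal sketch that builds a select URL with Python's standard library. The host, port, and the collection name `articles` are assumptions made for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, collection, query, rows=10):
    """Build a Solr select URL; base_url and collection are illustrative."""
    # urlencode escapes characters such as ':' and spaces in the query.
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/solr/{collection}/select?{params}"


url = solr_select_url("http://localhost:8983", "articles", "title:search engine")
print(url)
```

Opening the resulting URL in a browser or fetching it with any HTTP client would run the query against a Solr instance that actually has such a collection.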

Elasticsearch

  • Everything can be done with a JSON HTTP request.
  • Optimized for time-based information.
  • Tightly coupled ecosystem.

  • The base product covers the core functionality and is extensible; commercial offerings add further features.
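
As a rough counterpart on the Elasticsearch side, the sketch below builds the JSON body of a full-text search request. The field name `title` and the idea of posting the body to an index's `_search` endpoint are stated here as assumptions for illustration; the code only produces the JSON and does not contact a server.

```python
import json


def match_query(field, text, size=10):
    """Return an Elasticsearch-style search body as a Python dict."""
    # A "match" query runs a full-text search on a single field.
    return {"query": {"match": {field: text}}, "size": size}


body = json.dumps(match_query("title", "search engine"))
print(body)
```

The same pattern applies to indexing and administration: the request is a JSON document sent over HTTP, which is what makes the API uniform.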

If you are already using one of them and do not explicitly need a feature exclusive to the other, there is no big incentive to migrate.

In any case, the answer is the same as the common answer to hardware sizing questions for either of them: “It depends.” It depends on the amount of data, the expected growth, the type of data, the available software ecosystem around each, and above all the features that your requirements and ambitions demand, to name just a few.

 

At Findwise we can help you make a Platform evaluation study to find the perfect match for your organization and your information.

 

Written by: Daniel Gómez Villanueva – Findability and Search Expert

Source

OpenSearchServer search engine

OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you will be able to quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows and Linux/Unix/BSD.

Features

  • A crawler allows you to index the following: web pages; rich format documents from files on local and remote systems; and contents from any JDBC database, such as Oracle, MySQL, PostgreSQL, or Microsoft SQL Server.
  • Full-text analyzers and filters allowing indexing and searches in 16 languages: Chinese, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish.
  • Multi-lingual analysers slice sentences into words, then run lemmatisation algorithms on words based on the document’s language (singular/plural, gender, conjugated verbs, etc.).
  • HTML renderer allowing the integration of the search box in an HTML/XHTML page, working with PHP and .NET, client library and XML over HTTP API.
  • Parsers allowing you to get content and metadata from most documents and formats, such as XML, HTML/XHTML, Adobe™ PDF, Microsoft™ Word™, PowerPoint™, OpenOffice™, RTF, plain text, Torrent, and audio files (MP3/MP4, OGG, FLAC, WMA).
  • A series of caches to accelerate processes and deliver faster applications.
  • Monitoring and administration: alerting services, integrated scheduler, index replication, user management.
  • Free online developers’ documentation.
  • Advanced functionality: faceted search, clustering, filters, snippets, synonyms, stopwords, highlighting, categorization, “find similar”, automatic thumbnail screenshot inclusion, boost/reduce relevance.
  • Drupal module and WordPress module available.
  • REST API, PHP, .NET and Ruby clients.
  • RIA web interface built around the Zkoss (ZK) framework.

Categories

Enterprise, Indexing/Search, Search

User Ratings

4.9 out of 5 stars

ease: 4 / 5

features: 4 / 5

design: 4 / 5

support: 2 / 5

  • denismotte Posted 03/11/2017

    Excellent Lucent implementation. It’s very powerfull and API REST easy to use.

  • regis-92 Posted 07/04/2015

    Powerful but very unstable and greedy in memory…

  • cooperspc Posted 06/06/2014

    First Search Server I can say i got up and running, in minutes, not hours

    1 user found this review helpful.

  • brittanybot Posted 04/27/2014

    Very Good program One thing I can not seem to find coding to Show the thumbnail screenshot on the search results page that i have cataloged. Very little info on the net about this.

    1 user found this review helpful.

  • severineg Posted 08/22/2013

    Fully integrated powerfull solution : the best one for me now. Thanks for your amazing job !

    1 user found this review helpful.

Additional Project Details

Languages

English

Intended Audience

Information Technology, Telecommunications Industry, Advanced End Users, System Administrators, Developers, Engineering

User Interface

Web-based

Programming Language

C++, PHP, Visual Basic .NET, Java

Database Environment

JDBC

Add-ons & Plugins

  • OpenSearchServer Rails client
  • OpenSearchServer Ruby client
  • OpenSearchServer PHP client
  • OpenSearchServer Node.js client

Source

Customize your internet with an open source search engine

A long time ago, the internet was small enough to be indexed by a few people who gathered the names and locations of all websites and listed them each by topic on a page or in a printed book. As the World Wide Web network grew, the “web rings” convention developed, in which sites with a similar theme or topic or sensibility banded together to form a circular path to each member. A visitor to any site in the ring could click a button to proceed to the next or previous site in the ring to discover new sites relevant to their interest.

Then for a while, it seemed the internet outgrew itself. Everyone was online, there was a lot of redundancy and spam, and there was no way to find anything. Yahoo and AOL and CompuServe and similar services had unique approaches, but it wasn’t until Google came along that the modern model took hold. According to Google, the internet was meant to be indexed, sorted, and ranked through a search engine.

Why choose an open source alternative?

Search engines like Google and DuckDuckGo are demonstrably effective. You may have reached this site through a search engine. While there’s a debate to be had about content falling through the cracks because a host chooses not to follow best practices for search engine optimization, the modern solution for managing the wealth of culture and knowledge and frivolity that is the internet is relentless indexing.

But maybe you prefer not to use Google or DuckDuckGo because of privacy concerns or because you’re looking to contribute to an effort to make the internet more independent. If that appeals to you, then consider participating in YaCy, the peer-to-peer internet indexer and search engine.

Install YaCy

To install and try YaCy, first ensure you have Java installed. If you’re on Linux, you can follow the instructions in my How to install Java on Linux article. If you’re on Windows or MacOS, obtain an installer from AdoptOpenJDK.net.

Once you have Java installed, download the installer for your platform.

If you’re on Linux, extract the tarball into the /opt directory:

$ sudo tar --extract --file yacy_*z --directory /opt

Start YaCy according to instructions for the installer you downloaded.

On Linux, start YaCy running in the background:

$ /opt/yacy/startYACY.sh &

In a web browser, navigate to localhost:8090 and search.

YaCy start page

Add YaCy to your URL bar

If you’re using the Firefox web browser, you can make YaCy your default search engine in the Awesome Bar (that’s Mozilla’s name for the URL field) with just a few clicks.

First, make the dedicated search bar visible in the Firefox toolbar, if it’s not already (you don’t have to keep the search bar visible; you only need it active long enough to add a custom search engine). The search bar is available in the hamburger menu in the upper-right corner of Firefox in the Customize menu. Once the search bar is visible in your Firefox toolbar, navigate to localhost:8090, and click the magnifying glass icon in the Firefox search bar you just added. Click the option to add YaCy to your Firefox search engines.

Adding YaCy to Firefox

Once this is done, you can mark it as your default in Firefox preferences, or just use it selectively in searches performed in the Firefox search bar. If you set it as your default search engine, then you may have no need for the dedicated search bar because the default engine is also used by the Awesome Bar, so you can remove it from your toolbar.

YaCy is an open source and distributed search engine. It’s written in Java, so it runs on any platform, and it performs web crawls, indexing, and searching. It’s a peer-to-peer (P2P) network, so every user running YaCy joins in the effort to track the internet as it changes from day to day. Of course, no single user possesses a full index of the entire internet because that would take a data center to house, but the index is distributed and redundant across all YaCy users. It’s a lot like BitTorrent (as it uses distributed hash tables, or DHT, to reference index entries), except the data you’re sharing is a matrix of words and URL associations. By mixing the results returned by the hash tables, no one can tell who has searched for what words, so all searches are functionally anonymous. It’s an effective system for unbiased, ad-free, untracked, and anonymous searches, and you can join in just by using it.

The act of indexing the internet refers to separating a web page into the individual words on it, then associating the page’s URL with each word. Searching for one or more words in a search engine fetches all URLs associated with the query. That’s one thing the YaCy client does while running.
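
The idea can be sketched in a few lines of Python. This is a toy illustration, not YaCy's implementation: the example URLs and page texts are made up, the tokenizer is a simple regular expression, and treating a multi-word query as "all words must match" is an assumption.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages used only for this example.
pages = {
    "http://example.org/a": "Open source search engines",
    "http://example.org/b": "Peer-to-peer search with YaCy",
}
index = build_index(pages)
print(sorted(search(index, "search")))
```

In a distributed setting, the word-to-URL entries would be spread across many peers rather than held in one dictionary.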

The other thing the client does is provide a search interface for your browser. Instead of navigating to Google when you want to search, you can point your web browser to localhost:8090 to search YaCy. You may even be able to add it to your browser’s search bar (depending on your browser’s extensibility), so you can search from the URL bar.

Firewall settings for YaCy

When you first start using YaCy, it’s probably running in “junior” mode. This means that the sites your client crawls are available only to you because no other YaCy client can reach your index entries. To join the P2P experience, you must open port 8090 in your router’s firewall and possibly your software firewall if you’re running one. This is called “senior” mode.

If you’re on Linux, you can find out more about your computer’s firewall in Make Linux stronger with firewalls. On other platforms, refer to your operating system’s documentation.

A firewall is almost always active on the router provided by your internet service provider (ISP), and there are far too many varieties of them to document accurately here. Most routers provide the option to “poke a hole” in your firewall because many popular networked games require two-way traffic.

If you know how to log into your router (it’s often either 192.168.0.1 or 10.1.0.1, but can vary depending on the manufacturer’s settings), then log in and look for a configuration panel controlling the firewall or port forwarding or applications.

Once you find the preferences for your router’s firewall, add port 8090 to the whitelist. For example:

Adding YaCy to an ISP router

If your router is doing port forwarding, then you must forward the incoming traffic to your computer’s IP address, using the same port. For example:

Adding YaCy to an ISP router

If you can’t adjust your firewall settings for any reason, that’s OK. YaCy will continue to run and operate as a client of the P2P search network in junior mode.

An internet of your own

There’s much more you can do with the YaCy search engine than just search passively. You can force crawls of underrepresented websites, you can request the network crawl a site, you can choose to use YaCy for just on-premises searches, and much more. You have better control over what your internet looks like. The more senior users there are, the more sites indexed. The more sites indexed, the better the experience for all users. Join in!

Source

6 Top & Best Open source Search Engine Software for Enterprises

Would you like to have a search engine like Google for your enterprise? Then open source might have a solution for you. There are a couple of well-known search engine packages; you can call them the best enterprise open source search engine software because they let you search for information within your enterprise domain. They can search data from the multiple databases and intranets that are built to hold your enterprise’s important data and other information.

This enterprise search server software can be installed on a laptop for testing and then on your servers. These open source engines work like Google and Yahoo, but are aimed at a startup business or enterprise. As mentioned above, these search engines can index multiple databases and intranets, but they are not limited to those; indexing documents from different file systems, document management systems, and emails is also possible.

Open source big data search software can also collect structured and unstructured data. The admin can also use security policies to restrict users from accessing any particular collection of information. Now, without wasting much time, let’s look at the best open source search engine software available.

Note: I am not an expert in search engine software, and the information given here is based on Wikipedia and other Internet research. If you think I missed any other great search engine software that falls under the open source category, please help me complete this list…

Apache Lucene Core

Apache Lucene Core is one of the most reliable cross-platform open source search engine projects; it is distributed under the Apache License and written entirely in Java. Despite being written purely in Java, it has also been ported to other programming languages such as Delphi, Perl, C#, C++, Python, Ruby, and PHP. It provides ranked searching, which means the best results are returned first. Lucene uses pluggable ranking models, including the Vector Space Model and Okapi BM25. It also supports many powerful query types: phrase queries, wildcard queries, proximity queries, range queries, and more.
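
To give a feel for what a ranking model such as Okapi BM25 computes, here is a minimal sketch of the textbook formula in Python. It is not Lucene's code: the tiny documents are made up, tokenization is a plain whitespace split, the documents are assumed to be non-empty, and `k1=1.2` and `b=0.75` are commonly quoted defaults used here as assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against the query terms."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical documents used only for this example.
docs = [
    "open source search engine".split(),
    "search engine for enterprise search".split(),
    "distributed log collector".split(),
]
scores = bm25_scores(["search"], docs)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

Sorting by descending score is what "best results returned first" amounts to; a document that does not contain the query term scores zero.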

Elasticsearch open source search engine

Elasticsearch is a distributed, RESTful open source search and analytics engine based on Apache Lucene. It is highly scalable, which means it can support anything from small and medium businesses to large enterprises. Elasticsearch provides full-text search capabilities with an HTTP web interface and schema-free JSON documents. It is a distributed search system, meaning each index is sharded with a configurable number of shards. Each shard can also have one or more replicas, and read/search operations can be performed on any of the replica shards.
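
The sharding idea can be sketched as follows. This is only an illustration of hash-based routing: CRC32 is a stand-in chosen because it ships with Python, not the hash function Elasticsearch uses, and the document IDs and shard counts are made up.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a primary shard from a stable hash of the document ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def copies_per_shard(num_replicas):
    """Count the primary plus its replicas, all of which can serve reads."""
    return 1 + num_replicas


# Hypothetical setup: 5 primary shards, 1 replica each.
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "-> shard", shard_for(doc_id, 5))
print("copies able to serve a read:", copies_per_shard(1))
```

Because the shard is derived from the ID and the shard count, changing the number of primary shards changes where documents land, which is why that number is typically chosen when an index is created.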

It is developed in Java, and official clients are available in many languages, such as Curl, Java, .NET (C#), Python, JavaScript, PHP, Perl, Ruby, Apache Groovy, and more. See: Install & uninstall Elasticsearch on Ubuntu 19.04, 18.04 & 16.04

Apache Solr search engine platform open source

After Elasticsearch, Apache Solr is another popular open source search engine, also according to the DB Ranking. It is likewise developed in Java and supports full-text search and real-time indexing. Moreover, like Elasticsearch, Apache Solr is based on Lucene and uses its Java search library. It is a standalone enterprise search server with a REST-like API. You can index into Solr via JSON, XML, CSV, or binary over HTTP, and you query it with HTTP GET to receive the results.

Solr has a plugin architecture that allows extending the capabilities of the search engine for both indexing and querying. Moreover, because it is open source, you can also customize its code so the plugins work according to your requirements.

Sphinx Search engine

People who have already used Elasticsearch and are looking for another option can try Sphinx. It is also a free and open-source information retrieval software library that supports full-text search. It can be run as a standalone server, is written in C++, and works on Linux (RedHat, Ubuntu, etc.), Windows, macOS, Solaris, FreeBSD, and a few other systems.

It can index and search data stored in SQL databases and NoSQL storage. It powers some well-known websites where millions of search queries are generated per day, such as Craigslist, Living Social, MetaCafe, and Groupon…

As for indexing speed, Sphinx can index up to 10-15 MB of text per second per single CPU core, that is, 60+ MB/sec per server (on a dedicated indexing machine). A few of its key features are: batch and real-time full-text indexes, non-text attribute support, SQL database indexing, easy application integration, advanced full-text searching syntax, rich database-like querying features, better relevance ranking, flexible text processing, and distributed searching.
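
To put those throughput figures in perspective, here is a back-of-the-envelope calculation in Python. The 100 GB corpus size is a hypothetical value chosen for illustration, and the estimate ignores everything except raw indexing throughput.

```python
def indexing_seconds(corpus_mb, mb_per_sec):
    """Estimate indexing time as corpus size divided by throughput."""
    return corpus_mb / mb_per_sec


corpus_mb = 100 * 1024  # a hypothetical 100 GB text corpus
single_core_slow = indexing_seconds(corpus_mb, 10)  # 10 MB/s per core
single_core_fast = indexing_seconds(corpus_mb, 15)  # 15 MB/s per core
server = indexing_seconds(corpus_mb, 60)            # 60 MB/s per server

print(f"one core: {single_core_fast / 3600:.1f} to {single_core_slow / 3600:.1f} hours")
print(f"one server: {server / 60:.0f} minutes")
```

Under these assumptions a single core needs roughly two to three hours, while a dedicated server at 60 MB/sec finishes in about half an hour.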

DataparkSearch Engine

DataparkSearch Engine is an open source web-based search engine that allows searching within a website, a group of websites, an intranet, or a local system. It supports the http, https, ftp, nntp, and news URL schemes; natively indexes the text/html, text/xml, text/plain, audio/mpeg (mp3), and image/gif MIME types; handles Internationalized Domain Names (IDN); honors noindex tags and Google’s special comments as markers to include or exclude content; lets you specify a content body tag; and offers spellchecking and more.

Xapian

Xapian is another Open Source Search Engine Library written in C++, with bindings to allow use from Perl, Python 2, Python 3, PHP 5, PHP 7, Java, Tcl, C#, Ruby, Lua, Erlang, Node.js, and R.

 

Source