Sourcegraph: An Open-Source Source Code Search Engine

Last year, the code search and navigation tool ‘Sourcegraph’ was declared open source. Since it makes navigating source code much more convenient, the tool going open source was definitely a big plus for developers!

We’ve looked into its features and also tried to find out how it can be so helpful for developers who regularly navigate code hosts like GitHub, GitLab and others.

Now, with its new 3.10 release, let us take a look at how it helps and what’s new.

Sourcegraph Features

As stated on their GitHub page, Sourcegraph has the following features:

  • Fast global code search
  • Intelligent code recognition
  • Code host enhancement on GitHub, GitLab and more
  • Extension API for easier third-party integration

New Features in Sourcegraph 3.10

Sourcegraph 3.10 brings significant improvements and new features.

The key highlights of the release, as mentioned in the official blog post, are:

Improved search autocompletion, native GitLab integration, and search and replace automation campaigns

Let’s take a quick look at what has changed with Sourcegraph 3.10:

  • Sourcegraph now provides native code intelligence to GitLab
  • Improved autocompletion for search query filters
  • The ability to create cross-repository search and replace campaigns
  • LSIF-based precise code intelligence now supports five languages: Go, TypeScript, Java, C++, and Python
  • Fully automated release testing process

For more information, you can view the complete changelog.

You can deploy Sourcegraph on your own server and configure it to work with your (or your organization’s) Git repositories. Once that’s done, you get a search engine that covers all of your code.

But if you are a lone developer like me, you can still use Sourcegraph on GitHub or GitHub alternatives like GitLab.

I am going to quickly show you how to use Sourcegraph for better code navigation on GitHub.

Using Sourcegraph on GitHub

Let’s find out how you can easily try this tool with a Firefox or Chrome extension. Here is the official extensions page.

Sourcegraph Official Extensions

Security Issue Open on GitHub

During the first half of June, I reported a security issue to Sourcegraph that is currently open on GitHub. Apparently, the issue was fixed around a week ago, but the fix is yet to be merged into the master branch. The issue concerns the official Firefox extension, which asks for access to all websites during installation when it should be restricted to code host sites such as GitHub, GitLab, Bitbucket, and others. Originally, a Firefox bug (now fixed) prevented such a filter from being added during early development.

This is how it looks with the official Sourcegraph extension installed when you view a file in the Vim repository on GitHub:

Sourcegraph extension on GitHub

Note the new Sourcegraph buttons within the GitHub interface, added by the installed extension. You do not even need to log in to GitHub to navigate hosted code and repositories with Sourcegraph’s helpful features.

When you click on “View File”, the entire look changes and the file is opened for you in a completely new interface within the browser itself:

Sourcegraph extension on GitHub

Without Sourcegraph, if you want to look for files of a particular type, say C++ .cpp files in this example, it is very difficult to filter and view them using GitHub’s own search within this repository:

Sourcegraph extension on GitHub

But once you are using this extension, see how easily you can view all such files in one go within the repository:

Sourcegraph extension on GitHub
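The same filtering can also be written directly as a Sourcegraph search query. Below is a minimal Python sketch, assuming the `repo:` and `file:` filters of Sourcegraph’s query syntax (both take regular expressions) and the `https://sourcegraph.com/search?q=` URL form; `sourcegraph_search_url` is a hypothetical helper, and whether a given repository is indexed on sourcegraph.com is not guaranteed.

```python
import re
from urllib.parse import quote


def sourcegraph_search_url(repo: str, file_regex: str, pattern: str = "") -> str:
    """Build a Sourcegraph search URL limited to one repository and a file pattern.

    Assumes the `repo:` and `file:` filters of Sourcegraph's query syntax.
    """
    # Anchor and escape the repository name so forks with longer names don't match.
    parts = ["repo:^" + re.escape(repo) + "$", "file:" + file_regex]
    if pattern:
        parts.append(pattern)
    query = " ".join(parts)
    # Percent-encode the whole query, including spaces and regex characters.
    return "https://sourcegraph.com/search?q=" + quote(query, safe="")


# All C++ .cpp files in the Vim repository.
print(sourcegraph_search_url("github.com/vim/vim", r"\.cpp$"))
```

Opening the printed URL in a browser runs the search on sourcegraph.com; the helper itself sends nothing.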

Sourcegraph can narrow down code searches very intelligently, as explained in this video:

Code intelligence in Sourcegraph is powered by language servers, which provide support specific to the programming language you are using:

Language support in Sourcegraph

Learn more about its usefulness in the following video:

Bonus Tip on using Sourcegraph 

Even without installing an extension on your browser, you can directly use Sourcegraph as an IDE on top of any repository on GitHub by just adding “sourcegraph.com/” as a prefix to the repository URL.

For example, the URL for the official Vim repository is:

github.com/vim/vim

To view the same through Sourcegraph, modify the URL as below and you’re good to go:

sourcegraph.com/github.com/vim/vim

I’ve also tested this method with GitLab and it works there too! You can try other repositories as well! There is an unofficial extension called Open on Sourcegraph that uses this method on Firefox and Chrome.
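Because the trick is a plain URL rewrite, it is easy to script. A minimal Python sketch, assuming only the `sourcegraph.com/<host>/<owner>/<repo>` pattern shown above; the function name `to_sourcegraph` is hypothetical.

```python
from urllib.parse import urlparse


def to_sourcegraph(repo_url: str) -> str:
    """Rewrite a code host repository URL into its sourcegraph.com equivalent."""
    # Accept both "github.com/vim/vim" and "https://github.com/vim/vim".
    if "://" not in repo_url:
        repo_url = "https://" + repo_url
    parsed = urlparse(repo_url)
    # Drop a trailing slash so the result matches the form shown above.
    path = parsed.path.rstrip("/")
    return f"https://sourcegraph.com/{parsed.netloc}{path}"


print(to_sourcegraph("github.com/vim/vim"))
# https://sourcegraph.com/github.com/vim/vim
```

The same function works for GitLab URLs, since only the host and path are carried over.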

Sourcegraph’s developers have a master plan behind making it open source:

  • Make basic code intelligence ubiquitous (for every language, and in every editor, code host, etc.)
  • Make code review continuous and intelligent
  • Increase the amount and quality of open-source code

Here are the ways they suggest you can contribute to its development:

So this was a brief look at how Sourcegraph can make a developer’s life a lot easier and hassle-free. You may also want to take a look at Sourcetrail, a recently open-sourced project that lets you visualize a codebase.

Are you a Developer? Would you like to adopt this new Open Source tool in your day-to-day programming tasks? Let us know in the comments section below.

Source

Choosing an Open Source search engine: Solr or Elasticsearch?

There has never been a better time to be a search and open source enthusiast than 2017. Gone are the days when information retrieval was a field reserved for academia and experts.

Now we have plenty of search engines that let us not only search, but also navigate and discover our information. We are going to focus on two of the leading search engines that happen to be open source projects: Elasticsearch and Solr.

Organisations and Support

Solr is an Apache sub-project developed in parallel with Lucene. Thanks to this, it has synchronized releases and benefits directly from every new Lucene feature.

Lucidworks (previously Lucid Imagination) is the main company supporting Solr. It provides development resources for Solr, commercial support, consulting services, technical training and commercial software around Solr. Lucidworks is based in San Francisco and offers its services in the USA and, through strategic partners, in the rest of the world. Lucidworks has historically employed around a third of the most active Solr and Lucene committers, contributed most of the Solr code base and organized the Lucene/Solr Revolution conference every year.

Elasticsearch is an open-source product driven by the company Elastic (formerly known as Elasticsearch). This approach creates a good balance between the open-source community contributing to the product and the company making long term plans for future functionality as well as ensuring transparency and quality.

Elastic’s portfolio comprises not only Elasticsearch but a set of open-source products called the Elastic Stack: Elasticsearch, Kibana, Logstash, and Beats. The company offers support for the whole Elastic Stack and a set of commercial products called X-Pack, available in different subscription tiers. It offers training sessions around the world every other week and organizes the ElasticON user conferences.

Ecosystem

Solr is an Apache project and, as such, benefits from a large variety of Apache projects that can be used along with it. The first and foremost example is its Lucene core (http://lucene.apache.org/core/), which is released on the same schedule and from which Solr receives all its main functionality. The other main project is ZooKeeper, which handles the configuration and distribution of SolrCloud clusters.

On the information gathering side there is Apache Nutch, a web crawler, and Apache Flume, a distributed log collector.

When it comes to processing information, there is no end of Apache projects; the most commonly used alongside Solr are Mahout for machine learning, Tika for document text and metadata extraction, and Spark for data processing.

The big advantage lies in big data management and storage, with the highly popular Hadoop library as well as the Hive, HBase, and Cassandra databases. Solr can store its index in the Hadoop Distributed File System (HDFS) for high resilience.

Elasticsearch is owned by Elastic, the company that drives and develops all the products in its ecosystem, which makes them very easy to use together.

The main open-source products of the Elastic Stack alongside Elasticsearch are Beats, Logstash and Kibana. Beats is a modular platform for building lightweight data collectors. Logstash is a data processing pipeline. Kibana is a visualization platform where you can build your own data visualizations, and it already ships with many built-in tools for creating dashboards over your Elasticsearch data.

Elastic also develops a set of products that are available under subscription: X-Pack. Right now, X-Pack includes five products: Security, Alerting, Monitoring, Reporting, and Graph. Each adds the layer of functionality over the Elastic Stack that its name describes. Most of them are included as part of Elasticsearch and Kibana.

Strengths

Solr

  • Many interfaces, many clients, many languages.
  • A query is as simple as solr/select?q=query.
  • Easy to preconfigure.
  • The base product will always be complete in functionality; commercial offerings are add-ons.
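Solr’s HTTP interface is easy to script. A minimal Python sketch that only builds the request URL; it assumes a local Solr on its default port 8983 and a hypothetical core named `articles` (recent Solr versions generally expect a core or collection name in the path, unlike the short `solr/select?q=query` form above).

```python
from urllib.parse import urlencode


def solr_select_url(base: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's /select request handler (the request is not sent)."""
    params = {
        "q": query,    # the query itself, in Lucene/Solr query syntax
        "rows": rows,  # maximum number of documents to return
        "wt": "json",  # response format
    }
    return f"{base}/solr/{core}/select?{urlencode(params)}"


url = solr_select_url("http://localhost:8983", "articles", "title:lucene")
print(url)
# http://localhost:8983/solr/articles/select?q=title%3Alucene&rows=10&wt=json
```

Fetching the URL (for example with `urllib.request.urlopen`) requires a running Solr instance with that core.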

Elasticsearch

  • Everything can be done with a JSON HTTP request.
  • Optimized for time-based information.
  • Tightly coupled ecosystem.
  • The base product contains the core functionality and is extensible; commercial offerings add extra features.
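To illustrate the JSON-over-HTTP point, here is a minimal Python sketch that prepares (without sending) a `_search` request with a `match` query, using only the standard library. It assumes a local cluster on the default port 9200 and a hypothetical index `articles` with a `title` field.

```python
import json
import urllib.request


def es_search_request(base: str, index: str, field: str, text: str) -> urllib.request.Request:
    """Prepare, but do not send, an Elasticsearch _search request with a match query."""
    body = {"query": {"match": {field: text}}, "size": 10}
    return urllib.request.Request(
        url=f"{base}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = es_search_request("http://localhost:9200", "articles", "title", "open source search")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Against a live cluster: urllib.request.urlopen(req) returns the JSON hits.
```

The whole query lives in the JSON body, which is what makes Elasticsearch convenient to drive from any HTTP client.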

Solr vs Elasticsearch: open source search engine comparison

If you are already using one of them and do not explicitly need a feature exclusive to the other, there is no big incentive to migrate.

In any case, the answer is the same as the common answer to hardware sizing questions for either of them: “It depends.” It depends on the amount of data, the expected growth, the type of data, the software ecosystem available around each, and above all the features that your requirements and ambitions demand, to name just a few.

 

At Findwise we can help you carry out a platform evaluation study to find the perfect match for your organization and your information.

 

Written by: Daniel Gómez Villanueva – Findability and Search Expert

Source

Arch Intranet Search Engine – Open source, enterprise strength, fast and easy to setup corporate search engine


Source

18 Advanced Alternative Search Engines of 2020

Google tends to be the gorilla in the room in every SEO discussion. The reason is its dominant market share: according to NetMarketShare, Google holds more than 90% of the mobile and tablet search engine market and around 80% of the desktop search engine market worldwide.

However, it isn’t the only option. There are literally tons of search engines on the web. Some of them focus on tech news or research papers, while others provide a one-line answer instead of listing millions of pages.

We would like to present some of the most advanced alternatives to Google that will help you find what Google might not. We are not saying they are better than Google, but some of them are good at specific kinds of searches. Because our aim is to uncover things you might not be aware of, we haven’t included big players like Bing, Baidu and Yahoo Search.

18. StartPage

StartPage was the first search engine to allow users to search privately. None of your details are recorded and no cookies are used, unless you allow it to remember your preferences. It also provides a proxy for those who want to not just search, but browse the internet with full privacy.

In 2014, the company released a privacy-protecting email service called StartMail. In 2015, the search engine reached a record of 5.7 million direct queries per day (28-day average).

17. BoardReader

BoardReader is a very useful resource for any type of community research, as it searches forums and message boards. Users can either look for content on the forums or for forums related to the specific topic.

The front end looks quite simple, exactly what a forum search engine should look like, but on the back end they run a robust data business by selling users’ data to advertising companies.

16. Yippy

Founded in 2009, Yippy is a metasearch engine that groups its results into clusters. Its search technology is used in IBM Watson Explorer (a cognitive exploration and content analysis platform).

With Yippy, you can search different types of content, including news, images, blogs, government data, etc., filter the results by category, and flag inappropriate content. Like Google, it lets you view cached web pages, and you can filter results by source or through tag clouds. Each result also has a preview link that shows what the content looks like, on the same page.

15. FindSounds

FindSounds is the perfect search engine for finding sound effects for personal or commercial use. Just filter the results before you begin, using the suitable checkboxes. You can search by category, from animal to vehicle sound effects, and the search engine returns detailed results along with file format, length and bit-rate information.

Overall, searching for sound effects with Google is always an option, but FindSounds is the perfect sound search engine to speed up your search and find the specific sound you are looking for.

14. SearchCode

SearchCode is a free source code and documentation search engine that finds code snippets in open source repositories. It has indexed more than 20 billion lines of code from projects on Google Code, GitHub, SourceForge, GitLab, Bitbucket, CodePlex and more.

Most search engines struggle with the special characters used in code. SearchCode overcomes this issue and lets you search for code by method name, variable name, operation, usage, security flaw and special characters, much faster than other code search engines.
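To see why special characters are a problem, compare a typical word tokenizer with one that keeps runs of operator characters as tokens. This is an illustrative Python sketch of the general idea, not how SearchCode actually works; both tokenizer functions are hypothetical.

```python
import re

# A typical full-text tokenizer keeps only runs of letters, digits and underscores.
WORD_RE = re.compile(r"\w+")
# A code-aware tokenizer also keeps runs of operator/punctuation characters.
CODE_RE = re.compile(r"\w+|[^\w\s]+")


def word_tokens(text: str) -> list[str]:
    return WORD_RE.findall(text)


def code_tokens(text: str) -> list[str]:
    return CODE_RE.findall(text)


line = "if (ptr->next != NULL) count += 1;"
print(word_tokens(line))  # operators such as "->" and "!=" are lost
print(code_tokens(line))  # operators survive and can be indexed and searched
```

Only tokens that make it into the index can be searched, so a query for `->` can match the second tokenization but never the first.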

13. GigaBlast

GigaBlast is an open source search engine written in C and C++. As of 2015, it had indexed more than 12 billion web pages and received billions of queries per month. It provides search results to other companies like Zuula, Blingo, Clusty and Snap.

GigaBlast allows you to search with certain customizations and optional parameters, for instance searching by exact phrase, terms, file types, languages and much more.

12. KidRex and Kiddle

KidRex and Kiddle are both child-safe search engines that keep out age-inappropriate content. Although they are powered by Google Custom Search (using Google SafeSearch), they maintain their own databases of inappropriate keywords and websites.

The interface of KidRex features a hand-drawn crayon and colored-marker design, whereas Kiddle uses Google’s characteristic colorful style, with a red droid alien at the top waiting to answer your queries.

You will also find that the search results are slightly modified. For instance, if you search for Narendra Modi, the search engine returns web pages from sites like famousbirthdays.com and britannica.com instead of Wikipedia and news websites. The aim is to provide simple, easy-to-read content that kids can understand without much effort.

11. MetaGer

MetaGer is a German metasearch engine built on 24 small-scale web crawlers. It focuses on user privacy and makes searches untraceable by leaving no footprint behind. It also integrates a proxy server, so users can open any link from the search results anonymously while keeping their IP address hidden from the destination server. This prevents advertisers from targeting you with ads.

The results are obtained from 50 different search engines. Before the final results of a query are presented, they are filtered, compiled and sorted.
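The filter, compile and sort steps can be sketched in a few lines. This is a hypothetical illustration of a generic metasearch merge, not MetaGer’s actual code; the blocklist, the URL normalization and the ranking rule (the best position any engine gave a result) are all assumptions, as are the helper and engine names.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    return urlsplit(url).netloc.lower().removeprefix("www.")


def normalize(url: str) -> str:
    """Treat URLs differing only in scheme, 'www.' or a trailing slash as one page."""
    return host_of(url) + urlsplit(url).path.rstrip("/")


def merge_results(results_by_engine: dict[str, list[str]], blocked: set[str]) -> list[str]:
    best_rank: dict[str, int] = {}
    first_url: dict[str, str] = {}
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            # 1. Filter: drop results from blocked hosts.
            if host_of(url) in blocked:
                continue
            # 2. Compile: merge duplicates, keeping the best rank seen for each page.
            key = normalize(url)
            if key not in best_rank or rank < best_rank[key]:
                best_rank[key] = rank
            first_url.setdefault(key, url)
    # 3. Sort: best rank first; ties broken alphabetically for stable output.
    ordered = sorted(best_rank, key=lambda k: (best_rank[k], k))
    return [first_url[k] for k in ordered]


results = {
    "engine_a": ["https://example.org/a", "https://www.example.com/b/", "https://spam.example/x"],
    "engine_b": ["http://example.com/b", "https://example.net/c"],
}
print(merge_results(results, blocked={"spam.example"}))
```

`str.removeprefix` needs Python 3.9 or later.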

10. Libraries.io

This is an open source search engine for finding software development projects, including new frameworks, libraries and tools. It monitors more than 2.5 million open source libraries across 34 different package managers.

To collect library information, the website uses the dominant package manager for each supported programming language. It then organizes the libraries by package manager, programming language, license (such as MIT or GPL) and keyword.
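Organizing libraries by package manager, language, license and keyword is essentially faceted indexing. A minimal Python sketch with made-up records; the field names, library names and the `build_facets` helper are hypothetical and do not reflect Libraries.io’s internal data model.

```python
from collections import defaultdict

# Made-up example records.
libraries = [
    {"name": "lib-a", "platform": "npm", "language": "JavaScript", "license": "MIT", "keywords": ["http", "cli"]},
    {"name": "lib-b", "platform": "PyPI", "language": "Python", "license": "GPL-3.0", "keywords": ["http"]},
    {"name": "lib-c", "platform": "PyPI", "language": "Python", "license": "MIT", "keywords": ["parser"]},
]


def build_facets(libs):
    """Map each facet (platform, language, license, keyword) to value -> library names."""
    facets = {f: defaultdict(list) for f in ("platform", "language", "license", "keyword")}
    for lib in libs:
        for field in ("platform", "language", "license"):
            facets[field][lib[field]].append(lib["name"])
        # A library can carry several keywords, so it is listed under each one.
        for kw in lib["keywords"]:
            facets["keyword"][kw].append(lib["name"])
    return facets


facets = build_facets(libraries)
# Combining facets is a set intersection: MIT-licensed libraries published on PyPI.
mit_pypi = set(facets["license"]["MIT"]) & set(facets["platform"]["PyPI"])
print(sorted(mit_pypi))
```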

9. Creative Commons Search

This search engine is extremely useful for bloggers and authors who need content that can be reused in a blog post or commercial application. It lets users search for images and other content released under Creative Commons licenses.

The website provides social features, allowing users to build and share lists, add tags to objects in the commons and save their searches. It also offers some useful filters, such as finding images that can be used for commercial purposes or images that can be modified and reused, and searching within tags, titles and creators.

8. IxQuick

IxQuick is a metasearch engine that provides the top 10 results from different search engines. To rank the results, it uses a ‘star system’ that awards a result one star for each search engine that returned it. Therefore, results returned by the most search engines appear at the top.
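The star system amounts to counting, for each result, how many engines returned it in their top 10. A minimal Python sketch of that idea; the engine names and result lists are invented, and the tie-breaking rule (best average position) is an assumption rather than IxQuick’s documented behaviour.

```python
from collections import defaultdict


def star_rank(top10_by_engine: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank URLs by how many engines returned them (one star per engine)."""
    stars: dict[str, int] = defaultdict(int)
    positions: dict[str, list[int]] = defaultdict(list)
    for results in top10_by_engine.values():
        seen = set()
        for pos, url in enumerate(results[:10], start=1):
            if url in seen:  # an engine listing a URL twice still awards one star
                continue
            seen.add(url)
            stars[url] += 1
            positions[url].append(pos)

    def sort_key(url: str) -> tuple[int, float]:
        # Most stars first; ties go to the better (lower) average position.
        return (-stars[url], sum(positions[url]) / len(positions[url]))

    return [(url, stars[url]) for url in sorted(stars, key=sort_key)]


engines = {
    "engine_a": ["https://example.com/1", "https://example.com/2", "https://example.com/3"],
    "engine_b": ["https://example.com/2", "https://example.com/1"],
    "engine_c": ["https://example.com/2", "https://example.com/4"],
}
for url, count in star_rank(engines):
    print("*" * count, url)
```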

IxQuick doesn’t store your private details: no history, no queries are collected. It uses only one cookie, known as ‘preference’, to remember your search preferences for future searches, and that cookie is deleted automatically if you don’t visit IxQuick for 90 days. Moreover, with around 5.7 million searches per day, the network is growing very fast, and it currently supports 17 languages.

7. Dogpile

Dogpile is yet another metasearch engine that gets results from multiple search engines (including Google, Bing and Yahoo) and directories, and presents them combined to the user. An advanced search option lets you narrow down searches by exact phrase, date, language and adult content. You can also set your own preferences and customize the default search settings.

In addition, Dogpile recommends related content based on the original search term, keeps track of the 15 most recent searches, and shows recent popular searches from other users.
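Keeping only the 15 most recent searches is a natural fit for a bounded queue. A minimal Python sketch using `collections.deque` with `maxlen`; the class name is hypothetical, and moving a repeated query to the front (instead of storing it twice) is an assumption, not documented Dogpile behaviour.

```python
from collections import deque


class RecentSearches:
    """Remember the N most recent distinct queries, newest first."""

    def __init__(self, limit: int = 15) -> None:
        self._items = deque(maxlen=limit)

    def add(self, query: str) -> None:
        # Move a repeated query to the front instead of storing it twice.
        if query in self._items:
            self._items.remove(query)
        # On a full deque, appendleft silently drops the oldest entry on the right.
        self._items.appendleft(query)

    def items(self) -> list[str]:
        return list(self._items)


history = RecentSearches(limit=3)
for q in ["solr", "elasticsearch", "lucene", "solr", "kibana"]:
    history.add(q)
print(history.items())
```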

6. Internet Archive

It’s a nonprofit digital library that aims to provide universal access to all knowledge. The Internet Archive’s collections include websites, music, images, videos, software applications and games, and around 3 million public-domain books.

As of 2016, the Internet Archive held 15 petabytes of data; the organization also advocates for a free and open Internet. Its web archive, known as the Wayback Machine, lets users search for past versions of a website. It contains more than 308 billion web captures, making it one of the world’s largest digitization projects.
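The Wayback Machine can also be queried programmatically. A minimal Python sketch that builds a request for the Internet Archive’s Wayback availability endpoint (`https://archive.org/wayback/available`, as I understand its public API) and reads the closest snapshot from a JSON response; the helper names are hypothetical and the sample response is illustrative, not fetched data.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def availability_url(page: str, timestamp: str = "") -> str:
    """Build a Wayback availability query; timestamp is YYYYMMDD[hhmmss]."""
    params = {"url": page}
    if timestamp:
        params["timestamp"] = timestamp  # ask for the snapshot closest to this date
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the URL of the closest archived snapshot, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


print(availability_url("example.com", "20060101"))

# Illustrative response shape (values are made up, not fetched).
sample = ('{"archived_snapshots": {"closest": {"available": true, "status": "200", '
          '"timestamp": "20060101000000", '
          '"url": "http://web.archive.org/web/20060101000000/http://example.com/"}}}')
print(closest_snapshot(sample))
```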

5. Yandex

Yandex is the largest search engine in Russia, with nearly 65% of the Russian market. According to comScore, it was the fourth largest search engine in the world, with over 150 million searches per day, as of 2012.

Yandex features a parallel search that shows results from its main web index as well as from specialized information resources, including blogs, news, image and video pages, and e-commerce sites. In addition, the search engine provides supplementary information (like sports results), and includes spell checking, autocomplete and an antivirus that detects malicious content on web pages.

4. WolframAlpha

WolframAlpha is a computational knowledge engine that answers factual questions using curated data from external sources. It does not provide a list of web pages or documents that might contain the answer you are looking for. Instead, you get a one-word or one-line, to-the-point answer.

It is written in the Wolfram Language (over 15 million lines of code) and runs on more than 10,000 CPUs. It is built on Wolfram Mathematica, a computational platform that encompasses numerical computation, computer algebra, statistics and visualization capabilities.

3. Ask.com

Launched in 1996, Ask.com is a web search engine focused on question answering. Despite its age, Ask is still very active. It couples its search system with a robust question-and-answer system backed by billions of pieces of online content.

As of 2014, the website had 180 million global users per month (with its largest user base in the US), and to date its mobile app has been downloaded over 40 million times. The company also acquired Ask.fm, a social networking site where people can ask questions with the option of anonymity. Ask.fm handles around 20,000 questions every minute.

2. Ecosia

Ecosia donates 80% of its profits to planting trees and is committed to full financial transparency. As of October 2017, the website had reached the milestone of 15 million trees planted. In 2015, the company was shortlisted for the European Tech Startups Awards under the ‘Best European Startup Aimed at Improving Society’ category.

Ecosia’s search results are powered by Bing and Ecosia’s own search algorithms. The company claims that it takes 45 searches to fund the planting of a single tree, and it assures users that its algorithms can detect fake clicks and invalidate them. Currently, it’s the default search engine of the Vivaldi, Waterfox and Polarity web browsers.

1. DuckDuckGo

DuckDuckGo is the best alternative available out there. The search engine doesn’t collect any of your personal information or store your history. It doesn’t follow you around with ads because it has nothing to sell to advertisers.

DuckDuckGo doesn’t provide personalized results: all users see the same results for a given search query. Rather than returning thousands of results, it emphasizes returning the best results, which it draws from more than 400 sources. It’s a smart search engine (using semantic search techniques, like Google) that relies on a highly evolved contextual library to infer the user’s intent.

Source

11 Best Privacy Oriented Search Engines To Google in 2020

Brief: In this age of the internet, you can never be too careful with your privacy. Use these alternative search engines that do not track you.

Best privacy oriented alternative search engines

Google, unquestionably the best search engine out there, uses powerful and intelligent algorithms (including A.I. implementations) to let users get the best out of a search engine through a personalized experience.

This sounds good until you start to live in a filter bubble. When you start seeing everything that ‘suits your taste’, you get detached from reality. Too much of anything is not good. Too much personalization is harmful as well.

This is why one should get out of this filter bubble and see the world as it is. But how do you do that?

You know that Google sure as hell tracks a lot of information about your connection and the system when you perform a search and take an action within the search engine or use other Google services such as Gmail.

So, if Google keeps on tracking you, the simple answer would be to stop using Google for searching the web. But what would you use in place of Google? Microsoft’s Bing is no saint either.

So, to address netizens concerned about their privacy while using a search engine, I have curated a list of privacy-oriented alternative search engines to Google.

Best Privacy-Oriented Alternative Search Engines To Google

Do note that the alternatives mentioned in this article are not necessarily “better” than Google; they simply focus on protecting users’ privacy. Here we go!

1. DuckDuckGo

DuckDuckGo (Dark Mode)

DuckDuckGo is one of the most successful privacy-oriented search engines that stands as an alternative to Google. The user experience offered by DuckDuckGo is commendable. I must say – “It’s unique in itself”.

DuckDuckGo, unlike Google, uses the traditional method of “sponsored links” to display advertisements. The ads are not targeted at you but only at the topic you are searching for, so nothing is collected that could build a profile of you, and your privacy is respected.

Of course, DuckDuckGo’s search algorithm may not be the smartest around (because it has no idea who you are!). If you want to use one of the best privacy-oriented alternative search engines to Google, you will have to forget about a personalized search experience.

The search results are simplified, with specific metadata. It lets you select a country to get the most relevant results. Also, when you type in a question or search for a fix, it may present you with an instant answer (fetched from the source).
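Instant answers are also exposed through DuckDuckGo’s Instant Answer API. A minimal Python sketch that builds the request URL and extracts the abstract from a JSON response; the endpoint (`https://api.duckduckgo.com/`), the `format=json` parameter and the `AbstractText`/`AbstractURL` fields reflect my understanding of that API and should be checked against its documentation. The helper names are hypothetical and the sample response is made up.

```python
import json
from urllib.parse import urlencode


def instant_answer_url(query: str) -> str:
    """Build an Instant Answer API URL that asks for a JSON response (nothing is sent)."""
    return "https://api.duckduckgo.com/?" + urlencode({"q": query, "format": "json"})


def abstract_of(response_text: str) -> tuple[str, str]:
    """Return (answer text, source URL); empty strings when there is no instant answer."""
    data = json.loads(response_text)
    return data.get("AbstractText", ""), data.get("AbstractURL", "")


print(instant_answer_url("open source search engine"))

# Made-up response containing only the two fields read above.
sample = '{"AbstractText": "An example abstract.", "AbstractURL": "https://example.org/source"}'
print(abstract_of(sample))
```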

You might miss quite a few features, though (like filtering images by license); that is an obvious trade-off for protecting your privacy.

DuckDuckGo

2. Qwant

Qwant is probably one of the most loved privacy oriented search engines after DuckDuckGo. It ensures neutrality, privacy, and digital freedom while you search for something on the Internet.

If you thought privacy-oriented search engines tend to offer a fairly basic user experience, you need to rethink after trying out Qwant. This is a very dynamic search engine with trending topics and news stories organized very well. It may not offer a personalized experience (given that it does not track you), but it does offer a rich user experience.

Qwant is a very useful search engine alternative to Google. It lists all the web resources, social feeds, news and images on the topic you search for.

Qwant

3. Startpage

Startpage Screenshot

Attention!

System1 has recently acquired Startpage. Although System1 is a digital advertising company, it claims to provide privacy-focused products. We cannot vouch for that claim.

In other words, it’s up to you to trust System1 and Startpage.

Startpage is a good initiative as a privacy-oriented search engine alternative to Google. However, it may not be the best one around. The UI is very similar to Google’s when displaying search results (irrespective of the functionality offered). It may not be a complete rip-off, but it is not very impressive; then again, everyone has their own taste.

When it comes to privacy, Startpage leaves the choice to you: you can visit web pages either through its proxy or directly. You also get to change the theme of the search engine; I did enjoy my switch to the “Night” theme. There is also an interesting option to generate a custom URL that keeps your settings intact.

Startpage

4. Privatelee (Discontinued)

Privatelee was another search engine specifically tailored to protect your online privacy. It did not track your searches or behavior in any way. However, you used to get a lot of irrelevant results after the first ten matches.

The search engine wasn’t great for finding hidden treasures on the Internet; it was better suited to general queries. Privatelee also supported power commands (more like shortcuts) that helped you search for exactly what you wanted, efficiently. They could save a lot of time on simple tasks such as searching for a movie on Netflix. If you were looking for a super-fast privacy-oriented search engine for common queries, Privatelee would have been a good alternative to Google.

Privatelee

5. Swisscows

Well, it isn’t a dairy farm’s portfolio site but a privacy-oriented search engine and an alternative to Google. You may have known it as Hulbee, but it has recently moved its operation to a new domain. Nothing has really changed except the name and domain of the search engine; it works the same way it did as Hulbee.com.

Swisscows utilizes Bing to deliver the search results as per your query. When you search for something, you would notice a tag cloud on the left sidebar which is useful if you need to know about the related key terms and facts.

The design language is a lot simpler, yet one of a kind among the search engines out there. You can filter the results by date, but that’s about it; there are no more advanced options to tweak your search results. It uses a tile search technique (a semantic technology) to fetch the best results for your queries. The search algorithm ensures that it is a family-friendly search engine, with pornography and violence ruled out completely.

Swisscows

6. searX

searX is an interesting search engine, technically a “metasearch engine”. In other words, it queries other search engines and gathers the results for your query in one place. It does not store your search data, and it is open source, so you can review the source code, contribute, or even customize it and host it as your own metasearch engine on your server.

If you are fond of using torrent clients to download stuff, searX will help you find the magnet links to the exact files you search for. When you access the settings (preferences) for searX, you will find a lot of advanced things to tweak on your end.

General tweaks include adding/removing search engines, rewriting HTTP to HTTPS, removing tracker arguments from URLs, and so on. It’s all yours to control. The user experience may not be the best here, but if you want to use multiple search engines while keeping your privacy in check, searX is a great alternative to Google.
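Removing tracker arguments boils down to dropping known tracking query parameters from a URL before following it. A minimal Python sketch of the idea; the parameter list (`utm_*`, `fbclid`, `gclid`) is an illustrative assumption, not the list searX’s own plugin uses, and `strip_trackers` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative set of tracking parameters; real tools maintain longer lists.
TRACKER_PARAMS = {"fbclid", "gclid"}


def is_tracker(name: str) -> bool:
    name = name.lower()
    return name.startswith("utm_") or name in TRACKER_PARAMS


def strip_trackers(url: str) -> str:
    """Return url without tracking parameters; the others keep their order (re-encoded)."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if not is_tracker(k)]
    # An empty query makes urlunsplit drop the "?" entirely; the fragment is preserved.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_trackers("https://example.com/article?id=42&utm_source=news&fbclid=abc"))
```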

The only problem is that there may not be a single active domain for the search engine. Instead, there are multiple searX instances; if the button below does not work, browse the list of instances to look for others, or just host it yourself.

searX

7. Peekier

Peekier is another fascinating privacy-oriented search engine. Unlike the previous one, it is not a metasearch engine; it implements its own algorithm. It may not be the fastest search engine I’ve ever used, but it is an interesting take on how search engines could evolve in the near future. When you type in a search query, it not only fetches a list of results but also displays preview images of the listed web pages. So, you get a “peek” at what you seek. While the search engine does not store your data, the web portals you visit do track you.

To avoid that to an extent, Peekier accesses the site itself and generates a preview image, so you can decide whether to head into the site without having to visit it first. That way, fewer websites get to know about you, mostly the ones you trust.


8. MetaGer


MetaGer is yet another open source metasearch engine. Unlike the others, however, it takes privacy more seriously and lets you use the Tor network to access search results from a variety of search engines anonymously. Some search engines that claim to protect your privacy may have to share whatever they record with the government, because their servers are bound by US legal procedures. MetaGer’s servers, however, are based in Germany, where even the anonymous data recorded while you use MetaGer is protected.

It does display a small number of advertisements (without trackers, of course), but you can get rid of those as well by becoming a member of the non-profit organization SUMA-EV, which sponsors the MetaGer search engine.


9. Ecosia


I used Ecosia for a while as my primary search engine. It’s a one-of-a-kind privacy-focused search engine that actually plants trees if you use it.

Ecosia uses Bing’s search results at its core; however, I didn’t observe any trackers while using it. When you use the search engine, Ecosia makes money (including revenue from sponsored ads), and it then contributes a significant amount of that money to notable organizations and activists who plant more trees.

At first, this might seem controversial. But Ecosia shares monthly financial reports, and I’ve also observed respectable organizations working with Ecosia to plant more trees. In addition to all this, they claim that their servers run on 100% renewable energy.


10. Gibiru


Gibiru is a privacy-friendly search engine that aims for uncensored search results. It doesn’t use any trackers, but it recommends using ExpressVPN alongside its search service to prevent other websites from tracking your activity.

The search results may not be the best around, but it does surface some interesting uncensored results. You should give it a try.


11. Mojeek


Mojeek has been around for a long time now. They’re an independent ‘crawler-based’ search engine, based in the UK, with their own algorithm and index of web pages.

If you are looking for a privacy-focused search engine that does not use any trackers and has its own index for search results, Mojeek should suit you. I tried searching for some common queries and was satisfied with the results. I think you can give it a try yourself.


Wrapping Up

If you are concerned about your privacy, you should also take a look at some of the best privacy-focused Linux distributions. Among the search engine alternatives mentioned here, DuckDuckGo is my personal favorite. But it really comes down to your preference and whom you choose to trust while surfing the Internet.

Do you know of any other interesting (and good) privacy-oriented alternatives to Google?

Let us know your thoughts in the comments below.


Source

Elasticsearch vs. Solr – Choosing Your Open Source Search Engine

Update: In 2018, our search expert revisited this popular “Elasticsearch vs. Solr” topic and offered new observations incorporating cloud, analytics, and cognitive search. Check out our post here.

Why are we here? What is the purpose of my existence? Should I exercise or rest and save my energy? Wake up early for work or start late and work through the night? Should I eat my French fries with ketchup or mayonnaise?

These are all age-old questions that may or may not have answers. Some of them are very hard or terribly subjective. But let me put a bit of effort into trying to answer one of them: Should I use Elasticsearch or Solr?

Here is the scenario. Your organization is looking to implement its first search engine, switch to another search engine – calling out to all the Google Search Appliance (GSA) users looking for a replacement! – or save money by moving to open source. You, as a proficient and capable developer, have been called in to solve a difficult problem. Your problem has many business requirements, but at its core, it is a “big data and search” problem.

You need to extract a lot of content from multiple data sources and get insights from that data to help your company grow and achieve its objectives for this year.

 

One Shot

There is a lot at stake here. You can’t miss, and you have only one shot. You need the right search engine for the job, you are thinking open source, and you have two popular choices: Elasticsearch or Solr, both of which are steadily ranked in the top two spots among open source and commercial search engines, according to DB-Engines.

 

Which Open Source Search Engine Would You Pick?

This is not a coin toss or an easy pick. Both search engines are great and there is no one “right” choice. It all depends on your requirements. 

So the first step is to understand what application you have to build. Then, the next step is to see what each search engine has to offer. And by the way, if you’re still at the intersection of open source vs. commercial solutions, get our free e-book for a deep-dive into the 10 key criteria to consider when selecting a search engine.

 

Feature Rundown

A couple of years back, we wrote a high-level overview blog on Elasticsearch vs. Solr, which discussed overall trends and non-technical insights. Now, as both Elasticsearch and Solr have evolved and become dominant players in the open source search engine market, let’s take another fresh look at each and see where it takes us.

 

Age & maturity

In this case, we can say that Solr has a longer history: it was created in 2004 by Yonik Seeley at CNET Networks, which contributed it to Apache in 2006, and it graduated from the Apache Incubator in 2007. Elasticsearch, on the other hand, was officially created in 2010, although its founder Shay Banon had started its predecessor, Compass, in 2004. Since then, the creators of Kibana, Logstash, and Beats have joined Elasticsearch to create the Elastic Stack product family, which has emerged as a powerful player in the search and log analytics space. Overall, Solr has the advantage of having entered the market earlier.

 

Community & open source

Both have very active communities. If you check GitHub, you can see that both are very popular open source projects with plenty of releases.

 

A very important detail is that while both are released under the Apache license, and both are open source, they work a little differently. Solr truly is open source – anyone can help and contribute. With Elasticsearch, while people can still offer their contributions, only Elastic’s employees (the company behind Elasticsearch and the Elastic Stack) can accept those contributions. 

Is this good or bad? It depends on how you look at it. This means that if there is a feature you need and you contribute it to the community, with adequate quality, it can be accepted into Solr. With Elasticsearch, it’s up to Elastic to decide whether a contribution would be accepted. So there may be more feature options on Solr. On the other hand, contributions to Elasticsearch, which go through more levels of quality checks, may offer higher consistency and quality.   

 

Documentation

Both Elasticsearch and Solr have very well-documented reference guides. Elasticsearch’s documentation is maintained on GitHub, while Solr’s reference guide uses Atlassian Confluence.

 

Core technology

Let’s get a little bit more technical. Elasticsearch and Solr are two different search engines. But underneath, they both use Lucene, which means both are built on “the shoulders of giants.”

For those of you who wonder why I consider Lucene a “giant,” it is the actual information retrieval software library under the hood of many search engines. It is extremely fast, stable, and probably can’t get better than this. Lucene was created in 1999 by Doug Cutting, one of the creators of Hadoop. So there you go: Lucene is the perfect choice for the heart of a search engine.
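The central data structure behind a library like Lucene is the inverted index, which maps each term to the documents that contain it, so a query only has to look up its terms instead of scanning every document. The toy Python sketch below illustrates that idea only: Lucene’s real index (segments, compressed postings, positions, scoring) is far more sophisticated, and the tokenizer and function names here are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (a toy analyzer)."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map every term to the set of document IDs containing it (its postings)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_and(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    idx = build_index({
        1: "Solr is built on Lucene",
        2: "Elasticsearch is built on Lucene too",
        3: "Kibana visualizes data",
    })
    print(search_and(idx, "built lucene"))
```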

 

Java APIs and REST

Elasticsearch has a more “Web 2.0” REST API, but Solr has a much better Java API in SolrJ, plus SolrNet if you use Microsoft technologies; on the .NET side, Elasticsearch has NEST and Elasticsearch.Net. Solr’s REST API may feel less flexible, but it works wonderfully for what you need: indexing and querying. Elasticsearch speaks JSON, so if you use JSON all around, it is a good choice. Solr supports JSON as well, but that support was added later, since Solr was originally designed around XML.
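To make the difference in request style concrete, the Python sketch below only builds requests and sends nothing. Elasticsearch takes a JSON body posted to an index’s `_search` endpoint, while a classic Solr query is a set of URL parameters on a core’s `/select` handler. The index/core name `docs` and the field `title` are hypothetical, and special characters in the search text would still need escaping in Solr’s query syntax.

```python
import json
from urllib.parse import urlencode


def es_match_query(index: str, field: str, text: str) -> tuple[str, str]:
    """Return the path and JSON body for an Elasticsearch full-text match query."""
    body = {"query": {"match": {field: text}}}
    return f"/{index}/_search", json.dumps(body)


def solr_select_path(core: str, field: str, text: str, rows: int = 10) -> str:
    """Return the path and query string for a Solr /select request with JSON output."""
    # Parentheses apply the field to every term in `text`, e.g. title:(open source).
    params = {"q": f"{field}:({text})", "rows": rows, "wt": "json"}
    return f"/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(es_match_query("docs", "title", "open source"))
    print(solr_select_path("docs", "title", "lucene"))
```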

 

Content processing

Because they both expose an API, it is simple to index content from your custom application or already existing and configurable applications. For example, our Aspire content processing framework is able to connect to multiple data sources and post to either Elasticsearch or Solr.
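Because both engines accept documents over HTTP, indexing payloads are simple to build from any application. The Python sketch below is a generic illustration, not the Aspire framework: it builds the newline-delimited JSON (NDJSON) body for Elasticsearch’s `_bulk` endpoint and the JSON array body that Solr’s `/update` handler accepts. The index name `docs` and the sample documents are hypothetical, and nothing is sent over the network.

```python
import json


def es_bulk_body(index: str, docs: list[dict]) -> str:
    """NDJSON body for POST /_bulk: one action line, then the document source."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["id"])}}))
        # The "id" key also stays in the source, where it is stored as a normal field.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def solr_update_body(docs: list[dict]) -> str:
    """JSON array body for POST /solr/<core>/update with Content-Type: application/json."""
    return json.dumps(docs)


if __name__ == "__main__":
    sample = [{"id": 1, "title": "Lucene in a nutshell"}]
    print(es_bulk_body("docs", sample), end="")
    print(solr_update_body(sample))
```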

Solr also has a feature for extracting text from binary files using Apache Tika. So you can upload a PDF via the ExtractRequestHandler and Solr will know what to do with it. 
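A request to the extract handler is an ordinary HTTP POST: the raw file bytes go in the body, and parameters such as `literal.id` (the document’s ID) and `commit=true` go in the URL. The Python sketch below builds such a request without sending it; the host, core name `docs`, and document ID are hypothetical, and it assumes the handler is mapped at its usual `/update/extract` path.

```python
import urllib.request
from urllib.parse import urlencode


def solr_extract_request(base_url: str, core: str, doc_id: str,
                         data: bytes, content_type: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Solr's /update/extract handler.

    literal.id sets the indexed document's id field; commit=true commits the
    change right away. Tika on the Solr side extracts text from the raw bytes.
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{base_url}/solr/{core}/update/extract?{params}"
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": content_type}, method="POST"
    )


if __name__ == "__main__":
    req = solr_extract_request("http://localhost:8983", "docs", "doc1",
                               b"%PDF-1.4", "application/pdf")
    print(req.get_method(), req.full_url)
    # Against a running Solr instance: urllib.request.urlopen(req)
```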

On the other hand, Elasticsearch works nicely with Logstash, which can process data from any source and index it.

 

Scalability

Scaling is a key consideration. Here, Elasticsearch was winning the game while Solr was still constrained to master-slave replication. However, SolrCloud has recently come into the game: with the help of Apache ZooKeeper, it is now possible to scale a Solr cluster much more easily and quickly than with the older master-slave setup. It still needs a lot of improvement, but the future looks bright in terms of the size of datasets that can be ingested and searched in Solr.
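Scaling out in a cluster means deciding which shard each document lives on, and a common approach is to hash the document ID into a fixed hash space that is split into one contiguous range per shard. The Python sketch below illustrates that general idea under simplifying assumptions: it uses the standard library’s CRC32 and unsigned 32-bit ranges, whereas Solr’s default compositeId router uses MurmurHash3 and also supports route prefixes, so this is not Solr’s actual code.

```python
import zlib


def shard_ranges(num_shards: int) -> list[tuple[int, int]]:
    """Split the unsigned 32-bit hash space into contiguous, inclusive ranges."""
    if num_shards < 1:
        raise ValueError("need at least one shard")
    space = 2**32
    step = space // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = space - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def route(doc_id: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose hash range contains the ID's hash."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # unsigned 32-bit value in Python 3
    for shard, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return shard
    raise ValueError("hash not covered by any shard range")


if __name__ == "__main__":
    ranges = shard_ranges(4)
    for doc_id in ["doc-1", "doc-2", "doc-3"]:
        print(doc_id, "->", route(doc_id, ranges))
```

Because routing depends only on the ID and the range table, any node can compute where a document belongs without asking the others.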

 

Vendor support

There are several companies that got to a point where they had to decide which product worked best for them. For example, Cloudera selected Solr as their search engine to integrate into the open source CDH (Cloudera Distribution Including Hadoop). On the other hand, there are other vendors who have selected Elasticsearch as the search engine for their solutions. We at Search Technologies help with the consulting, deployment, and support of both search engines. 

 

Vision & ecosystem

Solr has been more oriented towards text search. Elasticsearch quickly carved out its niche in log analytics by creating the Elastic Stack (formerly known as the ELK Stack, after Elasticsearch, Logstash, and Kibana), which now also includes Beats. Both have a clear vision, and both are making great strides in their own directions.

One thing worth reiterating is how both search engines are being used as the foundation of many leading search and big data platforms. For example, Elasticsearch is part of Microsoft’s Azure Search while Solr has been integrated into Cloudera Search.

 

Performance

When it comes to performance, based on the experiences I have heard from many developers, we can say that both engines are solid performers. Thus, for the majority of use cases, whether it is an internal or external search application, performance won’t be much of an issue if the developer designs and configures them properly.

 

Web administration

Solr comes with a web administration interface bundled in, while Elasticsearch offers multiple premium plugins for security, alerting, and monitoring as part of Elastic’s wider product family.

 

Visualization

There are many ways to visualize the data in Elasticsearch and Solr – you can build your custom visualization dashboard or use the search engine’s standard visualization features, perhaps with some tweaks. But there is one difference worth mentioning.  

Solr has focused primarily on text search. It does a great job at this, becoming what seems to be the standard for search applications. But Elasticsearch has moved in a different direction, going beyond search to tackle log analytics and visualization with the Elastic Stack; Kibana 5, for example, provides dashboards for visualizing this kind of data.

 

This does not mean one is better than another. It just indicates that each search engine has its own strengths in different use cases and needs, and your selection will greatly depend on what your organization wants to accomplish.

 

So long story short, both Elasticsearch and Solr are excellent open source choices that will help you get more out of your data. It all depends on your requirements, your budget, your timing, and the complexity of your project.

Helpful Resources

  • This e-book details the key criteria for choosing a search engine. It can help guide you through your decision-making process.
  • If you are looking for expert help to evaluate search engine and implementation options, contact us to learn more about our assessment. 

– Xavier

Source

Open Source Intelligence (OSINT) Tools & Resources


Search Engines

General Search

National Search Engines

Privacy-oriented search engines

  • DuckDuckGo: Online investigators usually use it to search the surface web while using the Tor Browser.
  • Startpage: Fetches results from Google without tracking its users.
  • Peekier: Privacy-oriented search engine that fetches its results using its own search algorithm.
  • Qwant: Based in France.
  • Oscobo: Based in the UK.
  • Swisscows: Privacy-safe web search based in Switzerland.
  • Gigablast: Open source search engine.
  • Gibiru: Uncensored and anonymous search engine.

Meta search engines

  • Excite
  • Search
  • MetaGer
  • Zapmeta
  • etools: Compiles its results from major international search engines and protects user privacy by not collecting or sharing its users’ personal information. This search engine is very fast and shows a summary for each search query (on the right side) detailing the sources of its results.
  • All the Internet: Queries major search engines, including shopping sites like Amazon and eBay.
  • izito: Aggregates data from multiple sources (Yahoo, Bing, Wikipedia, YouTube and others) to generate optimal results, including images, videos, news and articles.
  • Metacrawler: Aggregates results from Google and Yahoo!.
  • My all search: Aggregates results through Bing, DuckDuckGo, AOL Search, Ask, Oscobo, Mojeek, ZapMeta, MetaCrawler.
  • Carrot2: Open source search results clustering engine that aggregates results from the Google API, Bing API, eTools Meta Search, Lucene, Solr, and more.
  • elocalfinder: Fetches results from Google, Yahoo!, Ask and Bing.
  • All-in-One
  • Searx

FTP search engines

Files Search Engines

Image Search Engines

Images shared across social media sites can be found in the following locations:

There are specialized sites that hold images that have appeared in the press and news media. To search for this type of image, go to:

Reverse image search

Video Search Engines

Blog Search

Custom Search Engines

Internet Of Things (IoT) devices search engines

Exploits search engines

Dark Web Search Engines

You need to download the Tor Browser before you can access sites hosted on the Tor network.

News/Newspaper Search Engines

Fake News Detection

  • Snopes: Uncovers false news, stories, and urban legends, and researches rumors to validate whether they are true.
  • Hoaxy: Checks the spread of false claims (such as a hoax, rumor, satire, or news report) across social media sites. The site derives its results from reputable fact-checking organizations to return the most accurate results.
  • FactCheck: This site is partnered with Facebook to help identify and label fake news reported by its users. It also monitors different media for false information on a wide range of topics, such as health, science, and hoaxes spread through spam emails.
  • ReviewMeta: Analyzes Amazon user reviews.
  • Reporter Lab: Gives a map of global fact-checking sites.
  • Truth Or Fiction: Discovers fake news on topics such as politics, nature, health, space, crime, police, and terrorism.
  • Hoax-Slayer: Focuses on email scams and social media hoaxes.
  • Verification Handbook: A definitive guide to verifying digital content for emergency coverage, available in different languages.
  • Verification junkie: A directory of tools for verifying, fact-checking, and assessing the validity of eyewitness reports and user self-published content online.
  • citizen evidence: Tools and lessons that teach people how to authenticate user-generated online content. Managed by Amnesty International.

Specialized Search Engines

Niche Search Engines

Patent Search Engines

Web Directories

Translation services

Business Search

Business Annual Records

Business Profiles

Grey literature

Grey literature includes academic papers, preprints, proceedings, conference and discussion papers, research reports, marketing reports, technical specifications and standards, dissertations, theses, trade publications, memoranda, government reports and documents not published commercially, translations, newsletters, market surveys, draft versions of books, articles, and more.

The most important grey literature websites (academic and scholarly resources) can be found in the following list:

Data Leak Websites

Pastebin sites

  • Pastebin
  • PasteLert: An alerting service dedicated to the Pastebin.com website.
  • Dump Monitor: This is a Twitter account that monitors multiple paste sites for password dumps and other sensitive information.

Source

What is an open-source search engine?


Firstly, I doubt whether you would have used an open source search engine unless you use Linux or are in the research field.

None of the large web search engines are open source. To repeat what John Linn said,

Open source simply means that the source code (programming) is available to anyone to use and modify as they desire.

Search engines like Lucene[1], Nutch[2], Terrier[3], Xapian[4] and others are examples of Open Source search engines. They all allow you to change the code of the retrieval and ranking process.
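To illustrate what changing the ranking process can mean, here is a generic Python sketch; it is not the API of Lucene, Nutch, Terrier, or Xapian. Retrieval stays the same while the scoring function is swapped: `tf_score` counts raw term occurrences, and `tfidf_score` uses a textbook TF-IDF weighting. Neither is Lucene’s actual scoring (recent Lucene versions default to BM25), and all names and sample data are hypothetical.

```python
import math
from typing import Callable

# A scorer receives (term, document tokens, whole corpus) and returns a score.
Scorer = Callable[[str, list[str], list[list[str]]], float]


def tf_score(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Raw term frequency: how often the term appears in the document."""
    return float(doc.count(term))


def tfidf_score(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Term frequency weighted by inverse document frequency, log(N / df)."""
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    return doc.count(term) * math.log(len(corpus) / df)


def rank(query: str, corpus: list[list[str]], scorer: Scorer) -> list[int]:
    """Return indices of documents with a positive score, best first."""
    terms = query.lower().split()
    scored = [(sum(scorer(t, doc, corpus) for t in terms), i)
              for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0]


if __name__ == "__main__":
    corpus = [["the", "the", "the"], ["the", "lucene"], ["solr", "index"]]
    print(rank("the lucene", corpus, tf_score))
    print(rank("the lucene", corpus, tfidf_score))
```

With raw term frequency, the document that merely repeats “the” wins; TF-IDF down-weights that common term and promotes the document containing the rarer “lucene”.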

[1] lucene.apache.org/core/
[2] http://nutch.apache.org/
[3] http://te…

Source