1. Computing & Technology

Discuss in my forum

Programming Challenge 37 - Simple Search Engine

Deadline July 31 2010

By , About.com Guide

In this programming contest you are given 1,000 general knowledge questions. You are also provided with a text file containing ten input phrases. Your application should search for each phrase in the 1,000 questions and return the matching results sorted by relevancy as fast as possible.

Relevancy

This is a %age based on the number of matching non trivial words. The words listed below are considered trivial.

  • A
  • Of
  • The
  • And
  • Is
  • It
  • In

So for example if the phrase was Jack in the Box, you should calculate the main relevancy on just the two words Jack and Box. The score for every non trivial word is 90%/number of non trivial words. So two non trivial words score 45% each, three score 30% each etc.

For this phrase, if any of the 1,000 questions has both trivial words then it is scores 90%. If it has either word then it's 45%. If the question has any of the trivial words in the phrase then add 1% per trivial word and if the question has the phrase text in the correct order (i.e. same as search phrase, ignore spaces, capitalization and punctuation) then add 5%.

So If any of the questions had Jack Box. it would score 90%. The box in Jack would score 92% but Jack in the box would score 97%. If scores exceed 100% then that's fine, the highest score should be listed first. If non non-trivial words were found, the score is 0, do not count the trivial words.

Input Files

There are two files, both will be located in the same folders as your exe.

The Output

Output a file output.txt in the same location as your exe. It should contain the search phrase on one line followed by "There were n Results" where n is 1-10 or "There were no results", followed by one line for each the results (if there were any) with the relevancy score at the start.

eg

There was 1 result
56% What colour are most bananas?

Timing Code

Please time from the start of the run, before reading in both files until just before the output file is opened for writing then output the results and then the time taken as the very first line of the output file. If your code is very fast then repeat it in a loop enough times so the total elapsed time is between 1 and 5 seconds. If your program did 500 iterations in 2 seconds then the average time would be 2/500 secs.

Almost Final Results

Congrats to Shawn Knight whose very fast entry just pipping Marembo Ochieng of Kenya (one of two Kenyan entries). I had problems with three entries- two blew up in mid flight! If you can fix them (Mark and Nelson) and resubmit, I'll run them.

  1. Domi Kovacs (Germany, 35) - C# Time(secs) = 1.858
  2. Marembo Ochieng’ (Kenya, 24) - C++ Time(secs) = 0.0830533
  3. Shawn Knight (USA) - C++0.0490196 Time(secs) = 0.0490196

Rules

This is for glory only. About.com does not permit prizes to be given.

Please submit your source code and the output file to the cplus.guide@about.com?subject=Programming Contest 37 email address with the subject line Programming Contest 37.

It must compile with Open Watcom, Microsoft Visual C++ 2005/2008 Express Edition/Microsoft Visual Studio 2003/2005/2008 or Borland Turbo C++ Explorer, Microsoft Visual C# 2005/2008 Express Edition or GCC/G++. If it doesn't compile, it can't be run so is automatically disqualified.

Please include your name, age (optional), blog/website url (optional) and country. Your email address will not be kept, used or displayed except to acknowledge your challenge entry. You can submit as many entries as you like before the deadline which is July 31 2010.

The top ten entries will be listed, judged purely on points. A condition of entry is that you allow your source code to be published on this website, with full credits to you as the author.

©2012 About.com. All rights reserved.

A part of The New York Times Company.