Is this your first visit? You may want to subscribe to the feed.

Articles tagged with sphinx

Cucumber scenarios that depend on Sphinx

I love writing apps that make heavy use of search indexes, but testing them can be a bit of a pain. Here is how I got ThinkingSphinx to play nice with Cucumber.

Here is the relevant part of what I put in features/support/env.rb:

# Cucumber::Rails.use_transactional_fixtures

# http://github.com/bmabey/database_cleaner
require 'database_cleaner'
DatabaseCleaner.strategy = :truncation
Before do
  DatabaseCleaner.clean
end

ts = ThinkingSphinx::Configuration.instance
ts.build
FileUtils.mkdir_p ts.searchd_file_path
ts.controller.index
ts.controller.start
at_exit do
  ts.controller.stop
end
ThinkingSphinx.deltas_enabled = true
ThinkingSphinx.updates_enabled = true
ThinkingSphinx.suppress_delta_output = true

# Re-generate the index before each Scenario
Before do
  ts.controller.index
end

What’s going on here?

Start by commenting out the line about using transactional fixtures in env.rb. Using transactional fixtures will run each scenario inside of a transaction and roll it back at the end of the scenario to revert the database state. Thinking Sphinx uses an after_commit callback for kicking off the delta indexing, but the callback never gets run when transactional fixtures are enabled because the entire scenario is run inside of a big transaction.

Once we’ve disabled transactional fixtures, our test database will start to fill up, likely causing some problems. So we need to add a Before block that clears out the database before each scenario. I’m using database_cleaner, which gives you some different strategies for cleaning the database. Alternatively, the brute-force solution is just to reload the schema before each scenario, but this is slower than truncating the data.

Before do
  ActiveRecord::Base.establish_connection(ActiveRecord::Base.configurations['test'])
  ActiveRecord::Schema.verbose = false
  load "#{RAILS_ROOT}/db/schema.rb"
end

Next, we start Sphinx when env.rb is loaded, and shut it down when the Ruby process exists. We also enable deltas and updates, which are disabled by default in test mode. Finally, we define a Before block that updates the index before each scenario so we don’t end up with a stale index.

Putting it all together

I’m using Sphinx’s delayed delta support, so whenever I update records, I need to have delayed_job process jobs. Instead of trying to get delayed_job to run in the background, I took the easy way out and defined a step: “When the system processes jobs”.

Scenario: Posting a new listing
  Given I am logged in as "MovinMan" 
  When I create a new listing titled "Lots of Boxes" near "49423" 
  And the system processes jobs
  And I browse listings near "49423" 
  Then I can see a listing titled "Lots of Boxes" 

Which is just implemented as:

When 'the system processes jobs' do
  Delayed::Job.work_off
end

If you’re just using the default deltas, and not delayed deltas, then you can update the index like this:

When /^the system updates the index$/ do
  MyModel.sphinx_indexes.first.delta_object.index(MyModel)
end

I hope that helps. Post your suggestions in the comments for improving this.

Code: sphinx Jun 01, 2009 ● updated Jun 25, 2009 13 comments

Keepin' Sphinx Indexes Fresh

<infomercial-voice>Stale indexes got you down? Embarrassed that your users’ searches are coming up empty? Act now and you can avoid stale indexes with NEW and IMPROVED delayed delta indexing!!</infomercial-voice>

Ok, maybe it’s not new and improved. It’s actually been around since January, but it’s still awesome. Thinking Sphinx can use delayed_job to keep indexes fresh.

I was slow at jumping on the Sphinx bandwagon for one reason: the index has to be rebuilt to incorporate new data. Delta indexing alleviated some of this by storing frequent changes in a small separate index, but it still had to be occasionally reindexed. It also seemed to only index existing records in my trials with it. New records didn’t ever seem to show up until I rebuilt the whole index.

From what I can tell, delayed delta indexing makes everything Just Work™, and here’s how to use it…

After you’ve setup ThinkingSphinx, set the :delayed property to :delta in your index:

class Listing < ActiveRecord::Base
  define_index do
    indexes title
    indexes description
    indexes user.login, :as => :user

    set_property :delta => :delayed
  end
end

The delayed delta support depends on delayed_job, but if you’re using the gem version, it’s already bundled in. I’m using delayed job for some other things in my project, so I still have it installed separately and that seems to be working just fine.

delayed_job uses a table to keep track of the jobs that need run, so create a migration containing:

create_table :delayed_jobs, :force => true do |table| 
   table.integer  :priority, :default => 0 
   table.integer  :attempts, :default => 0 
   table.text     :handler 
   table.string   :last_error 
   table.datetime :run_at 
   table.datetime :locked_at 
   table.datetime :failed_at 
   table.string   :locked_by 
   table.timestamps 
end

And lastly, all you need to do is fire up the worker process:

$ rake thinking_sphinx:delayed_delta

Now whenever changes are made to your models, the index will be updated moments later. And that’s how you keep it fresh!

Code: sphinx Apr 29, 2009 ● updated Apr 29, 2009 2 comments

Location-based search with Sphinx and acts_as_geocodable

Sphinx, everybody’s favorite search engine, has support for location-based search, giving you geo-aware, full-text searching. So now you can find all of the garage sales on Saturday within 20 miles that have LPs and a reel mower.

All you need to do is add latitude and longitude (in radians) to the index, allowing you to limit the results to records within a distance of a point. The hardest part is getting the coordinates, but acts_as_geocodable makes that really easy.

To start, install acts_as_geocodable. Once you have that configured properly, install ThinkingSphinx, define an index on your geocodable model and add the coordinates to the index:

class Listing < ActiveRecord::Base
  acts_as_geocodable

  define_index do
    indexes title
    indexes description

    has geocoding.geocode(:id), :as => :geocode_id
    has 'RADIANS(geocodes.latitude)', :as => :latitude, :type => :float
    has 'RADIANS(geocodes.longitude)', :as => :longitude, :type => :float
  end
end

The three lines that start with has add the geocode id, and the latitude and longitude in radians to the index. Our index doesn’t need the geocode id, but we have to add it so that ThinkingSphinx properly joins the geocodes table.

After you rebuild the index and start the daemon, you can search for records by location. Here’s an example of taking a zip code from a user and finding all records with in 20 miles. (Note: you will need to grab the latest version, 0.2.9, of Graticule for this next bit of code to work)

def search
  location = Geocode.geocoder.locate(params[:zip]).coordinates.map(&:to_radians)
  @listings = Listing.search(params[:q], :geo => location,
    :with => {'@geodist' => 0.0..(20 * 1610.0)})
end

After looking up the coordinates of the zip code that the user entered, we do a search with the :geo parameter, limiting the results using the special @geodist condition. We have to pass in a range of floats that represent the distance of the points in meters (and since the US is in the stone age, I’m converting from miles).

That’s all there is to it. Now go write some cool location-based search and comment here about it.

Code: sphinx Apr 14, 2009 ● updated Apr 14, 2009 3 comments

Subscribe

Browse by Tag