stream_logic

Created: 2012-03-26 17:57
Updated: 2013-10-23 08:31

README.mdown

README

StreamLogic can apply set operations to sorted finite streams represented by Enumerables.

Examples

Say you got three files, a, b and c that contains words, one on each line, sorted alphabetically. To get a stream with the words that appear in all three files, do this:

include StreamLogic

s1 = Stream.new { File.open('a') }
s2 = Stream.new { File.open('b') }
s3 = Stream.new { File.open('c') }
(s1 & s2 & s3).each do |line|
  puts line
end

It doesn't matter if the files have ten lines each, or ten million, StreamLogic will not gobble them into memory, but perform the & (logical and, set union) operation streamingly.

An example of the lazyness of StreamLogic is this example that extracts the first four numbers that are divisible by both 3 and 5 from the endless streams of all numbers divisible by 3 and 5, respectively:

def endless_stream_of_numbers_divisible_by(n)
  Enumerator.new do |y|
    counter = 0
    loop do
      y << counter
      counter += n
    end
  end
end

s1 = Stream.new { endless_stream_of_numbers_divisible_by(3) }
s2 = Stream.new { endless_stream_of_numbers_divisible_by(5) }

(s1 & s2).take(4) # => [0, 15, 30, 45]

StreamLogic also supports the | (logical or, set union) operator:

people_a_to_m = Stream.new { File.open('big_list_of_people_a_through_m') }
people_n_to_z = Stream.new { File.open('big_list_of_people_n_through_z') }
some_other_people = Stream.new { File.open('another_big_list_of_people') }
((people_a_to_m | people_n_to_z) & some_other_people).each do |name|
  puts name
end

The code above will print all names in the file "another_big_list_of_people" that appear in either "big_list_of_people_a_through_m" or "big_list_of_people_n_through_z".

Sometimes logic operations sound like the opposite of natural language, so people_a_to_m | people_n_to_z is read as "people_a_to_m OR people_n_to_z", but means the items in "people_a_to_m" and the items in "people_n_to_z". Another way to say the same thing is as I did above, "[the items] that appear in either 'big_list_of_people_a_through_m' or 'big_list_of_people_n_through_z'", so the operation | can be described using either "or" or "and". Language is hard.

If you want to get a merged stream but not filter out duplicates you can use the + operator, and if you want to find the elements that appear only in one of the streams, use -:

s1 = StreamLogic::Stream.new(%w[phil sam steve sue])
s2 = StreamLogic::Stream.new(%w[anne phil sue vincent])
(s1 - s2).to_a # => ['anne', 'sam', 'steve', 'vincent']

Notes

  • Create a Stream by passing an Enumerable as parameter to new, or give a block that returns an Enumerable. The latter is useful when you need to restore some state before reusing the Enumerable (IOs like Files for example).
  • The Enumerables must respond to #to_enum.
  • It's very important that the Enumerables are sorted.
  • The resulting streams will contain only unique items, with the exception of the + operator.

Performance

StreamLogic relies heavily on external iteration. In Ruby external iteration is usually performed with Enumerable#next, but it's not very performant (see discussion here: http://www.ruby-forum.com/topic/196086). Where possible StreamLogic will perform external iteration using mechanisms like IO#gets, Array#[]. If you need to pass in some Enumerable that does not provide an efficient external iteration mechanism, you can implement your own, just include StreamLogic::ExternalEnumeration and implement #next_element to return the next element on each invocation, and :stop_iteration when the end has been reached.

Cookies help us deliver our services. By using our services, you agree to our use of cookies Learn more