Concept: Nested Hash in Ruby
From time to time I end up wanting to make a nested hash in Ruby, maybe to build up a data object for a complex API or a result for a service object.
Let’s set up an example. Say that I’m a distributor that’s selling books, and I want to create a dump of the monthly sales like so:
{
  <book_id>: {
    <date>: <count>,
  },
}
Assuming I had a model that represented the sales:
# represents 1 sale of 1 book
class Sale
  attr_accessor :book_id
  attr_accessor :date
end
I might do something like this:
def aggregate_by_date(all_sales)
  all_sales.each_with_object({}) do |sale, sales_by_date|
    # Create the inner hash the first time we see this book.
    sales_by_date[sale.book_id] ||= {}
    if sales_by_date[sale.book_id][sale.date]
      sales_by_date[sale.book_id][sale.date] += 1
    else
      sales_by_date[sale.book_id][sale.date] = 1
    end
  end
end
We can clean this up a little bit by providing a default for the inside hash that will initialize a new value to 0. Then we can just always treat it as a number:
def aggregate_by_date(all_sales)
  all_sales.each_with_object({}) do |sale, sales_by_date|
    # Missing dates read as 0, so += 1 works on the first sale too.
    sales_by_date[sale.book_id] ||= Hash.new(0)
    sales_by_date[sale.book_id][sale.date] += 1
  end
end
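As a quick check of how the Hash.new(0) default behaves, here is a minimal sketch. The Sale struct and the sample data are stand-ins made up for this example, not the real model.

```ruby
# Stand-in for the Sale model (hypothetical, for illustration only).
Sale = Struct.new(:book_id, :date)

def aggregate_by_date(all_sales)
  all_sales.each_with_object({}) do |sale, sales_by_date|
    sales_by_date[sale.book_id] ||= Hash.new(0)
    sales_by_date[sale.book_id][sale.date] += 1
  end
end

# Made-up data: two sales of book 1 on the same day, one sale of book 2.
sales = [
  Sale.new(1, "2024-01-01"),
  Sale.new(1, "2024-01-01"),
  Sale.new(2, "2024-01-02"),
]

result = aggregate_by_date(sales)

# Reading a date with no sales returns the default without storing the key.
missing = result[1]["2024-02-01"]
```

One thing I like about the plain default value is that reads stay side-effect free: asking for a date with no sales gives back 0 and leaves the hash unchanged.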
Nice. That feels a little bit better. What about the initial hash, can we do something there?
There’s a little trick to making a hash that has a default value of a nested hash:
Hash.new { |hash, key| hash[key] = {} }
The block is called to generate the default value when a missing key is accessed, and because it assigns hash[key], the new hash is stored rather than thrown away. Using this we can simplify our method again:
def aggregate_by_date(all_sales)
  all_sales.each_with_object(Hash.new { |h, k| h[k] = Hash.new(0) }) do |sale, sales_by_date|
    sales_by_date[sale.book_id][sale.date] += 1
  end
end
That’s a little dense for me, so maybe I’d extract a method:
def aggregate_by_date(all_sales)
  all_sales.each_with_object(nested_hash) do |sale, sales_by_date|
    sales_by_date[sale.book_id][sale.date] += 1
  end
end

private def nested_hash
  # Outer hash creates a counting hash for each new book.
  Hash.new { |h, k| h[k] = Hash.new(0) }
end
Cool.
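One detail about the block form of Hash.new is worth a small sketch, using made-up keys. Passing an object as the default, as in Hash.new({}), hands back the same object for every missing key and never stores it, whereas the block builds and stores a fresh hash per key.

```ruby
# Default object: one shared hash, returned for every missing key.
shared = Hash.new({})
shared[:a][:x] = 1          # mutates the shared default; :a is never stored
shared_keys = shared.keys   # still empty
leaked = shared[:b][:x]     # :b sees the value written through :a

# Default block: runs for each missing key and stores a fresh hash.
fresh = Hash.new { |hash, key| hash[key] = {} }
fresh[:a][:x] = 1
fresh_keys = fresh.keys     # only :a so far
untouched = fresh[:b]       # a new, empty hash of its own
```

That is why the nested case needs the block, while the innermost counter can get away with a plain 0: integers are immutable, so sharing one default is harmless there.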
But what happens if our interface is even more complicated? Let’s say we have multiple stores now and also want to show which store sold the book on which date.
{
  <book_id>: {
    <date>: {
      <store_id>: <count>,
    },
  },
}
This changes our model to something like:
# represents 1 sale of 1 book
class Sale
  attr_accessor :book_id
  attr_accessor :date
  attr_accessor :store_id
end
And it might change our brute force algorithm to look like:
def aggregate_by_date(all_sales)
  all_sales.each_with_object({}) do |sale, sales_by_date|
    sales_by_date[sale.book_id] ||= {}
    sales_by_date[sale.book_id][sale.date] ||= Hash.new(0)
    sales_by_date[sale.book_id][sale.date][sale.store_id] += 1
  end
end
So. Many. Hashes.
Let’s try something a bit out there: what if we could create a Hash that had a default of a Hash that had a default of a Hash that… (and turtles all the way down)? With a little bit of recursion we can set this up:
def nested_hash
  Hash.new { |h, k| h[k] = nested_hash }
end
Creating the hash doesn’t recurse forever, because the block only runs when a missing key is accessed.
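A small sketch of that lazy behaviour, with arbitrary keys chosen for the example. It also shows a side effect worth knowing about: merely reading a missing path creates every hash along it.

```ruby
def nested_hash
  Hash.new { |h, k| h[k] = nested_hash }
end

tree = nested_hash
empty_at_start = tree.empty?   # no block has run yet

# Assigning three levels deep builds the intermediate hashes on demand.
tree[:book][:date][:store] = 1

# Reading a missing path also creates it, because the block stores the default.
tree[:other][:missing]
keys_after_read = tree.keys
```

So a stray lookup or a typo in a key quietly leaves empty hashes behind, which is part of the hidden cost of this approach.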
Let’s see what that does to our algorithm:
def aggregate_by_date(all_sales)
  all_sales.each_with_object(nested_hash) do |sale, sales_by_date|
    by_store = sales_by_date[sale.book_id][sale.date]
    # Seed the counter explicitly; otherwise the default would be another
    # nested hash. key? works in plain Ruby (blank? needs ActiveSupport).
    by_store[sale.store_id] = 0 unless by_store.key?(sale.store_id)
    by_store[sale.store_id] += 1
  end
end

def nested_hash
  Hash.new { |h, k| h[k] = nested_hash }
end
We have to bring back our initial case for 0 now, but we don’t have to mess with the nested hashes.
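One way to avoid the zero check, offered as a sketch rather than as part of the approach above, is to tell the helper how deep the nesting goes so the innermost level can default to 0. The counting_hash name, the Sale struct, and the sample data are all hypothetical.

```ruby
# Hypothetical helper: hashes nested `depth` levels deep whose innermost
# level defaults to 0, so += 1 works without an explicit zero check.
def counting_hash(depth)
  return Hash.new(0) if depth <= 1

  Hash.new { |h, k| h[k] = counting_hash(depth - 1) }
end

# Stand-in for the Sale model (for illustration only).
Sale = Struct.new(:book_id, :date, :store_id)

def aggregate_by_date(all_sales)
  all_sales.each_with_object(counting_hash(3)) do |sale, sales_by_date|
    sales_by_date[sale.book_id][sale.date][sale.store_id] += 1
  end
end

# Made-up data: three sales of one book on one day across two stores.
sales = [
  Sale.new(1, "2024-01-01", :north),
  Sale.new(1, "2024-01-01", :north),
  Sale.new(1, "2024-01-01", :south),
]

result = aggregate_by_date(sales)
```

That fixes the depth up front, though, which the recursive nested_hash does not require.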
Overall, I think this is an interesting solution, but I’m not sure I’d use it in practice. I think there’s a little bit too much hidden complexity compared with the straightforward solution of initializing every step explicitly.
Shout out to David Bai for helping me work through this thought exercise!