[TransWarp] Towards a query theory, part 1: filters and correlation

Wed Oct 15 01:41:00 EDT 2003

On Tuesday, October 14, 2003, at 02:26 PM, Phillip J. Eby wrote:
> At 11:43 AM 10/14/03 -0500, Ian Bicking wrote:
>> On Tuesday, October 14, 2003, at 11:12 AM, Phillip J. Eby wrote:
>>> Generating SQL from "easy" queries involves a lot of details, but is 
>>> straightforward, on the whole.  Mostly it's a process of walking the 
>>> query to extract table aliases, joins, and criteria.
>>>
>>> Here are PEAK's "hard" query requirements:
>>>
>>> * Support correlations (e.g. the "find employees that lives in the 
>>> same city as their supervisor, but was born in a different country" 
>>> example)
>>
>> If SQLObject were extended to do this, it might look like:
>>
>> var = Employee.var
>> query = AND(var.address.city == var.supervisor.address.city,
>>             var.birthplace.city != var.supervisor.birthplace.city)
>
> Here's an interesting question for you.  How would you know in this 
> case that the two 'var.supervisor' references are to the same object?  
> Yes, I know it's a one-to-one relationship here, but if it were a 
> one-to-many, the semantics get interesting.  :)

I this instance I'm resuming that there is only one supervisor.  If I 
wasn't, and I wanted there to be at least one supervisor for which this 
query was true, I'd do:

var = Employee.var()
sup = Employee.var()
query = AND(var.address.city == sup.address.city,
             var.birthplaceAddress.city != sup.birthplaceAddress.city,
             var.supervisor == sup)
select(query, var)

Which I suppose would result in the SQL (assuming particular naming 
conventions):

select distinct var.id
from employee as var, employee as sup, address as a1, address as a2,
      address as a3, address as a4
where
-- get the intermediate/anonymous objects:
var.address_id = a1.id
and sup.address_id = a2.id
and var.birthplace_address_id = a3.id
and sup.birthplace_address_id = a4.id
and
-- do the actual query:
a1.city = a2.city
and a3.city = a4.city
and var.supervisor_id = sup.id

Hmm... Doing the conversion manually, I can imagine generating the SQL 
automatically as well -- not a simple algorithm, but not impossible.  
You have to find all the anonymous objects -- in this case var.address, 
sup.address, var.birthplaceAddress, and sup.birthplaceAddress.  All the 
other connections are simple enough to make.

Given this, maybe it's not that hard to support cross-database (or 
non-database) queries and joins to those queries.  You identify all the 
anonymous objects in the query.  Whenever one anonymous (or not 
anonymous, I guess) object has to be compared with another object not 
in the same database, you fetch both objects and execute the 
comparisons in Python.  But that might be horribly inefficient, I'm not 
sure -- if you lose the ability to index because of this, and replace 
it with linear searches, you're in a bad place.  But I haven't thought 
about it much yet.

>> SQLObject's basic metaphor is one of tightly encapsulated 
>> row/objects, so we have to produce objects, not arbitrary 
>> tuples/relations.  I.e., we have to find instances of Employee, not 
>> just cities, or a combination of employees and their supervisors.
>
> Understood.  For purposes of conceptual simplification, I'm 
> considering selecting an object to mean selecting the *primary key* of 
> the object, not any of its attributes.  This allows me to translate to 
> "standard" relational theory.

If you don't consider updates (which in the span of one query is not 
really necessary) you can think of the entire tuple as a unit with the 
same effect.  But  getting the whole row is basically an optimization, 
not a fundamentally different thing.

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org