[TransWarp] Towards a query theory, part 1: filters and correlation
ianb at colorstudy.com
Wed Oct 15 01:41:00 EDT 2003
On Tuesday, October 14, 2003, at 02:26 PM, Phillip J. Eby wrote:
> At 11:43 AM 10/14/03 -0500, Ian Bicking wrote:
>> On Tuesday, October 14, 2003, at 11:12 AM, Phillip J. Eby wrote:
>>> Generating SQL from "easy" queries involves a lot of details, but is
>>> straightforward, on the whole. Mostly it's a process of walking the
>>> query to extract table aliases, joins, and criteria.
>>> Here are PEAK's "hard" query requirements:
>>> * Support correlations (e.g. the "find employees that lives in the
>>> same city as their supervisor, but was born in a different country"
>> If SQLObject were extended to do this, it might look like:
>> var = Employee.var
>> query = AND(var.address.city == var.supervisor.address.city,
>> var.birthplace.city != var.supervisor.birthplace.city)
> Here's an interesting question for you. How would you know in this
> case that the two 'var.supervisor' references are to the same object?
> Yes, I know it's a one-to-one relationship here, but if it were a
> one-to-many, the semantics get interesting. :)
I this instance I'm resuming that there is only one supervisor. If I
wasn't, and I wanted there to be at least one supervisor for which this
query was true, I'd do:
var = Employee.var()
sup = Employee.var()
query = AND(var.address.city == sup.address.city,
var.birthplaceAddress.city != sup.birthplaceAddress.city,
var.supervisor == sup)
Which I suppose would result in the SQL (assuming particular naming
select distinct var.id
from employee as var, employee as sup, address as a1, address as a2,
address as a3, address as a4
-- get the intermediate/anonymous objects:
var.address_id = a1.id
and sup.address_id = a2.id
and var.birthplace_address_id = a3.id
and sup.birthplace_address_id = a4.id
-- do the actual query:
a1.city = a2.city
and a3.city = a4.city
and var.supervisor_id = sup.id
Hmm... Doing the conversion manually, I can imagine generating the SQL
automatically as well -- not a simple algorithm, but not impossible.
You have to find all the anonymous objects -- in this case var.address,
sup.address, var.birthplaceAddress, and sup.birthplaceAddress. All the
other connections are simple enough to make.
Given this, maybe it's not that hard to support cross-database (or
non-database) queries and joins to those queries. You identify all the
anonymous objects in the query. Whenever one anonymous (or not
anonymous, I guess) object has to be compared with another object not
in the same database, you fetch both objects and execute the
comparisons in Python. But that might be horribly inefficient, I'm not
sure -- if you lose the ability to index because of this, and replace
it with linear searches, you're in a bad place. But I haven't thought
about it much yet.
>> SQLObject's basic metaphor is one of tightly encapsulated
>> row/objects, so we have to produce objects, not arbitrary
>> tuples/relations. I.e., we have to find instances of Employee, not
>> just cities, or a combination of employees and their supervisors.
> Understood. For purposes of conceptual simplification, I'm
> considering selecting an object to mean selecting the *primary key* of
> the object, not any of its attributes. This allows me to translate to
> "standard" relational theory.
If you don't consider updates (which in the span of one query is not
really necessary) you can think of the entire tuple as a unit with the
same effect. But getting the whole row is basically an optimization,
not a fundamentally different thing.
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
More information about the PEAK