Monday, March 02, 2009

sqlpython 1.6.0 with Wild SQL

I just released sqlpython 1.6.0.

SELECTing a limited, but large, set of columns from a table is a real pain. What if you could use wildcards in the column list of the SELECT statement itself? Wouldn't that be wild?

OK, then, let's SET WILD ON.

jrrt@orcl> cat party

------- --- --- --- --- --- ---
Frodo 8 14 16 15 14 16
Gimli 17 12 10 11 17 11
Legolas 13 15 14 18 15 17
Sam 11 9 14 11 16 13

4 rows selected.

jrrt@orcl> set wild on
wildsql - was: False
now: True
jrrt@orcl> select *i* from party;

--- ---
14 16
12 10
15 14
9 14

4 rows selected.
You can also call columns out by number...
jrrt@orcl> select #1, #5 from party;

------- ---
Frodo 15
Gimli 11
Legolas 18
Sam 11

4 rows selected.
... or use ! as NOT.
jrrt@orcl> select !str from party;

------- --- --- --- --- ---
Frodo 14 16 15 14 16
Gimli 12 10 11 17 11
Legolas 15 14 18 15 17
Sam 9 14 11 16 13

4 rows selected.
... and you can mix it all together.
jrrt@orcl> select n*, !#3, !c* from party;

------- --- --- ---
Frodo 8 16 15
Gimli 17 10 11
Legolas 13 14 18
Sam 11 14 11

4 rows selected.
A bunch of limitations:
  • Wild SQL is not yet a widely-accepted industry standard. Actually, I just made it up. If ANSI hears about it, they will hunt me down with dogs. That's why you need to SET WILDSQL ON to turn it on.
  • Wild SQL only works on the column list - the part between the SELECT and the FROM. It doesn't work in the WHERE clause, or in subqueries.
  • Wild SQL only works in SELECT statements. What, you were thinking about using it in DML? Are you crazy?
  • Do I really have to say that it's very alpha? Well, it is. Expect a trickle of bugfixes over the next few months.


Matt Wilson said...

Very nice. There's a feature in SAS (statistics package, sort of like SQL) that supported wildcards like that and I have missed it for years.

Does this work with postgres?

Catherine said...

Thanks, Matt! Alas, sqlpython only works with Oracle. I'd like to make it cross-RDBMS someday (and postgresql will be my first target), but it'll probably take 50-100 hours of work that my boss has no earthly reason to pay me for...

Jack Mulhollan said...

Here is a common batch operation that would be greatly simplified by Wild SQL. The specification is "Get rid of duplicate ids on dupe_tb; in case of ties, keep the latest date_added for each id". In regular SQL this is a pain because you have to explicitly list all the columns in dupe_tb just to avoid adding a new column called my_seq. Does Wild SQL work with create table as select (CTAS)?

- Jack

create table dedupe_tb
select *,
from (
select a.*,
row_number () over
partition by id
order by date_added desc nulls last
) my_seq
from dupe_tb a
where my_seq = 1

Catherine said...

Hi, Jack!

Uh... hmm. Darn.

That's a really good example, but I don't think it will work with it. It finds-and-replaces the wildcards with a pretty basic search that only finds the first SELECT in a statement. You might be able to do it in a couple steps, making an intermediary table... I don't think it can handle it as-written, though.

Well, you've given me an incentive to improve Wild SQL's capabilities...

Catherine said...

Hi again, Jack...

Just saw this announcement of a SQL parsing module for Python:

... which might be a good way for wild SQL to find the columns in more complex queries like yours. So stay tuned. Or, better yet, join and help make it happen!