What is the performance difference between MySQL relational division implementations (IN AND instead of IN OR)?

I made some improvements to the JOIN version; see below.

I vote for the JOIN approach for speed. Here is how I determined that:

HAVING, version 1

mysql> FLUSH STATUS;
mysql> SELECT city
    ->     FROM us_vch200
    ->     WHERE state IN ('IL', 'MO', 'PA')
    ->     GROUP BY city
    ->     HAVING count(DISTINCT state) >= 3;
+-------------+
| city        |
+-------------+
| Springfield |
| Washington  |
+-------------+
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_external_lock      | 2     |
| Handler_read_first         | 1     |
| Handler_read_key           | 2     |
| Handler_read_last          | 1     |
| Handler_read_next          | 4175  | -- full index scan

(etc)

+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
| id | select_type | table     | type  | possible_keys         | key        | key_len | ref  | rows | Extra                                            |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
|  1 | SIMPLE      | us_vch200 | range | state_city,city_state | city_state | 769     | NULL | 4176 | Using where; Using index for group-by (scanning) |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+

The 'Extra' column shows that the optimizer chose to tackle the GROUP BY and use INDEX(city, state), even though INDEX(state, city) might also make sense.

HAVING, version 2

Forcing it onto INDEX(state, city) gives:

mysql> FLUSH STATUS;
mysql> SELECT city
    ->     FROM us_vch200  IGNORE INDEX(city_state)
    ->     WHERE state IN ('IL', 'MO', 'PA')
    ->     GROUP BY city
    ->     HAVING count(DISTINCT state) >= 3;
+-------------+
| city        |
+-------------+
| Springfield |
| Washington  |
+-------------+
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_external_lock      | 2     |
| Handler_read_key           | 401   |
| Handler_read_next          | 398   |
| Handler_read_rnd           | 398   |
(etc)

+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
| id | select_type | table     | type  | possible_keys         | key        | key_len | ref  | rows | Extra                                    |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
|  1 | SIMPLE      | us_vch200 | range | state_city,city_state | state_city | 2       | NULL |  397 | Using where; Using index; Using filesort |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+

JOIN

mysql> SELECT x.city
    -> FROM us_vch200 x
    -> JOIN us_vch200 y ON y.city= x.city AND y.state = 'MO'
    -> JOIN us_vch200 z ON z.city= x.city AND z.state = 'PA'
    -> WHERE                                  x.state = 'IL';
+-------------+
| city        |
+-------------+
| Springfield |
| Washington  |
+-------------+
2 rows in set (0.00 sec)

mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_external_lock      | 6     |
| Handler_read_key           | 86    |
| Handler_read_next          | 87    |
(etc)

+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
| id | select_type | table | type | possible_keys         | key        | key_len | ref                | rows | Extra                    |
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
|  1 | SIMPLE      | y     | ref  | state_city,city_state | state_city | 2       | const              |   81 | Using where; Using index |
|  1 | SIMPLE      | z     | ref  | state_city,city_state | state_city | 769     | const,world.y.city |    1 | Using where; Using index |
|  1 | SIMPLE      | x     | ref  | state_city,city_state | state_city | 769     | const,world.y.city |    1 | Using where; Using index |
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+

Only INDEX(state, city) is needed. The Handler counts are the smallest for this formulation, so I infer that it is the fastest.

Notice how the optimizer decided on its own which table to start with, probably because of
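The plan tables in this answer look like EXPLAIN output. Assuming that is how they were produced, the JOIN plan can be reproduced by prefixing the query with EXPLAIN, as in this sketch:

```sql
-- Assumption: the plan tables shown in this answer came from EXPLAIN.
-- Same JOIN query as above, prefixed with EXPLAIN to display the chosen
-- join order and the index used for each table alias.
EXPLAIN
SELECT x.city
FROM us_vch200 x
JOIN us_vch200 y ON y.city = x.city AND y.state = 'MO'
JOIN us_vch200 z ON z.city = x.city AND z.state = 'PA'
WHERE x.state = 'IL';
```

Running FLUSH STATUS before the query and SHOW SESSION STATUS LIKE 'Handler%' after it, as in the transcripts, gives the matching Handler counts.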

+-------+----------+
| state | COUNT(*) |
+-------+----------+
| IL    |      221 |
| MO    |       81 |  -- smallest
| PA    |       96 |
+-------+----------+
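The query behind these counts is not shown. A query along these lines would presumably give the same per-state row counts:

```sql
-- Sketch only: not necessarily the exact query used for the table above.
-- Counts rows per state for the three states involved in the division.
SELECT state, COUNT(*)
FROM us_vch200
WHERE state IN ('IL', 'MO', 'PA')
GROUP BY state;
```

The state with the fewest rows (MO here) is the cheapest place to start, which matches the join order in the plan.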

Conclusion

JOIN (without the unnecessary table t) is probably the fastest. Also, this composite index is needed: INDEX(state, city).

Coming back to your use case:
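If the composite index does not exist yet, it could be added roughly as follows. The name state_city is taken from the EXPLAIN output above; the statement itself is a sketch, not something shown in the test session.

```sql
-- Assumption: the index name state_city matches the one in the EXPLAIN output.
-- Skip this statement if the index already exists on the table.
ALTER TABLE us_vch200
    ADD INDEX state_city (state, city);
```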

city --> documentid
state --> termid
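Applied to that mapping, the JOIN formulation might look like the sketch below. The table name doc_term and the term ids 11, 22 and 33 are hypothetical placeholders, and the query assumes an index equivalent to INDEX(termid, documentid).

```sql
-- Hypothetical table name and term ids, for illustration only.
-- Assumes INDEX(termid, documentid), mirroring INDEX(state, city) above.
SELECT x.documentid
FROM doc_term x
JOIN doc_term y ON y.documentid = x.documentid AND y.termid = 22
JOIN doc_term z ON z.documentid = x.documentid AND z.termid = 33
WHERE x.termid = 11;
```

If the same (documentid, termid) pair can appear more than once, SELECT DISTINCT x.documentid avoids duplicate rows in the result.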

Caveat: YMMV, because the distribution of values for documentid and termid may be quite different from the test case I used.