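The database table contents are provided as CSV files:
Format of links.csv:
Format of movies.csv. A movie can have several genres, separated by |:
ratings.csv:
The scores that users gave to movies:
tags.csv: the tags attached to movies
Exercise 1:
List the total row count of each of the four tables: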
select 'links' as "table name", count(1) as "row count"
  from "MOVIELENS"."public.aa.movielens.hdb::data.LINKS"
union all
select 'movies' as "table name", count(1) as "row count"
  from "MOVIELENS"."public.aa.movielens.hdb::data.MOVIES"
union all
select 'ratings' as "table name", count(1) as "row count"
  from "MOVIELENS"."public.aa.movielens.hdb::data.RATINGS"
union all
select 'tags' as "table name", count(1) as "row count"
  from "MOVIELENS"."public.aa.movielens.hdb::data.TAGS";
Result:
Exercise 2: Across all 9125 movies, how many distinct genres are there?
DO
BEGIN
  DECLARE genreArray NVARCHAR(255) ARRAY;
  DECLARE tmp NVARCHAR(255);
  DECLARE idx INTEGER;
  DECLARE sep NVARCHAR(1) := '|';
  DECLARE CURSOR cur FOR
    SELECT DISTINCT "GENRES"
    FROM "MOVIELENS"."public.aa.movielens.hdb::data.MOVIES";
  DECLARE genres NVARCHAR(255) := '';
  idx := 1;
  FOR cur_row AS cur() DO
    SELECT cur_row."GENRES" INTO genres FROM DUMMY;
    tmp := :genres;
    -- Peel off one genre per iteration while a separator remains.
    WHILE LOCATE(:tmp, :sep) > 0 DO
      genreArray[:idx] := SUBSTR_BEFORE(:tmp, :sep);
      tmp := SUBSTR_AFTER(:tmp, :sep);
      idx := :idx + 1;
    END WHILE;
    -- Store the last (or only) genre of the row, then advance idx.
    -- Without this increment, the next row's first genre overwrites this entry.
    genreArray[:idx] := :tmp;
    idx := :idx + 1;
  END FOR;
  genreList = UNNEST(:genreArray) AS ("GENRE");
  SELECT "GENRE" FROM :genreList GROUP BY "GENRE";
END;
Result: the run reported 18 genres in total:
Exercise 3: Count how many movies belong to each genre:
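DO
BEGIN
  DECLARE genreArray NVARCHAR(255) ARRAY;
  DECLARE tmp NVARCHAR(255);
  DECLARE idx INTEGER;
  DECLARE sep NVARCHAR(1) := '|';
  -- No DISTINCT here: every movie row must contribute, otherwise the result
  -- counts distinct genre combinations instead of movies.
  DECLARE CURSOR cur FOR
    SELECT "GENRES"
    FROM "MOVIELENS"."public.aa.movielens.hdb::data.MOVIES";
  DECLARE genres NVARCHAR(255) := '';
  idx := 1;
  FOR cur_row AS cur() DO
    SELECT cur_row."GENRES" INTO genres FROM DUMMY;
    tmp := :genres;
    -- Peel off one genre per iteration while a separator remains.
    WHILE LOCATE(:tmp, :sep) > 0 DO
      genreArray[:idx] := SUBSTR_BEFORE(:tmp, :sep);
      tmp := SUBSTR_AFTER(:tmp, :sep);
      idx := :idx + 1;
    END WHILE;
    -- Store the last (or only) genre of the row, then advance idx.
    -- Without this increment, the next row's first genre overwrites this entry.
    genreArray[:idx] := :tmp;
    idx := :idx + 1;
  END FOR;
  genreList = UNNEST(:genreArray) AS ("GENRE");
  SELECT "GENRE", count(1) AS "MOVIE_COUNT" FROM :genreList GROUP BY "GENRE";
END;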
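As a cross-check outside the database, the genre split behind Exercises 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not part of the original exercises: the sample rows and the helper name `genre_movie_counts` are made up, and the column names assume the standard `movieId,title,genres` header of movies.csv.

```python
import csv
import io
from collections import Counter

# Made-up sample in the movies.csv layout (movieId,title,genres).
# Replace the StringIO source with the real file to check actual numbers.
SAMPLE_MOVIES_CSV = """movieId,title,genres
1,Movie A,Adventure|Comedy
2,Movie B,Comedy
3,Movie C,Drama|Comedy|Adventure
"""

def genre_movie_counts(csv_text):
    """Return a Counter mapping each genre to the number of movies listing it."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A set guards against a genre repeated within a single row.
        counts.update(set(row["genres"].split("|")))
    return counts

counts = genre_movie_counts(SAMPLE_MOVIES_CSV)
print(len(counts))               # number of distinct genres (Exercise 2)
for genre, n in sorted(counts.items()):
    print(genre, n)              # movies per genre (Exercise 3)
```

Comparing these numbers with the SQLScript output is a quick way to spot off-by-one mistakes in the array index handling.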
Exercise 4: List the number of genres for each movie:
SELECT "MOVIEID",
       "TITLE",
       OCCURRENCES_REGEXPR('[|]' IN "GENRES") + 1 "GENRE_COUNT",
       "GENRES"
FROM "MOVIELENS"."public.aa.movielens.hdb::data.MOVIES"
ORDER BY "GENRE_COUNT" ASC;
Exercise 5: Show how the number of genres per movie is distributed
SELECT "GENRE_COUNT",
       COUNT(1)
FROM (
  SELECT OCCURRENCES_REGEXPR('[|]' IN "GENRES") + 1 "GENRE_COUNT"
  FROM "MOVIELENS"."public.aa.movielens.hdb::data.MOVIES"
)
GROUP BY "GENRE_COUNT"
ORDER BY "GENRE_COUNT";
For example, 2793 movies have exactly one genre, 3039 movies have two genres, and so on.
Exercise 6: Compute the distribution of the number of ratings per movie
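The separator-counting idea (number of | characters plus one) is easy to verify on a handful of values. The Python sketch below uses made-up GENRES strings and a hypothetical helper `genre_count_distribution`; it only mirrors the logic of the query and is not a replacement for it.

```python
from collections import Counter

# Made-up GENRES values in the same pipe-separated form as movies.csv.
genres_column = ["Comedy", "Adventure|Comedy", "Drama", "Action|Crime|Thriller"]

def genre_count_distribution(values):
    """Map 'number of genres' to 'number of movies with exactly that many'."""
    # Counting separators and adding one mirrors OCCURRENCES_REGEXPR('[|]' ...) + 1.
    return Counter(value.count("|") + 1 for value in values)

distribution = genre_count_distribution(genres_column)
for genre_count in sorted(distribution):
    print(genre_count, distribution[genre_count])
```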
-- Because the outer query groups by "RATING_COUNT", the window functions below
-- run over the distinct rating-count values (one row per value), not over movies.
SELECT DISTINCT
       MIN("RATING_COUNT") OVER( ) AS "MIN",
       MAX("RATING_COUNT") OVER( ) AS "MAX",
       AVG("RATING_COUNT") OVER( ) AS "AVG",
       SUM("RATING_COUNT") OVER( ) AS "SUM",
       MEDIAN("RATING_COUNT") OVER( ) AS "MEDIAN",
       STDDEV("RATING_COUNT") OVER( ) AS "STDDEV",
       COUNT(*) OVER( ) AS "CATEGORY_COUNT"
FROM (
  SELECT "MOVIEID", COUNT(1) as "RATING_COUNT"
  FROM "MOVIELENS"."public.aa.movielens.hdb::data.RATINGS"
  GROUP BY "MOVIEID"
)
GROUP BY "RATING_COUNT";
Detailed breakdown:
SELECT "RATING_COUNT",
       COUNT(1) as "MOVIE_COUNT"
FROM (
  SELECT "MOVIEID", COUNT(1) as "RATING_COUNT"
  FROM "MOVIELENS"."public.aa.movielens.hdb::data.RATINGS"
  GROUP BY "MOVIEID"
)
GROUP BY "RATING_COUNT"
ORDER BY "RATING_COUNT" asc;
For example, 397 movies received exactly 5 ratings.
Exercise 7: Summarize how many ratings each user submitted
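The two-level aggregation in Exercise 6 (ratings per movie first, then movies per rating count) can be traced on a tiny data set. The following Python sketch uses made-up (userId, movieId) pairs. Note one deliberate difference, stated as an assumption of this sketch: the summary here is computed over one value per movie, whereas a query that groups by the rating count first summarizes only the distinct count values.

```python
import statistics
from collections import Counter

# Made-up (userId, movieId) pairs standing in for rows of ratings.csv.
ratings = [(1, 10), (2, 10), (3, 10), (1, 20), (2, 20), (1, 30), (3, 40)]

# Ratings received per movie: the inner GROUP BY "MOVIEID".
per_movie = Counter(movie_id for _, movie_id in ratings)

# Movies per rating count: the outer GROUP BY "RATING_COUNT".
movies_per_count = Counter(per_movie.values())

# Summary statistics over one value per movie.
counts = list(per_movie.values())
summary = {
    "min": min(counts),
    "max": max(counts),
    "avg": statistics.mean(counts),
    "median": statistics.median(counts),
    "stddev": statistics.stdev(counts),  # sample standard deviation
}

for rating_count in sorted(movies_per_count):
    print(rating_count, movies_per_count[rating_count])
print(summary)
```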
SELECT "RATING_COUNT",
       COUNT(1) as "USER_COUNT"
FROM (
  SELECT "USERID", COUNT(1) as "RATING_COUNT"
  FROM "MOVIELENS"."public.aa.movielens.hdb::data.RATINGS"
  GROUP BY "USERID"
)
GROUP BY "RATING_COUNT"
ORDER BY 1 DESC;
One user submitted 2391 ratings, and another submitted 1868:
Exercise 8: Summarize the distribution of rating scores
SELECT "RATING",
       COUNT(1) as "RATING_COUNT"
FROM "MOVIELENS"."public.aa.movielens.hdb::data.RATINGS"
GROUP BY "RATING"
ORDER BY 1 DESC;
15095 ratings gave a score of 5.
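Exercises 7 and 8 are both plain frequency counts, which a short Python sketch makes concrete. The (userId, rating) pairs below are made up for illustration, and the variable names are hypothetical.

```python
from collections import Counter

# Made-up (userId, rating) pairs standing in for rows of ratings.csv.
rows = [(1, 5.0), (1, 4.0), (1, 5.0), (2, 3.5), (3, 5.0), (3, 4.0)]

# Exercise 7: ratings per user, then users per rating count.
ratings_per_user = Counter(user_id for user_id, _ in rows)
users_per_rating_count = Counter(ratings_per_user.values())

# Exercise 8: number of ratings per score.
ratings_per_score = Counter(score for _, score in rows)

# Highest key first, matching ORDER BY 1 DESC in the queries.
for rating_count in sorted(users_per_rating_count, reverse=True):
    print(rating_count, users_per_rating_count[rating_count])
for score in sorted(ratings_per_score, reverse=True):
    print(score, ratings_per_score[score])
```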