Keeping Only One of Each Set of Duplicate Records
左直拳
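Sometimes you need to filter out duplicate records.
"Duplicate" here does not necessarily mean rows that are identical in every field; more often it means rows that share the same values in some of their fields.
For example, suppose a table holds an access log (LOG) with the following structure: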
Field | Type | Meaning |
ID | INT | |
PageId | INT | ID of the visited page |
IP | string | Visitor's IP address |
Date | string | Visit date, e.g. "2007-07-01" |
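The requirement is to count the IPs that visited the pages: the same IP on the same date counts as one IP; the same IP on a different date counts as a different IP.
Neither GROUP BY Date, IP nor GROUP BY IP, Date gives the right result. Combined with COUNT, either one returns a row per (IP, Date) group holding that group's row count, not a single total.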
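To see what the GROUP BY attempt actually returns, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the sample rows and the function name group_by_counts are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (ID, PageId, IP, Date)
ROWS = [
    (1, 10, "1.1.1.1", "2007-07-01"),
    (2, 10, "1.1.1.1", "2007-07-01"),  # same IP, same date as row 1
    (3, 11, "1.1.1.1", "2007-07-02"),  # same IP, different date
    (4, 10, "2.2.2.2", "2007-07-01"),
]


def group_by_counts():
    """Run the GROUP BY query and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE LOG (ID INTEGER PRIMARY KEY, PageId INTEGER, IP TEXT, Date TEXT)"
    )
    conn.executemany("INSERT INTO LOG VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        "SELECT IP, Date, COUNT(1) AS n FROM LOG GROUP BY IP, Date ORDER BY IP, Date"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in group_by_counts():
        print(row)
```

The query yields three rows, one per (IP, Date) pair, each carrying its own count (2, 1 and 1). The figure we want is the number of those rows, which a plain GROUP BY never reports as a single value.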
In the end I wrote it like this:
SELECT COUNT(1) AS IpNum
FROM LOG AS a
WHERE NOT EXISTS (
    SELECT 1 FROM LOG WHERE IP = a.IP AND Date = a.Date AND Id < a.Id
)
The key is Id<a.Id: among the records with the same IP and the same date, I take only the first one, that is, the one with the smallest ID.
To twist the old saying: out of three thousand, I pick just one.
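The NOT EXISTS query can be tried out with a small sketch. Again, SQLite (via Python's sqlite3) stands in for SQL Server, and the sample rows and the helper name count_ip_days are assumptions. As a cross-check, the same figure is computed by counting the rows of a SELECT DISTINCT subquery, which is a different technique from the one in the post.

```python
import sqlite3

# Hypothetical sample data: (ID, PageId, IP, Date)
ROWS = [
    (1, 10, "1.1.1.1", "2007-07-01"),
    (2, 10, "1.1.1.1", "2007-07-01"),  # duplicate of row 1 on (IP, Date)
    (3, 11, "1.1.1.1", "2007-07-02"),
    (4, 10, "2.2.2.2", "2007-07-01"),
]

# Count only the row with the smallest ID in each (IP, Date) group.
NOT_EXISTS_SQL = """
SELECT COUNT(1) AS IpNum FROM LOG AS a
WHERE NOT EXISTS (
    SELECT 1 FROM LOG AS b
    WHERE b.IP = a.IP AND b.Date = a.Date AND b.ID < a.ID
)
"""

# Cross-check: count the distinct (IP, Date) pairs directly.
DISTINCT_SQL = "SELECT COUNT(*) FROM (SELECT DISTINCT IP, Date FROM LOG) AS t"


def count_ip_days(rows):
    """Return (count via NOT EXISTS, count via DISTINCT)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE LOG (ID INTEGER PRIMARY KEY, PageId INTEGER, IP TEXT, Date TEXT)"
    )
    conn.executemany("INSERT INTO LOG VALUES (?, ?, ?, ?)", rows)
    by_not_exists = conn.execute(NOT_EXISTS_SQL).fetchone()[0]
    by_distinct = conn.execute(DISTINCT_SQL).fetchone()[0]
    conn.close()
    return by_not_exists, by_distinct


if __name__ == "__main__":
    print(count_ip_days(ROWS))
```

With the four sample rows, both queries return 3: rows 1 and 2 collapse into one, while rows 3 and 4 each count on their own.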
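Deleting duplicate records is a similar problem.
If "duplicate" means that every field is the same except the serial number, it is easy. For the table above: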
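-- T-SQL needs the "DELETE alias FROM table AS alias" form when the target is aliased.
DELETE a FROM log AS a WHERE EXISTS (
    SELECT 1 FROM log AS b WHERE b.IP = a.IP AND b.Date = a.Date AND b.Id < a.Id
)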
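Here is a minimal sketch of the same delete, runnable without a SQL Server instance. It uses SQLite through Python's sqlite3, so the DELETE is written in a form SQLite accepts (the target table is referenced by name rather than by alias). The sample rows and the function name delete_duplicates are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (ID, PageId, IP, Date)
ROWS = [
    (1, 10, "1.1.1.1", "2007-07-01"),
    (2, 10, "1.1.1.1", "2007-07-01"),  # duplicate of row 1 on (IP, Date)
    (3, 11, "1.1.1.1", "2007-07-02"),
    (4, 10, "2.2.2.2", "2007-07-01"),
    (5, 12, "1.1.1.1", "2007-07-01"),  # another duplicate of row 1
]

# Delete every row for which a row with the same IP and Date
# but a smaller ID exists; the smallest ID in each group survives.
DELETE_SQL = """
DELETE FROM LOG
WHERE EXISTS (
    SELECT 1 FROM LOG AS b
    WHERE b.IP = LOG.IP AND b.Date = LOG.Date AND b.ID < LOG.ID
)
"""


def delete_duplicates(rows):
    """Load rows, remove duplicates, and return the IDs that remain."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE LOG (ID INTEGER PRIMARY KEY, PageId INTEGER, IP TEXT, Date TEXT)"
    )
    conn.executemany("INSERT INTO LOG VALUES (?, ?, ?, ?)", rows)
    conn.execute(DELETE_SQL)
    remaining = [r[0] for r in conn.execute("SELECT ID FROM LOG ORDER BY ID")]
    conn.close()
    return remaining


if __name__ == "__main__":
    print(delete_duplicates(ROWS))
```

For the sample data the surviving IDs are 1, 3 and 4. Because the row with the smallest ID in a group never matches the EXISTS condition, it is never deleted, whatever order the rows are visited in.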
What if all the fields are identical?
It looks like a cursor is needed:
use [test]
go
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[a_dist]') AND type in (N'U'))
DROP TABLE [dbo].[a_dist]
GO
create table a_dist(id int,name varchar(20))
insert into a_dist values(1,'abc')
insert into a_dist values(1,'abc')
insert into a_dist values(1,'abc')
insert into a_dist values(1,'abc')
insert into a_dist values(2,'abc')
insert into a_dist values(2,'abc')
insert into a_dist values(2,'abc')
insert into a_dist values(2,'abc')
insert into a_dist values(2,'abcd')
insert into a_dist values(2,'abcd')
insert into a_dist values(2,'abcd')
insert into a_dist values(2,'abcd')
SELECT * FROM a_dist;
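-- One cursor row per group of fully identical rows that occurs more than once.
-- STATIC snapshots the groups up front, so the deletes below cannot disturb the cursor.
DECLARE curT CURSOR LOCAL STATIC FOR
    SELECT id, name, COUNT(*) AS num FROM a_dist GROUP BY id, name HAVING COUNT(*) > 1;
DECLARE @Id INT, @name VARCHAR(20), @top INT;
OPEN curT;
FETCH NEXT FROM curT INTO @Id, @name, @top;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Delete all but one row of the group, i.e. num - 1 of them.
    SET @top = @top - 1;
    DELETE TOP(@top) FROM [a_dist] WHERE [id] = @Id AND [name] = @name;
    FETCH NEXT FROM curT INTO @Id, @name, @top;
END
CLOSE curT;
DEALLOCATE curT;
-- Each (id, name) combination should now appear exactly once.
SELECT * FROM a_dist;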
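The cursor loop above can be imitated in a small runnable sketch. It is not the T-SQL itself: SQLite (through Python's sqlite3) stands in for SQL Server, and a Python loop takes the place of the cursor. SQLite's implicit rowid plus LIMIT plays the role of DELETE TOP(n), which is an SQLite-specific substitution. The function name dedupe_identical_rows is made up.

```python
import sqlite3

# Same sample data as the a_dist table above: three groups of four identical rows.
ROWS = [(1, "abc")] * 4 + [(2, "abc")] * 4 + [(2, "abcd")] * 4


def dedupe_identical_rows(rows):
    """Keep one row from every group of fully identical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a_dist (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO a_dist VALUES (?, ?)", rows)

    # Fetch all groups first (like a static cursor), then delete.
    groups = conn.execute(
        "SELECT id, name, COUNT(*) FROM a_dist "
        "GROUP BY id, name HAVING COUNT(*) > 1"
    ).fetchall()

    for row_id, name, num in groups:
        # Remove num - 1 rows of the group, leaving exactly one.
        conn.execute(
            "DELETE FROM a_dist WHERE rowid IN ("
            "  SELECT rowid FROM a_dist WHERE id = ? AND name = ? LIMIT ?"
            ")",
            (row_id, name, num - 1),
        )

    remaining = conn.execute(
        "SELECT id, name FROM a_dist ORDER BY id, name"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    print(dedupe_identical_rows(ROWS))
```

With the twelve sample rows, three rows remain: (1, 'abc'), (2, 'abc') and (2, 'abcd'). Like the cursor version, the equality comparison would not match rows whose id or name is NULL, so such rows would need separate handling.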