I have a text with many lines. How can I delete the duplicate lines in Emacs, using a command in Emacs or an elisp package, without external utilities?
For example:
this is line a
this is line b
this is line a
Removing the 3rd line (the same as the 1st line) should give:
this is line a
this is line b
Put this code in your .emacs:
(defun uniq-lines (beg end)
  "Unique lines in region.
Called from a program, there are two arguments:
BEG and END (region to deduplicate)."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (while (not (eobp))
        ;; Kill the current line (with its newline) and put it back, so
        ;; its text is available as the head of the kill ring.
        (kill-line 1)
        (yank)
        (let ((next-line (point)))
          ;; Delete every later occurrence of that line.
          (while
              (re-search-forward
               (format "^%s" (regexp-quote (car kill-ring))) nil t)
            (replace-match "" nil nil))
          (goto-char next-line))))))
Usage:
M-x uniq-lines
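The same idea can be written without touching the kill ring, by holding the current line in a `let`-bound variable. This is only a minimal sketch: the name `my-uniq-lines-no-kill-ring` is hypothetical, and it compares whole lines with `string=` rather than using a regexp search.

```elisp
;; Sketch only: `my-uniq-lines-no-kill-ring' is a hypothetical name.
(defun my-uniq-lines-no-kill-ring (beg end)
  "Delete later duplicates of each line between BEG and END.
The current line is held in a `let'-bound variable, so the kill ring
is left untouched."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (while (not (eobp))
        (let ((line (buffer-substring-no-properties
                     (line-beginning-position) (line-end-position))))
          (forward-line 1)
          ;; Scan the remaining lines; point is restored afterwards.
          (save-excursion
            (while (not (eobp))
              (if (string= line
                           (buffer-substring-no-properties
                            (line-beginning-position) (line-end-position)))
                  ;; Delete the duplicate line and its newline, if any.
                  (delete-region (line-beginning-position)
                                 (min (point-max) (1+ (line-end-position))))
                (forward-line 1)))))))))
```

Like the version above, this is quadratic in the number of lines, which is fine for small regions.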
(require 'cl-lib)  ; for `cl-position-if' and `cl-incf'

(defun unique-lines (start end)
  "This will remove all duplicating lines in the region.
Note empty lines count as duplicates of the empty line! All empty lines are
removed sans the first one, which may be confusing!"
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    (dolist (s (split-string (buffer-substring-no-properties start end) "$" t)
               ;; Result form: runs once after the loop and rebuilds the
               ;; region from the hash table, in order of first appearance.
               (let ((lines (make-vector (1+ i) nil)))
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (insert (mapconcat #'identity lines "\n"))))
      (setq s                           ; because Emacs can't properly
                                        ; split lines :/
            (substring
             s (cl-position-if
                (lambda (x)
                  (not (or (char-equal ?\n x) (char-equal ?\r x)))) s)))
      (unless (gethash s hash)
        (setf (gethash s hash) (cl-incf i))))))
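The unique-lines function above is an alternative. It has the effect of replacing all former line terminators in the region with \n (UNIX style), which may be a bonus or a disadvantage depending on your situation. You could make it a little better (faster) by re-implementing split-string so that it accepts characters instead of a regular expression. A somewhat longer but perhaps more efficient variant: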
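(defun split-string-chars (string chars &optional omit-nulls)
  "Split STRING at each character in the list CHARS.
If OMIT-NULLS is non-nil, drop empty substrings from the result."
  (let ((separators (make-hash-table))
        (last 0)
        current
        result)
    (dolist (c chars) (setf (gethash c separators) t))
    (dotimes (i (length string)
                ;; Result form: push the tail after the last separator,
                ;; then return the pieces in their original order.
                (progn
                  (when (or (not omit-nulls) (< last i))
                    (push (substring string last i) result))
                  (reverse result)))
      (setq current (aref string i))
      (when (gethash current separators)
        ;; Keep empty pieces unless OMIT-NULLS is set.
        (when (or (not omit-nulls) (/= last i))
          (push (substring string last i) result))
        (setq last (1+ i))))))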
(require 'cl-lib)  ; for `cl-incf'

(defun unique-lines (start end)
  "This will remove all duplicating lines in the region.
Note empty lines count as duplicates of the empty line! All empty lines are
removed sans the first one, which may be confusing!"
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    (dolist (s (split-string-chars
                (buffer-substring-no-properties start end) '(?\n) t)
               ;; Result form: rebuild the region in order of first appearance.
               (let ((lines (make-vector (1+ i) nil)))
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (insert (mapconcat #'identity lines "\n"))))
      (unless (gethash s hash)
        (setf (gethash s hash) (cl-incf i))))))
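Lines in an Emacs buffer are always separated by \n (regardless of the separator used in the corresponding file). \r is only used for the old 'selective-display', which was made obsolete many years ago by overlays and the 'invisible' text property. – Stefan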
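Since buffer lines are always separated by \n, a shorter variant can split on that character alone and let the built-in `delete-dups` keep the first occurrence of each line. This is a rough sketch rather than a drop-in replacement: the name `my-unique-lines-delete-dups` is hypothetical, and it assumes `string-suffix-p` is available (Emacs 24.4 or newer).

```elisp
;; Sketch only: `my-unique-lines-delete-dups' is a hypothetical name.
(defun my-unique-lines-delete-dups (start end)
  "Keep only the first occurrence of each line between START and END."
  (interactive "r")
  (let* ((text (buffer-substring-no-properties start end))
         ;; Remember a final newline so it can be restored afterwards.
         (trailing (string-suffix-p "\n" text))
         (body (if trailing (substring text 0 -1) text))
         ;; `delete-dups' keeps the first copy of each `equal' element.
         (unique (delete-dups (split-string body "\n"))))
    (save-excursion
      (delete-region start end)
      (goto-char start)
      (insert (mapconcat #'identity unique "\n")
              (if trailing "\n" "")))))
```

Unlike the versions above, this one uses delete-region, so the replaced text does not land in the kill ring.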
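If you have Emacs 24.4 or newer, the simplest way is to use the new delete-duplicate-lines function. Note that it works on the region, so select the text first, and that it keeps the first occurrence of each line in its original position.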
For example, if your input is
test
dup
dup
one
two
one
three
one
test
five
M-x delete-duplicate-lines
would make it
test
dup
one
two
three
five
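You may instead search backwards by prefixing the command with the universal argument (C-u), which keeps the last occurrence of each line. The result would then be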
dup
two
three
one
test
five
Credit goes to emacsredux.com.
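The command can also be called from Lisp with explicit region bounds. The following is a small sketch under that assumption: `my-dedupe-string` is a hypothetical helper, not part of Emacs, and it needs Emacs 24.4 or newer for delete-duplicate-lines. Its optional REVERSE argument is simply passed through as the command's third argument, which is what C-u sets interactively.

```elisp
;; Sketch only: `my-dedupe-string' is a hypothetical helper, not part of Emacs.
(defun my-dedupe-string (text &optional reverse)
  "Return TEXT with duplicate lines removed by `delete-duplicate-lines'.
REVERSE is passed on as that command's REVERSE argument."
  (with-temp-buffer
    (insert text)
    ;; Non-interactively, the region is given as explicit positions.
    (delete-duplicate-lines (point-min) (point-max) reverse)
    (buffer-string)))

;; Example call on the input shown above (forward search):
(my-dedupe-string
 "test\ndup\ndup\none\ntwo\none\nthree\none\ntest\nfive\n")
```

Working in a temporary buffer keeps the current buffer and its region untouched.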
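Other roundabout options, which do not give exactly the same result, are available through Eshell:
sort -u; does not maintain the relative order of the originals
uniq; worse, it needs its input to be sorted
'sort -u' may not be a stable sort, but 'sort -u -s' is. – Squidly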
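Yes, that is right. Fixed now! Running it from eshell seems a less clean solution than using a built-in function. – legends2k
@Squid I think I did not verify your last comment properly. Try feeding the input data to 'sort -u' and 'sort -us'; you will get results different from delete-duplicate-lines. More importantly, we are not talking about stable sorting, which means maintaining the relative order of equal elements. Since we remove the duplicates, the equal elements are lost anyway. 'delete-duplicate-lines' keeps the originals in order, without the duplicates, so you cannot get the same result with 'sort'. – legends2k
Instead of using the kill-ring, you could save the contents in a 'let'-bound variable. –
Thanks for your advice! – ymn
Thanks for your help! – toolchainX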