remove contiguous lines matching pattern at the end of a file

bilroth

2012-12-03 21:43:41 UTC

I would like to remove all contiguous lines matching one of the regexes:

"[0-9][0-9].[0-9][0-9].[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] na$" or
"[0-9][0-9].[0-9][0-9].[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] na na$" or
"[0-9][0-9].[0-9][0-9].[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] na na na$" or
"[0-9][0-9].[0-9][0-9].[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] na na na na$"

occurring at the end of a file (if any). Each file will only contain
lines with one of these patterns, but I would like something generic that
will work for any of the files I have. I want to leave all lines
matching this pattern at the start or in the middle of the file (i.e.
surrounded by other lines) untouched. Obviously, if a file contains only
lines matching the regex then the command should produce an empty file.

Ideally I would like to do this with sed. While it is very
straightforward to remove all occurrences in the file with sed, I can't
work out how to restrict the deletion to just the lines at the end.

Can this be done with sed? The next most obvious way I can see of doing
this is to reverse the file with tac, use a read loop to count the number
of lines matching the pattern, subtract this from the total number of
lines in the file (calculated using wc), pass the file through tac again,
and then use head to extract the lines I want. This seems rather
convoluted, and I have struggled to write the loop to count the correct
number of lines.

Any help would be appreciated.

Oleksandr Gavenko

2012-12-04 20:23:30 UTC

Post by bilroth
"[0-9][0-9].[0-9][0-9].[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] na$" or
"[0-9][0-9].[0-9][0-9].[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] na na$" or
"[0-9][0-9].[0-9][0-9].[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] na na na$" or
"[0-9][0-9].[0-9][0-9].[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] na na na na$"
occurring at the end of a file (if any). Each file will only contain
lines with one of these patterns, but I would like something generic that
will work for any of the files I have. I want to leave all lines
matching this pattern at the start or in the middle of the file (i.e.
surround by other lines) untouched. Obviously if a file contains only
lines matching the regex then the command should produce an empty file.
Ideally I would like to do this with sed, and while it is very straight
forward to remove all occurrences in the file with sed, but I cant
workout out how to restrict to just the lines at the end.
Can this be done with sed. The next obvious way I can see of doing this
is reversing the file with tac, using a read loop to count the number of
lines matching the pattern. Subtract this from the total number of lines
in the file (calculated using wc). Passing the file through tac again
and then using head to extract the lines I want. This seems rather
convoluted and I have struggle to write the loop to count the correct
number of lines.

Just use sed!

Or perl or bash.

But the fastest solution is sed for large files, and a Perl script for a
large set of small files (if you write a Perl script which loops over the
files itself, to avoid spawning a process per file).
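
No command is given above, so here is a minimal sketch of one way the sed suggestion could look. It buffers each run of matching lines in the pattern space and drops the run only if the end of the file is reached while still inside it. PAT (the poster's four regexes folded into one basic regular expression) and the helper name strip_tail_sed are assumptions made for illustration, not something from the thread.

```shell
# PAT is an assumption: the poster's four regexes folded into one BRE,
# i.e. a timestamp followed by one to four " na" fields at end of line.
PAT='[0-9][0-9].[0-9][0-9].[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] na\( na\)\{0,3\}$'

# strip_tail_sed FILE  (hypothetical helper name)
strip_tail_sed() {
    # :a      loop label
    # /PAT/   the newest line in the pattern space matches
    #   $d    ...and it is the last line of input: drop the buffered run
    #   N;ba  ...otherwise append the next line and test again
    # A non-matching line leaves the block, so the buffered run is
    # printed unchanged together with that line.
    sed -e ':a' -e "/$PAT/{\$d;N;ba" -e '}' "$1"
}
```

Usage would be something like "strip_tail_sed yourfile > yourfile.new". Note that a long run of matching lines is held in memory until sed can decide whether the run ends the file.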

--
Best regards!

bilroth

2012-12-04 23:17:53 UTC

Post by Oleksandr Gavenko
Just use sed!
Or perl or bash.
But most fast solution is sed for large files and perl script for large
set of small files (if you write Perl script which look over files to
prevent spawning processes).

Ok thanks for that. I have no knowledge of perl so I don't really want
to use that.

What I can't work out in sed is how to restrict the
deletions to just those instances occurring at the end of file. A global
deletion is easy enough. Can anyone explain how to do this?

Burton Samograd

2012-12-05 04:18:35 UTC

Post by bilroth

Ok thanks for that. I have no knowledge of perl so I don't really want
to use that.
What I can't work out in sed is how to restrict the
deletions to just those instances occurring at the end of file. A global
deletion is easy enough. Can anyone explain how to do this?

If there is a specific start to the section you want to delete, you can
use an address like

sed -e '/MARKER/,$d'

This will delete every line from the first line matching MARKER through
the end of the file, and print the lines before it unchanged. For example,

1
2
3
MARKER
4
5

when run through the above command will leave:

1
2
3

having deleted all lines up to the end of the file starting at the
MARKER line.

MARKER can be any regular expression, and you can replace the d command
with whatever command you want to run on those lines, such as s/x/y/g.

If you don't have a solid start for the region that you can match with a
regular expression before the end of the file, you can use a numerical
address such as:

sed -e '10,$d'

which will delete all lines from 10 to the end of the file.
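
For the original question the line number is not known in advance, so it has to be computed first. The following is only a sketch under stated assumptions: PAT_ERE (the four regexes folded into one extended regular expression) and the helper name strip_tail_by_number are made up for illustration. It uses awk to find the last line that does not match, then feeds the next line number to the numeric address form shown above.

```shell
# PAT_ERE is an assumption: the poster's four regexes as one ERE
# (a timestamp followed by one to four " na" fields at end of line).
PAT_ERE='[0-9][0-9].[0-9][0-9].[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] na( na)?( na)?( na)?$'

# strip_tail_by_number FILE  (hypothetical helper name)
strip_tail_by_number() {
    # Line number of the last line that does NOT match (0 if there is none).
    last=$(awk -v pat="$PAT_ERE" '$0 !~ pat { n = NR } END { print n + 0 }' "$1")
    # Delete from the line after it to the end of the file.
    # If nothing trails, the address lies past the end and nothing is deleted.
    sed -e "$((last + 1)),\$d" "$1"
}
```

This reads the file twice, which keeps memory use flat at the cost of a second pass. A file consisting only of matching lines gives last=0, so the address becomes 1,$ and the output is empty.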

--
Burton Samograd

Loki Harfagr

2012-12-05 08:33:58 UTC

Post by bilroth

Ok thanks for that. I have no knowledge of perl so I don't really want
to use that.
What I can't work out in sed is how to restrict the
deletions to just those instances occurring at the end of file. A global
deletion is easy enough. Can anyone explain how to do this?

This is one way I'd do it in awk; you may prefer to translate it into sed ;-)
$ tac yourfile | awk 'p!=$0{v[p]=1;p=$0} v[$0]' | tac

It simply reads the file in reverse, skips the first run of each new
line, and prints lines that have already been seen in an earlier run.
Note that it works if your file respects
"Each file will only contain lines with one of these patterns".
In case your file might contain other stuff, you'll have to take care;
maybe just add a printer+jumper:
$ tac yourfile | awk '!/pat1 na( na)*$/{print p=$0;next} p!=$0{v[p]=1;p=$0} v[$0]' | tac
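
A simpler variant of the same reverse-and-filter idea tests each line against the regex instead of comparing whole lines. This is a sketch, not the poster's command: PAT_ERE and the helper name strip_tail_tac are assumptions for illustration, and tac is assumed to be available (it ships with GNU coreutils).

```shell
# PAT_ERE is an assumption: the original four regexes as one ERE
# (a timestamp followed by one to four " na" fields at end of line).
PAT_ERE='[0-9][0-9].[0-9][0-9].[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] na( na)?( na)?( na)?$'

# strip_tail_tac FILE  (hypothetical helper name)
strip_tail_tac() {
    # In the reversed file the trailing run comes first: skip lines
    # until the first one that does not match, then print everything.
    tac "$1" |
        awk -v pat="$PAT_ERE" 'keep || $0 !~ pat { keep = 1; print }' |
        tac
}
```

Matching lines at the start or in the middle of the file survive because keep is already set by the time they are read.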

(FU2 cla, indeed)