SBCL: Cutting a string into subsequent pieces

Discussion of Common Lisp
mcc
Posts: 18
Joined: Fri Mar 27, 2015 10:47 pm

SBCL: Cutting a string into subsequent pieces

Post by mcc » Mon Apr 06, 2015 7:15 am

Hi,

From a list of shortwave broadcasters (HFCC "Public Data", a pure ASCII file) I got this header:

Code: Select all

;----+----+----+------------------------------+---+----+-------+---+---+-------+------+------+-+-----+----------+---+---+---+-----+-+-----+-----+-----+-------
;FREQ STRT STOP CIRAF ZONES                    LOC POWR AZIMUTH SLW ANT DAYS    FDATE  TDATE MOD AFRQ LANGUAGE   ADM BRC FMO REQ# OLD ALT1 ALT2  ALT3  NOTES
;----+----+----+------------------------------+---+----+-------+---+---+-------+------+------+-+-----+----------+---+---+---+-----+-+-----+-----+-----+-------
The "+" characters mark where each field starts and ends in the rest of the table.
With cl-ppcre I created a list like this one from the input:

Code: Select all

(5 10 10 15 15 46 46 50 50 55 55 63 63 67 67 71 71 79 79 86 86 93 93 95 95 101 101 112 112 116 116 120 120 124 124 130 130 132 132 138 138 144 144 150 150 158)
This shows the start and end offsets of each field (obtained with "all-matches").

The next step is to take one pair of offsets at a time off this list ...
Is "nth" the appropriate way, or am I still "procedure poisoned"??? ;)
What is the best "LISPy" way to do this without blinding a newbie with all the innermost secrets of LISP in one answer? ;) ;) ;) 8)

Best regards,
mcc

edgar-rft
Posts: 226
Joined: Fri Aug 06, 2010 6:34 am
Location: Germany

Re: SBCL: Cutting a string into subsequent pieces

Post by edgar-rft » Mon Apr 06, 2015 3:42 pm

There is not much lispiness in string processing, so once again LOOP, a rather non-lispy construct, is the only reasonable choice. Here is a function that takes your match-list and returns a list of pairs with all (START . STOP) offsets:

Code: Select all

(defun make-offset-pairs (match-list)
  (loop with start = 0
        for stop in match-list
        collect (cons start stop)
        do (setf start stop)))
Example:

Code: Select all

(make-offset-pairs '(1 2 3)) => ((0 . 1) (1 . 2) (2 . 3))
Is that what you want? It looks a bit redundant...

...and because the "File Format description" link in the HFCC archive is broken, here is a link to the original ITU document where the file format and the line offsets are specified:
- edgar

Edit: Sorry, just realizing that your match-list already contains all start and stop indices in duplicated order (but the indices of the first field are missing?), so my code above is definitely not what you want. I will come up with a better example in a few minutes...
Last edited by edgar-rft on Mon Apr 06, 2015 6:30 pm, edited 2 times in total.

edgar-rft
Posts: 226
Joined: Fri Aug 06, 2010 6:34 am
Location: Germany

Re: SBCL: Cutting a string into subsequent pieces

Post by edgar-rft » Mon Apr 06, 2015 5:59 pm

Here is the same thing as above, but this time tested with your match-list and not with nonsense data:

Code: Select all

(defun make-offset-pairs (match-list)
  (loop for start = (pop match-list)
        for stop  = (pop match-list)
        while (and start stop)
        collect (cons start stop)))
Test:

Code: Select all

CL-USER> (defparameter *match-list* '(5 10 10 15 15 46 46 50 50 55 55 63 63 67 67
                       71 71 79 79 86 86 93 93 95 95 101 101 112 112 116 116 120
                       120 124 124 130 130 132 132 138 138 144 144 150 150 158))

CL-USER> (make-offset-pairs *match-list*)
((5 . 10) (10 . 15) (15 . 46) (46 . 50) (50 . 55) (55 . 63) (63 . 67) (67 . 71)
 (71 . 79) (79 . 86) (86 . 93) (93 . 95) (95 . 101) (101 . 112) (112 . 116)
 (116 . 120) (120 . 124) (124 . 130) (130 . 132) (132 . 138) (138 . 144)
 (144 . 150) (150 . 158))
Some more practical code will follow...
Last edited by edgar-rft on Mon Apr 06, 2015 6:30 pm, edited 2 times in total.
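
A possible variation, offered only as a sketch: LOOP can destructure two elements at a time while stepping through the list with CDDR, which avoids the explicit POP calls. The name make-offset-pairs-by-cddr is made up for this illustration and is not from the posts above.

```lisp
;; Hypothetical alternative to make-offset-pairs (name invented here).
(defun make-offset-pairs-by-cddr (match-list)
  "Return a list of (START . STOP) conses from a flat list of offsets."
  (loop for (start stop) on match-list by #'cddr
        while stop                      ; stops on a trailing unpaired offset
        collect (cons start stop)))

(make-offset-pairs-by-cddr '(5 10 10 15 15 46))
;; => ((5 . 10) (10 . 15) (15 . 46))
```

A pair can then be used directly as the bounds of a substring, for example (subseq line (car pair) (cdr pair)).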

edgar-rft
Posts: 226
Joined: Fri Aug 06, 2010 6:34 am
Location: Germany

Re: SBCL: Cutting a string into subsequent pieces

Post by edgar-rft » Mon Apr 06, 2015 6:17 pm

Here is a helper function that returns the substring from START to STOP with leading and trailing whitespace removed. It returns NIL if there is no such token in the line, either because the substring between START and STOP contains only whitespace or because the START of the substring is beyond the END of the line:

Code: Select all

(defun make-token (line start stop end)
  (unless (> start end)
    (let ((token (string-trim '(#\Space #\Tab #\Newline #\Return)
                              (subseq line start (min stop end)))))
      (when (> (length token) 0) token))))
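
To make the NIL cases concrete, here is a small sketch that repeats the make-token definition and calls it on a short made-up line. The sample string "ab  cd" and the offsets are invented for illustration only.

```lisp
;; make-token repeated from the post so the example is self-contained.
(defun make-token (line start stop end)
  (unless (> start end)
    (let ((token (string-trim '(#\Space #\Tab #\Newline #\Return)
                              (subseq line start (min stop end)))))
      (when (> (length token) 0) token))))

;; "ab  cd" has length 6.
(make-token "ab  cd" 0 2 6)   ; => "ab"
(make-token "ab  cd" 2 4 6)   ; => NIL, the field holds only whitespace
(make-token "ab  cd" 4 10 6)  ; => "cd", STOP is clamped to END
(make-token "ab  cd" 7 10 6)  ; => NIL, START is beyond END
```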
Function to return a list of substrings (tokens) from a non-comment text line of the file and your match-list:

Code: Select all

(defun make-token-list (line match-list)
  (loop with end = (length line)
        for start = (pop match-list)
        for stop  = (pop match-list)
        while (and start stop)
        collect (make-token line start stop end)))
Test:

Code: Select all

(make-token-list " 3185 0000 1300 4,9                            WRB  100 45        0 902 1234567 300315 251015 D       Eng        USA WRB FCC  1060                     "
                 '(5 10 10 15 15 46 46 50 50 55 55 63 63 67 67 71 71 79 79 86 86 93 93 95 95 101 101 112 112 116 116 120 120 124 124 130 130 132 132 138 138 144 144 150 150 158))
=> ("0000" "1300" "4,9" "WRB" "100" "45" "0" "902" "1234567" "300315" "251015" "D"  NIL "Eng" "USA" "WRB" "FCC" "1060" NIL NIL NIL NIL NIL)
I think that this is more what you want...

- edgar

edgar-rft
Posts: 226
Joined: Fri Aug 06, 2010 6:34 am
Location: Germany

Re: SBCL: Cutting a string into subsequent pieces

Post by edgar-rft » Mon Apr 06, 2015 9:29 pm

...and another one: Look at Common Lisp's PARSE-INTEGER function if you want to transform the numerical strings into integer numbers.

- edgar
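
As a rough sketch of how PARSE-INTEGER could be applied to the token list: the helper below converts a token only if it consists entirely of digits and returns everything else (including NIL) unchanged. The name token-to-value is hypothetical, and whether fields such as the dates should really become integers is a decision left open here.

```lisp
;; Hypothetical helper (name invented here).
(defun token-to-value (token)
  "Return TOKEN as an integer if it consists only of digits, else unchanged."
  (if (and (stringp token)
           (plusp (length token))
           (every #'digit-char-p token))
      (parse-integer token)
      token))

(mapcar #'token-to-value '("0000" "1300" "4,9" "WRB" "100" nil))
;; => (0 1300 "4,9" "WRB" 100 NIL)
```

Checking the characters first avoids the error that PARSE-INTEGER signals on strings such as "WRB" when :JUNK-ALLOWED is not given.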

mcc
Posts: 18
Joined: Fri Mar 27, 2015 10:47 pm

Re: SBCL: Cutting a string into subsequent pieces

Post by mcc » Tue Apr 07, 2015 9:22 am

Hi Edgar!

OMG! :)

THANK YOU VERY MUCH FOR ALL YOUR EFFORT TO HELP ME, EDGAR!!!
And thank you for the "meta-help" (the format specification...for example!)

While I am starting to learn LISP, I want to "talk with an appropriate """spelling""" "
from the beginning. Coming from C, Perl and others, I "fear" (a little too strong
a word..) that I will concatenate things that may run but are still marked with
the logo "C code inside".
That's why I was asking for a "lispy" version of what I was trying to achieve.

One question:
You are building the pair of offsets like this

Code: Select all

( <number> . <number>)
I had thought to build pairs of offsets like this

Code: Select all

( <number> <number>)
What is the advantage of the "." in between the offsets?

By the way: The matchlist is created by this code:

Code: Select all

(let (matchlist)
  (defun parse (line)
    (if (eql matchlist nil)
      (setf matchlist (all-matches "\\+\\-*" line)))))
This will be called until something other than NIL is returned, because
the list begins with lines that differ from the "+----+--------------+...." line.
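
For readers without cl-ppcre at hand, the following is a sketch in plain Common Lisp that is meant to produce the same kind of flat offset list, assuming the only pattern of interest is a "+" followed by zero or more "-" characters. The name plus-run-offsets is made up for this illustration, and the result has not been compared against cl-ppcre here.

```lisp
;; Hypothetical dependency-free stand-in for (all-matches "\\+\\-*" line).
(defun plus-run-offsets (line)
  "Return a flat list of start/end offsets of every \"+\" and the dashes after it."
  (let ((offsets '())
        (pos 0)
        (end (length line)))
    (loop
      (let ((start (position #\+ line :start pos)))
        (unless start
          (return (nreverse offsets)))
        ;; The run ends at the first character after the "+" that is not a dash.
        (let ((stop (or (position-if (lambda (ch) (char/= ch #\-))
                                     line :start (1+ start))
                        end)))
          (push start offsets)
          (push stop offsets)
          (setf pos stop))))))

(plus-run-offsets ";----+----+---")
;; => (5 10 10 14)
```

A line without any "+" yields NIL, which fits the "call until something other than NIL is returned" approach described above.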

We will see how far I get with your help and hints
included! :)

Best regards,
mcc

edgar-rft
Posts: 226
Joined: Fri Aug 06, 2010 6:34 am
Location: Germany

Re: SBCL: Cutting a string into subsequent pieces

Post by edgar-rft » Tue Apr 07, 2015 3:11 pm

mcc wrote: What is the advantage for the "." in between the offsets?
(1 . 2) needs only half the memory of (1 2)

Code: Select all

(cons 1 2)      => (1 . 2)
(car '(1 . 2))  => 1
(cdr '(1 . 2))  => 2

+-----+-----+
|  1  |  2  |
+-----+-----+
(1 . 2) is one cons cell in memory = two memory locations

Code: Select all

(list 1 2) == (cons 1 (cons 2 nil)) => (1 2) == (1 . (2 . nil))
(car  '(1 2))  => 1
(cdr  '(1 2))  => (2) == (2 . nil)
(cadr '(1 2))  => 2
(cddr '(1 2))  => NIL

+-----+-----+            +-----+-----+
|  1  |  * ----pointer-->|  2  | NIL |
+-----+-----+            +-----+-----+
(1 2) needs two cons cells in memory = four memory locations

The Lisp printer uses the "dotted" notation to indicate that the CDR of the last cons cell in a chain of linked cons cells is not NIL. A list whose last cons cell has NIL in its CDR is called a "proper" list, while a list whose final CDR is not NIL is called a "dotted" list.

The important thing for now is to know that nearly all of Common Lisp's list-processing functions need proper lists with a NIL at the end.
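
A short sketch of what that means in practice, using the first pair of offsets from the match-list. The helper name safe-length is invented for this illustration; the behaviour of LENGTH on a dotted pair is described as observed in SBCL, and other implementations may react differently.

```lisp
(let ((pair (cons 5 10))    ; one cons cell, prints as (5 . 10)
      (lst  (list 5 10)))   ; two cons cells, prints as (5 10)
  (list (car pair) (cdr pair)        ; accessors for the dotted pair
        (first lst) (second lst)     ; accessors for the proper list
        (length lst)))
;; => (5 10 5 10 2)

;; Hypothetical helper: catch the error instead of landing in the debugger.
(defun safe-length (object)
  "Return the LENGTH of OBJECT, or :NOT-A-PROPER-LIST if LENGTH signals an error."
  (handler-case (length object)
    (error () :not-a-proper-list)))

(safe-length (list 5 10))   ; => 2
(safe-length (cons 5 10))   ; => :NOT-A-PROPER-LIST in SBCL, which signals a TYPE-ERROR
```

So dotted pairs are fine as long as they are only taken apart with CAR and CDR, while anything passed to functions like LENGTH or MAPCAR should be a proper list.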

As usual the full story is in Practical Common Lisp, Chapter 12: List Processing

Understanding cons cells is the key to Lisp's memory management, but don't try this right now.

- edgar
