File input stream buffering

Post by kmruiz » Wed Aug 15, 2012 10:24 am

Hi all, dear lispers :).

I'm writing a CSV-to-Lisp mapping library in Common Lisp. I've released a rudimentary version on GitHub and it works well, but the performance when reading files is very poor. I've made some test cases using a Java library (sorry, but I can't remember its name) and for small files the speed and memory usage seem better, but the problem comes when the file is large and I think I lose speed reading byte by byte.

I want to know whether Common Lisp does some kind of buffering when reading files. Does it? Is it implementation-dependent? I'm using the latest version of SBCL for testing.

Thank you in advance.

Post by edgar-rft » Wed Aug 15, 2012 11:59 am

See READ-SEQUENCE and WRITE-SEQUENCE for how to read and write arrays (= buffers) from and to disk.
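
A minimal sketch of block-wise reading with READ-SEQUENCE, assuming binary input with element type (unsigned-byte 8). The function name COUNT-BYTE-IN-FILE and the default buffer size of 4096 elements are illustrative choices, not something the standard or this thread prescribes:

```lisp
;; Hypothetical helper: counts occurrences of BYTE in the file at PATH,
;; reading the file in blocks with READ-SEQUENCE instead of READ-BYTE.
(defun count-byte-in-file (path byte &key (buffer-size 4096))
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8)))
          (count 0))
      (loop for end = (read-sequence buffer in)
            ;; READ-SEQUENCE returns the index of the first element it
            ;; did not update; 0 means nothing was read (end of file).
            until (zerop end)
            do (incf count (count byte buffer :end end)))
      count)))
```

The :END argument matters on the last block, which is usually only partly filled; the rest of the buffer still holds stale data from the previous read.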

Post by kmruiz » Thu Aug 16, 2012 12:51 am

Thank you for the information, edgar-rft.

So, as I understand it, there is no built-in or automatic mechanism for file buffering and I must do it manually. That raises another question: what is the optimal size for a memory buffer? I suppose 1 KiB or 2 KiB would be fine, but would it be better to calculate the buffer size based on the file?

Thanks for the help.

Post by Konfusius » Thu Aug 16, 2012 4:12 am

kmruiz wrote: there isn't any built-in nor automatic mechanism for file buffering
That ANSI Lisp has no explicit notion of file buffering doesn't mean that there is none. It's a matter of the quality of the implementation.

BTW, if you read a file byte by byte in C you have the same problem, although there is explicit buffering. It's not a good idea in general to read a file byte-wise, regardless of what language you are using.

Post by kmruiz » Thu Aug 16, 2012 6:50 am

When I develop in C (which everybody recognizes as a low-level language) I use a manual buffer (and we have a library of sorts for optimal file reading). The buffer size depends on the type of data, but we allocate from 80 to 2K elements.

When I develop in Java (a very high-level language) I use a BufferedInputStream, which buffers the file content automatically, so I can read byte by byte without problems because the buffering is managed entirely by the class.

I assumed that Common Lisp, being a high-level language, would either manage file buffering itself or provide some ANSI mechanism for it, so that I wouldn't have to build it from scratch. So it doesn't. No problem. I think I can do the buffering manually.

Thank you, Konfusius.
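
A rough sketch of what such a manual buffering layer might look like, loosely modeled on the byte-at-a-time interface of Java's BufferedInputStream. Every name here (BYTE-READER, MAKE-BYTE-READER, NEXT-BYTE) is hypothetical, and the 4096-element default is an arbitrary assumption:

```lisp
;; Hypothetical buffered reader over a binary input stream.
(defstruct (byte-reader (:constructor %make-byte-reader))
  stream    ; the underlying (unsigned-byte 8) input stream
  buffer    ; vector that holds the most recently read block
  (pos 0)   ; index of the next unread byte in BUFFER
  (end 0))  ; number of valid bytes currently in BUFFER

(defun make-byte-reader (stream &key (buffer-size 4096))
  (%make-byte-reader
   :stream stream
   :buffer (make-array buffer-size :element-type '(unsigned-byte 8))))

(defun next-byte (reader)
  "Return the next byte from READER, or NIL at end of file."
  (when (= (byte-reader-pos reader) (byte-reader-end reader))
    ;; Buffer exhausted: refill it with a single block read.
    (setf (byte-reader-end reader)
          (read-sequence (byte-reader-buffer reader)
                         (byte-reader-stream reader))
          (byte-reader-pos reader) 0))
  (when (< (byte-reader-pos reader) (byte-reader-end reader))
    (prog1 (aref (byte-reader-buffer reader) (byte-reader-pos reader))
      (incf (byte-reader-pos reader)))))
```

The caller still consumes one byte per call, but the underlying stream is only touched once per block. Whether this beats the implementation's own buffering is something to measure, not assume.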

Post by vsedach » Sat Sep 01, 2012 8:33 am

kmruiz wrote: I've made some test cases using a Java library (sorry, but I can't remember its name) and for small files the speed and memory usage seem better, but the problem comes when the file is large and I think I lose speed reading byte by byte.
This is something you should profile, then. It might be the case that you lose speed versus the Java library because of function call overhead, and that it has nothing to do with buffering (HotSpot will eventually inline all the reading code for large files, whereas for all current CL compilers you have to make inline declarations yourself). Most CL implementations are going to have buffered file reading, if for no other reason than that their runtime uses libc for portability.
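
For reference, an inline declaration looks roughly like this. SEPARATOR-P and COUNT-SEPARATORS are made-up names for illustration, and whether the compiler actually honors the request is up to the implementation:

```lisp
;; Ask the compiler to inline a small, frequently called helper.
;; The DECLAIM should come before the DEFUN so that the compiler can
;; keep the definition around for inlining at later call sites.
(declaim (inline separator-p))
(defun separator-p (byte)
  (= byte 44)) ; 44 is the ASCII code of the comma character

(defun count-separators (bytes)
  (declare (type (simple-array (unsigned-byte 8) (*)) bytes))
  ;; With the declaration above, this call can be expanded in place
  ;; instead of going through a full function call per byte.
  (loop for b across bytes
        count (separator-p b)))
```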

So do some profiling, get the source code for your implementation and see what it does (if you build your Lisp from source, you can use SLIME cross-referencing on system functions just like on any other code), and then you will have the data to come up with improvements, either for your code or for the implementation you're using.
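
As a first, crude measurement before reaching for a real profiler, something like the following could establish a baseline using only standard functions. COUNT-BYTES-BYTEWISE and ELAPSED-SECONDS are hypothetical helpers, and "big.csv" in the usage note is a placeholder file name:

```lisp
;; Hypothetical baseline: consume the file one byte at a time and
;; return the number of bytes read.
(defun count-bytes-bytewise (path)
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (loop for b = (read-byte in nil nil) ; NIL instead of an error at EOF
          while b
          count t)))

;; Hypothetical helper: call THUNK and return its result together with
;; the elapsed real time in seconds, as a second value.
(defun elapsed-seconds (thunk)
  (let* ((start (get-internal-real-time))
         (result (funcall thunk)))
    (values result
            (/ (- (get-internal-real-time) start)
               internal-time-units-per-second))))
```

Usage would be along the lines of (elapsed-seconds (lambda () (count-bytes-bytewise "big.csv"))), or simply wrapping the call in the standard TIME macro, whose report format is implementation-dependent. Timing the same file with a block-wise reader then shows whether byte-wise reading is really where the time goes.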
