Discussion:
read a file from a driver
Kreider, Carl
2002-04-03 15:16:14 UTC
I am working on an embedded project running FreeBSD, and my driver
for our custom card needs to load an FPGA with code. I know I can
compile the code in as data, but for ease of development, I would
rather fetch the FPGA code from a file. With a driver in kernel
space. Really.

Can it be done? If so, how? open() and read() are obviously in libc
which rules them out. Do I have to write my own in assembler?
--
Carl Kreider
Wind River Doctor Design Services
700 E Beardsley Suite 14A
Elkhart Indiana 46514
219-206-8050 x104
***@windriver.com ***@doctordesign.com
***@acm.org ***@gte.net
=============================================================
On two occasions I have been asked [by members of Parliament], 'Pray, Mr.
Babbage, if you put into the machine wrong figures, will the right answers
come out?' I am not able rightly to apprehend the kind of confusion of
ideas that could provoke such a question.
-- Charles Babbage
=============================================================

To Unsubscribe: send mail to ***@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
Poul-Henning Kamp
2002-04-03 15:38:14 UTC
Post by Kreider, Carl
I am working on an embedded project running FreeBSD, and my driver
for our custom card needs to load an FPGA with code. I know I can
compile the code in as data, but for ease of development, I would
rather fetch the FPGA code from a file. With a driver in kernel
space. Really.
Can it be done? If so, how? open() and read() are obviously in libc
which rules them out. Do I have to write my own in assembler?
Don't even think about it.

At the time your driver is probed/attached, there are no filesystems
mounted yet.

Best suggestion is to use an ioctl to download the data from
userland.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

Magnus Bäckström
1970-01-01 00:00:00 UTC
Post by Kreider, Carl
I am working on an embedded project running FreeBSD, and my driver
for our custom card needs to load an FPGA with code. I know I can
compile the code in as data, but for ease of development, I would
rather fetch the FPGA code from a file. With a driver in kernel
space. Really.
Can it be done? If so, how? open() and read() are obviously in libc
which rules them out. Do I have to write my own in assembler?
The way this is usually done is by having the driver implement some
interface (e. g. an ioctl) through which a user-mode utility can
download the data to the hardware. Have the utility run at boot along
with all other boot-time stuff, i. e. /etc/rc*.
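
A minimal sketch of such a user-mode download loop, added here as an illustration rather than taken from any real driver. The chunk layout `struct fpga_chunk`, the ioctl number `FPGA_LOAD_CHUNK`, and the function names are all hypothetical; a real driver would define its own interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Hypothetical chunk layout and ioctl number; a real driver defines its own. */
#define FPGA_CHUNK_MAX 1024

struct fpga_chunk {
    uint32_t offset;                /* byte offset of this chunk in the image */
    uint32_t len;                   /* number of valid bytes in data[] */
    uint8_t  data[FPGA_CHUNK_MAX];
};

#define FPGA_LOAD_CHUNK _IOW('F', 1, struct fpga_chunk)

/* A sink receives one chunk at a time and returns 0 on success. */
typedef int (*chunk_sink)(void *ctx, const struct fpga_chunk *chunk);

/* Sink that hands each chunk to the driver; ctx points at the open device fd. */
int ioctl_sink(void *ctx, const struct fpga_chunk *chunk)
{
    return ioctl(*(int *)ctx, FPGA_LOAD_CHUNK, chunk);
}

/* Dry-run sink: only totals the bytes seen; ctx points at a size_t counter. */
int count_sink(void *ctx, const struct fpga_chunk *chunk)
{
    *(size_t *)ctx += chunk->len;
    return 0;
}

/* Read the image from fw and pass it to sink one chunk at a time.
 * Returns the number of chunks sent, or -1 on a read or sink error. */
long push_firmware(FILE *fw, chunk_sink sink, void *ctx)
{
    struct fpga_chunk chunk;
    uint32_t offset = 0;
    long sent = 0;
    size_t n;

    assert(fw != NULL && sink != NULL);
    while ((n = fread(chunk.data, 1, sizeof(chunk.data), fw)) > 0) {
        chunk.offset = offset;
        chunk.len = (uint32_t)n;
        if (sink(ctx, &chunk) != 0)
            return -1;
        offset += chunk.len;
        sent++;
    }
    return ferror(fw) ? -1 : sent;
}
```

A boot-time utility would open the device node (whatever name the driver creates), open the image file, and call `push_firmware(fw, ioctl_sink, &devfd)`. The `count_sink` variant lets the chunking be checked without any hardware present.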

(Anecdote: I knew an Ultrix system years ago with a PXG (IIRC)
accelerated graphics adapter that needed a firmware download at boot
time. At one time the firmware file, or the download gizmo, somehow
suffered bit rot -- which resulted in some really bizarre glitches in
the graphics rendering...)

Magnus



John Baldwin
2002-04-03 15:44:17 UTC
Post by Poul-Henning Kamp
Post by Kreider, Carl
I am working on an embedded project running FreeBSD, and my driver
for our custom card needs to load an FPGA with code. I know I can
compile the code in as data, but for ease of development, I would
rather fetch the FPGA code from a file. With a driver in kernel
space. Really.
Can it be done? If so, how? open() and read() are obviously in libc
which rules them out. Do I have to write my own in assembler?
Don't even think about it.
At the time your driver is probed/attached, there are no filesystems
mounted yet.
Best suggestion is to use an ioctl to download the data from
userland.
Or load the firmware using kldload or from the loader using a type string
similar to the way we do MFS root filesystems.
--
John Baldwin <***@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

Julian Elischer
2002-04-03 18:32:35 UTC
generally the answer is "You can't do that"

BUT

you could make a loadable module with the firmware,
and load both the module and the driver before booting from
the boot blocks..
then you can unload the firmware module after booting
(or whenever)
Post by Kreider, Carl
I am working on an embedded project running FreeBSD, and my driver
for our custom card needs to load an FPGA with code. I know I can
compile the code in as data, but for ease of development, I would
rather fetch the FPGA code from a file. With a driver in kernel
space. Really.
Can it be done? If so, how? open() and read() are obviously in libc
which rules them out. Do I have to write my own in assembler?
Terry Lambert
2002-04-03 20:59:28 UTC
Post by Kreider, Carl
I am working on an embedded project running FreeBSD, and my driver
for our custom card needs to load an FPGA with code. I know I can
compile the code in as data, but for ease of development, I would
rather fetch the FPGA code from a file. With a driver in kernel
space. Really.
Can it be done? If so, how? open() and read() are obviously in libc
which rules them out. Do I have to write my own in assembler?
For drivers which must be active in the boot path, it is
generally necessary to embed the firmware in the driver as
data. This is what FreeBSD does for the Adaptec SCSI
drivers.
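
A minimal sketch of the compiled-in approach, added as an illustration and not taken from any FreeBSD driver. The array contents are placeholder bytes, and `fpga_write_fn`, `fpga_load_builtin`, and `sum_writer` are hypothetical names; a real build would generate the array from the actual image and write to the card's registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bytes; a real build would generate this array from the image. */
const uint8_t fpga_image[] = { 0x01, 0x02, 0x03, 0x04 };
const size_t fpga_image_len = sizeof(fpga_image);

/* Hypothetical callback that writes one byte to the card's configuration port. */
typedef void (*fpga_write_fn)(void *softc, uint8_t byte);

/* Feed the compiled-in image to the card; returns the number of bytes written. */
size_t fpga_load_builtin(fpga_write_fn write_byte, void *softc)
{
    size_t i;

    assert(write_byte != NULL);
    for (i = 0; i < fpga_image_len; i++)
        write_byte(softc, fpga_image[i]);
    return i;
}

/* Stand-in for the hardware: sums the bytes it receives into *softc. */
void sum_writer(void *softc, uint8_t byte)
{
    *(uint32_t *)softc += byte;
}
```

Because the image is ordinary data linked into the driver, it is available at attach time with no filesystem involved. The cost is that the bytes stay resident for as long as the object containing them is loaded.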

For drivers that need to be active after boot time, but before
the mi_startup() is complete, you can load the data in a module
that contains the data. This is similar to compiling the data
into the kernel, but puts it at a minor level of abstraction.

For drivers that only need to be there before the rc files
start executing or the other CPUs have been started, you can do
kernel level file I/O. This was discussed in some detail over
the past year on the FreeBSD-current mailing list. This is
not recommended, since FreeBSD has poor support for kernel
level file I/O compared to other OSs (e.g. AIX has excellent
kernel interfaces for almost all system calls, since it puts
its abstraction barriers in the right places).

For most drivers which are not accessed until some way into
the user space boot process, or some time after the system is
up, the general model is to open the driver and to push the
data down into the kernel via an ioctl(). This is, in fact,
how the LKM system worked to load loadable kernel modules:
it pushed the modules over the user/kernel boundary into
allocated memory, a chunk at a time.

It really depends on *when* in the boot process the driver
*must* be functional and available, as to which approach you
should use.

Using the "kernel file I/O" approach has an incredibly narrow
window of utility, and since it's hard to do anyway... my
recommendation is to pick another option.

-- Terry

Michael Smith
2002-04-05 18:48:35 UTC
Post by Terry Lambert
For drivers which must be active in the boot path, it is
generally necessary to embed the firmware in the driver as
data. This is what FreeBSD does for the Adaptec SCSI
drivers.
For drivers that need to be active after boot time, but before
the mi_startup() is complete, you can load the data in a module
that contains the data. This is similar to compiling the data
into the kernel, but puts it at a minor level of abstraction.
These two are the same case. See e.g. the isp_fw module.
--
To announce that there must be no criticism of the president,
or that we are to stand by the president, right or wrong, is not
only unpatriotic and servile, but is morally treasonable to
the American public. - Theodore Roosevelt



David O'Brien
2002-04-04 01:34:39 UTC
Post by John Baldwin
Or load the firmware using kldload or from the loader using a type string
similar to the way we do MFS root filesystems.
Or similar to the ISP (Qlogic SCSI) firmware.

M. Warner Losh
2002-04-04 02:58:41 UTC
In message: <***@critter.freebsd.dk>
Poul-Henning Kamp <***@critter.freebsd.dk> writes:
: In message <***@indy.doctordesign.com>, "Kreider, Carl" writes:
: >
: >I am working on an embedded project running FreeBSD, and my driver
: >for our custom card needs to load an FPGA with code. I know I can
: >compile the code in as data, but for ease of development, I would
: >rather fetch the FPGA code from a file. With a driver in kernel
: >space. Really.
: >
: >Can it be done? If so, how? open() and read() are obviously in libc
: >which rules them out. Do I have to write my own in assembler?
:
: Don't even think about it.
:
: At the time your driver is probed/attached, there are no filesystems
: mounted yet.
:
: Best suggestion is to use an ioctl to download the data from
: userland.

The other alternative that I've seen used is to have a second module,
loaded at the same time as the driver, that holds the firmware. This
allows that second module to, in theory, be unloaded and the memory
reclaimed.

Warner

Terry Lambert
2002-04-05 23:52:24 UTC
Post by Michael Smith
Post by Terry Lambert
For drivers which must be active in the boot path, it is
generally necessary to embed the firmware in the driver as
data. This is what FreeBSD does for the Adaptec SCSI
drivers.
For drivers that need to be active after boot time, but before
the mi_startup() is complete, you can load the data in a module
that contains the data. This is similar to compiling the data
into the kernel, but puts it at a minor level of abstraction.
These two are the same case. See e.g. the isp_fw module.
I made the distinction because I could make up a situation
where they were different, or where you could replace low
performance firmware with higher performance firmware. The
Adaptec firmware that comes from the POST is good enough to
load enough of the OS for the OS driver to take over, and
the OS driver replaces the firmware.

You could also consider the case where the firmware module
was loaded, the firmware shoved down to the card by the
driver, and then the module containing it was unloaded
(recovering the data space). This is a tiny amount of
space for modern systems, but it's a possibility.

You could also think about booting with one set of Tigon II
firmware, replacing it, and then resetting the driver, which
would result in the replacement firmware being shoved down.

There's also the possibility of updating the firmware for
a driver using a module that's loaded at boot time, but
having firmware compiled into it.

These are all really border cases, and the distinction is
really slight, though... just trying to dot the I's and
cross the T's. 8-).


-- Terry

Brian Somers
2002-04-11 22:22:37 UTC
There's an example of how to do this in the ``digi'' driver. It
loads its firmware module on-the-fly (if it can) and dumps it
afterwards.

As you can see, this saves a bunch of runtime space (digi is the base
driver, digi_* are the firmware modules):

$ ls -l /boot/kernel/digi*
-r-xr-xr-x 1 root wheel 36911 Mar 29 00:48 /boot/kernel/digi.ko
-r-xr-xr-x 1 root wheel 17800 Mar 29 00:48 /boot/kernel/digi_CX.ko
-r-xr-xr-x 1 root wheel 69548 Mar 29 00:48 /boot/kernel/digi_CX_PCI.ko
-r-xr-xr-x 1 root wheel 68764 Mar 29 00:48 /boot/kernel/digi_EPCX.ko
-r-xr-xr-x 1 root wheel 70336 Mar 29 00:48 /boot/kernel/digi_EPCX_PCI.ko
-r-xr-xr-x 1 root wheel 11400 Mar 29 00:48 /boot/kernel/digi_Xe.ko
-r-xr-xr-x 1 root wheel 72852 Mar 29 00:48 /boot/kernel/digi_Xem.ko
-r-xr-xr-x 1 root wheel 73608 Mar 29 00:48 /boot/kernel/digi_Xr.ko

Unfortunately, if you want to load digi from loader.conf, you have to
explicitly load the firmware module(s) too as the module load function
is called before the filesystem is available.

FWIW, Solaris has a crude interface that will allow you to open and
getc/putc a file. It's smart enough to know that it should talk to
the boot prom if rootdev isn't yet set. I believe it was required for
Solaris' drivers to be able to read their .conf files at boot time,
and was sufficient to allow access to other files in the software I
was writing.
Post by Julian Elischer
generally the answer is "You can't do that"
BUT
you could make a loadable module with the firmware,
and load both the module and the driver before booting from
the boot blocks..
then you can unload the firmware module after booting
(or whenever)
Post by Kreider, Carl
I am working on an embedded project running FreeBSD, and my driver
for our custom card needs to load an FPGA with code. I know I can
compile the code in as data, but for ease of development, I would
rather fetch the FPGA code from a file. With a driver in kernel
space. Really.
Can it be done? If so, how? open() and read() are obviously in libc
which rules them out. Do I have to write my own in assembler?
--
Brian <***@freebsd-services.com> <***@Awfulhak.org>
http://www.freebsd-services.com/ <brian@[uk.]FreeBSD.org>
Don't _EVER_ lose your sense of humour ! <brian@[uk.]OpenBSD.org>



Danny Braniss
2002-06-28 05:27:48 UTC
thanks to all!

what works for me is
panic
and then a <CR>

danny



Julian H. Stacey
2003-02-02 19:40:41 UTC
[stuff]
Wouldn't it be best to put him in contact with some of the other
developers in the area?
Which brings the question.. who IS in the area..
User group list at http://www.freebsd.org/support.html#user

* Ukraine The Ukrainian FreeBSD User Group (UAFUG) is Russian/Ukrainian
languages oriented user group for the Ukrainian users of BSD-derivatives,
promoting and supporting BSD flavours and open source usage. The
UAFUG has had its first meeting on 2 June 2002 and meets every 2-3
weeks. We also provide an open forum for all BSD-related things in
the Russian and Ukrainian languages (though we can read/write in
English as well). To join the mailing list send a message to
***@FreeBSDDiary.org.ua with subscribe freebsd in the body
of the message. Check the link above for more information.

http://www.uafug.org.ua
All in Russian - with a heap of daemons + blue & yellow horizontal flag.

Julian Stacey
jhs @ berklix.com Computer Systems Engineer, Unix & Net Consultant, Munich.

Steve Watt
2004-04-21 20:37:49 UTC
On Apr 21, 13:28, Julian Elischer wrote:
} Subject: Re: how to flush out cache.?
}
} On Wed, 21 Apr 2004, Steve Watt wrote:
}
} > In article <Pine.BSF.4.21.0404211219460.31770-***@InterJet.elischer.org> you write:
} > >
} > >Ok so I have an application where I need to
} > >reread a file I have just written to ensure that it went to disk
} > >correctly..
} >
} > What are you hoping to accomplish? There are probably other ways
} > to solve the larger problem.
}
} I thought I was being clear..
} I need to remove all the pages from cache so that a reread of the file
} is forced to go to disk.
} and I don't want to go read a 2GB dummy file to force the flush..

No, my question is "what are you trying to accomplish with the
reread", at a higher level than "I want to know it's on disk". Is
there some reason you have for not trusting the hardware? Are you
trying to do a database commit protocol? Debugging the storage
system?

} Someone suggested that I read the file using 'dump' through the raw
} device..

Even doing that doesn't necessarily mean the bits have made it onto
the rotating media. There can also be caches in the disk controller,
and/or caches on the drive itself. If you're trying for a case where
you want to pull the power, unmounting and remounting the filesystem
will get it about as close as you can.
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8" / 37N 20' 14.9"
Internet: steve @ Watt.COM Whois: SW32
Free time? There's no such thing. It just comes in varying prices...
Julian Elischer
2004-04-21 20:44:16 UTC
Post by Steve Watt
} Subject: Re: how to flush out cache.?
}
}
} > >
} > >Ok so I have an application where I need to
} > >reread a file I have just written to ensure that it went to disk
} > >correctly..
} >
} > What are you hoping to accomplish? There are probably other ways
} > to solve the larger problem.
}
} I thought I was being clear..
} I need to remove all the pages from cache so that a reread of the file
} is forced to go to disk.
} and I don't want to go read a 2GB dummy file to force the flush..
No, my question is "what are you trying to accomplish with the
reread", at a higher level than "I want to know it's on disk". Is
there some reason you have for not trusting the hardware? Are you
trying to do a database commit protocol? Debugging the storage
system?
We are getting data corruptions occasionally and we are trying to track
it down.

If we wait a half hour so the cache is flushed out, the file sometimes
checksums differently and has bad data in it
but by then the original files have gone away
so we have a tough time recreating the data..

This is also to help us figure out where the problem is...
but since we have seen this several times we'd like to add a "check that
data on disk" option to our application, to help track this down
in the future if it appears to be happening again.
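
A minimal sketch of the checksum half of such a check, added as an illustration. It uses Adler-32 purely as an example checksum (the thread does not say which one the application uses), and `adler32_update` and `file_checksum` are hypothetical names. Note that a re-read through the normal file path may still be served from the buffer cache, which is the very problem under discussion here.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Fold len bytes into a running Adler-32 value (a fresh checksum starts at 1). */
uint32_t adler32_update(uint32_t adler, const uint8_t *buf, size_t len)
{
    uint32_t a = adler & 0xffff, b = adler >> 16;
    size_t i;

    for (i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* Checksum the whole file behind fd, reading it from the beginning.
 * Returns 0 and stores the result in *out, or -1 on a seek or read error. */
int file_checksum(int fd, uint32_t *out)
{
    uint8_t buf[65536];
    uint32_t sum = 1;
    ssize_t n;

    assert(out != NULL);
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        sum = adler32_update(sum, buf, (size_t)n);
    if (n < 0)
        return -1;
    *out = sum;
    return 0;
}
```

The application would keep the value computed while writing and compare it with `file_checksum()` on the re-read; a mismatch flags the file while the original data is still around to compare against.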

If it's happened several times it can happen again.
Post by Steve Watt
} Someone suggested that I read the file using 'dump' through the raw
} device..
Even doing that doesn't necessarily mean the bits have made it onto
the rotating media. There can also be caches in the disk controller,
and/or caches on the drive itself. If you're trying for a case where
you want to pull the power, unmounting and remounting the filesystem
will get it about as close as you can.
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8" / 37N 20' 14.9"
Free time? There's no such thing. It just comes in varying prices...
Julian Elischer
2004-04-21 20:46:04 UTC
Post by Steve Watt
} Subject: Re: how to flush out cache.?
}
}
} > >
} > >Ok so I have an application where I need to
} > >reread a file I have just written to ensure that it went to disk
} > >correctly..
} >
} > What are you hoping to accomplish? There are probably other ways
} > to solve the larger problem.
}
} I thought I was being clear..
} I need to remove all the pages from cache so that a reread of the file
} is forced to go to disk.
} and I don't want to go read a 2GB dummy file to force the flush..
No, my question is "what are you trying to accomplish with the
reread", at a higher level than "I want to know it's on disk". Is
there some reason you have for not trusting the hardware? Are you
trying to do a database commit protocol? Debugging the storage
system?
} Someone suggested that I read the file using 'dump' through the raw
} device..
Even doing that doesn't necessarily mean the bits have made it onto
the rotating media. There can also be caches in the disk controller,
and/or caches on the drive itself. If you're trying for a case where
you want to pull the power, unmounting and remounting the filesystem
will get it about as close as you can.
The disk caches are small enough.. we are talking about multi gigabyte
files getting a few blocks bad somewhere in the middle.
(and yes the machines have enough RAM to cache the files).
Post by Steve Watt
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8" / 37N 20' 14.9"
Free time? There's no such thing. It just comes in varying prices...
Stephan Uphoff
2004-04-21 21:21:58 UTC
mmap() and msync(..MS_INVALIDATE..) should work.

Stephan
Post by Steve Watt
Ok so I have an application where I need to
reread a file I have just written to ensure that it went to disk
correctly..
Other than reading a few GB of data, is there a way to flush
out the cache copy of a file I've written?
a file flag saying "don't keep a copy after it's written to disk"?
a syscall discard_cached_blocks(fd);
?
any other suggestions?
julian
(BTW this would be for 4.x initially)
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
Julian Elischer
2004-04-21 21:53:26 UTC
Post by Stephan Uphoff
mmap() and msync(..MS_INVALIDATE..) should work.
hmmm that is rather interesting..
I wonder if it would work....
Maybe a vm guru could confirm this.. (under 4.x)
Post by Stephan Uphoff
Stephan
Post by Steve Watt
Ok so I have an application where I need to
reread a file I have just written to ensure that it went to disk
correctly..
Other than reading a few GB of data, is there a way to flush
out the cache copy of a file I've written?
a file flag saying "don't keep a copy after it's written to disk"?
a syscall discard_cached_blocks(fd);
?
any other suggestions?
julian
(BTW this would be for 4.x initially)
Matthew Dillon
2004-04-22 01:02:21 UTC
:
:>
:> mmap() and msync(..MS_INVALIDATE..) should work.
:
:hmmm that is rather interesting..
:I wonder if it would work....
:Maybe a vm guru could confirm this.. (under 4.x)
:

Huh. If I hadn't looked at the code I would have said that
MS_INVALIDATE doesn't work in FreeBSD, but when I look at the code
it sure looks like it ought to work!

But, alas, it does not. The invalidation request goes all the way
through to the vnode pager but it looks like the vnode pager ignores
it.

MS_INVALIDATE -> OBJPC_INVAL -> VM_PAGER_PUT_INVAL -> IO_INVAL -> (ignored)

IO_INVAL is defined to be 'invalidate after I/O completes',
not 'throw away the dirty data', but the only place it appears to be
implemented is in the NFS code.

-Matt
Julian Elischer
2004-04-22 01:14:04 UTC
Post by Matthew Dillon
:>
:> mmap() and msync(..MS_INVALIDATE..) should work.
:hmmm that is rather interesting..
:I wonder if it would work....
:Maybe a vm guru could confirm this.. (under 4.x)
Huh. If I hadn't looked at the code I would have said that
MS_INVALIDATE doesn't work in FreeBSD, but when I look at the code
it sure looks like it ought to work!
But, alas, it does not. The invalidation request goes all the way
through to the vnode pager but it looks like the vnode pager ignores
it.
MS_INVALIDATE -> OBJPC_INVAL -> VM_PAGER_PUT_INVAL -> IO_INVAL -> (ignored)
IO_INVAL is defined to be 'invalidate after I/O completes',
not 'throw away the dirty data', but the only place it appears to be
implemented is in the NFS code.
Actually what I'm looking for is
"throw away clean data"

I want to dump the cached version of a file so that I can force a reread
of the disk.
Post by Matthew Dillon
-Matt
Matthew Dillon
2004-04-22 01:40:09 UTC
:Actually what I'm looking for is
:"throw away clean data"
:
:I want to dump the cached version of a file so that I can force a reread
:of the disk.

MS_INVALIDATE doesn't do that.

madvise()'s MADV_FREE does what you want, BUT it does not currently
work (at least on 4.x or in DFly) on file-backed data, it only works
with anonymous memory. I believe that on some systems MADV_FREE does
what you expect, e.g. like on Solaris (though I am not 100% sure), so
it would not be far-fetched to go and implement it.

-Matt
Matthew Dillon
<***@backplane.com>
Stephan Uphoff
2004-04-23 23:45:37 UTC
with this bug could a user zero out /etc/group or similar?
I am not sure what the ramification of the bug is..
The bug affects only in memory modified file data.

In memory modifications to the file can be deleted
and the file data reverts to a state before the
file modification. (Not unlike a crash/power failure )

The worst security scenario I can think of is the possibility
to revert a file to uninitialized disk data blocks or to
prevent the update of a file.

Stephan
Matthew Dillon
2004-04-24 01:37:04 UTC
:When I get the time (probably not next week) I will write a patch
:to release the cached buffers that would prevent page removal.

I would appreciate a CC if/when you have something along these lines.
It won't be easy. The VM system has no reliable way to determine the
buffer cache block size for a VM object or VNODE, nor any idea how to
deal with the buffer state which can vary in subtle ways between VFS's
(e.g. NFS vs UFS). So a new VOP call would probably have to be created
to clean out the buffers associated with a memory range.

We might want to create such a call anyway in order to support ranged
fsync()'s.

:The bug affects only in memory modified file data.
:
:In memory modifications to the file can be deleted
:and the file data reverts to a state before the
:file modification. (Not unlike a crash/power failure )
:
:The worst security scenario I can think of is the possibility
:to revert a file to uninitialized disk data blocks or to
:prevent the update of a file.
:
: Stephan

I won't say that it's impossible, but it would sure be hard to accomplish
since access to uninitialized disk data blocks is going to be governed
by the buffer cache, and the buffer is cleared unconditionally when
balloc'd inside ffs_write() (and the blocks will not be assigned if one
tries to do a read() of a file hole).

The only other code that calls VOP_BALLOC() for a file block is
ftruncate(), and it also does an unconditional write.

-Matt
Matthew Dillon
<***@backplane.com>
Steve Watt
2007-01-02 08:06:45 UTC
On Jan 1, 23:56, Julian Elischer wrote:
} Subject: Re: Interesting TCP issue
} Steve Watt wrote:
} > One of my users is having trouble receiving mail from Skype. So,
} > after some sniffing, I discovered this:
} >
} > # tcpdump -vv -s 1500 -i dc0 -X net 213.244.128.0/18
} > tcpdump: listening on dc0, link-type EN10MB (Ethernet), capture size 1500 bytes
} > 13:18:13.607493 IP (tos 0x20, ttl 58, id 12896, offset 0, flags [DF], proto: TCP (6), length: 74) share.skype.net.50406 > wattres.watt.com.smtp: P, cksum 0x9297 (correct), 4072464914:4072464936(22) ack 1248591103 win 46 <nop,nop,timestamp 2511885672 520058954>
[ sneck ]
} >
} > And no responses from my system.
} >
} > Interesting. I presume it has something to do with the
} > idiotically small window the remote server is advertising. So I
} > set net.inet.tcp.minmss down to 46, and that resulted in a RST
} > being spit back to skype's server when its retransmit happened.
} > [...]
}
} turn off window scaling (I forget the sysctl) and see if that helps
} It's broken in some versions of freeBSD at least.

Duh, should've mentioned the version:

FreeBSD wattres.Watt.COM 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #6: Tue Dec 26 11:46:36 PST 2006 ***@wattres.Watt.COM:/usr/obj/usr/src/sys/WATTRES i386

I did the cvsup just before the build time above.

I just turned off net.inet.tcp.rfc1323; we'll see if that helps on the
next polling attempt by skype's server.
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.5" / 37N 20' 15.3"
Internet: steve @ Watt.COM Whois: SW32-ARIN
Free time? There's no such thing. It just comes in varying prices...
Steve Watt
2007-01-02 08:25:22 UTC
On Jan 2, 0:06, Steve Watt wrote:
} Subject: Re: Interesting TCP issue
} On Jan 1, 23:56, Julian Elischer wrote:
} } Subject: Re: Interesting TCP issue
} } Steve Watt wrote:
} } > One of my users is having trouble receiving mail from Skype. So,
} } > after some sniffing, I discovered this:
} } >
} } > # tcpdump -vv -s 1500 -i dc0 -X net 213.244.128.0/18
} } > tcpdump: listening on dc0, link-type EN10MB (Ethernet), capture size 1500 bytes
} } > 13:18:13.607493 IP (tos 0x20, ttl 58, id 12896, offset 0, flags [DF], proto: TCP (6), length: 74) share.skype.net.50406 > wattres.watt.com.smtp: P, cksum 0x9297 (correct), 4072464914:4072464936(22) ack 1248591103 win 46 <nop,nop,timestamp 2511885672 520058954>
} [ sneck ]
} } >
} } > And no responses from my system.
} } >
} } > Interesting. I presume it has something to do with the
} } > idiotically small window the remote server is advertising. So I
} } > set net.inet.tcp.minmss down to 46, and that resulted in a RST
} } > being spit back to skype's server when its retransmit happened.
} } > [...]
} }
} } turn off window scaling (I forget the sysctl) and see if that helps
} } It's broken in some versions of freeBSD at least.
}
} Duh, should've mentioned the version:
}
} FreeBSD wattres.Watt.COM 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #6: Tue Dec 26 11:46:36 PST 2006 ***@wattres.Watt.COM:/usr/obj/usr/src/sys/WATTRES i386
}
} I did the cvsup just before the build time above.
}
} I just turned off net.inet.tcp.rfc1323; we'll see if that helps on the
} next polling attempt by skype's server.

We have a winner -- setting net.inet.tcp.rfc1323=0 let the mail message
come in on the next try.

p0f's guess at the remote machine is that it's a newer Linux 2.6 box;
that doesn't seem like an interoperability problem that should've
slipped through -BETA, given how common those are...

The exchange with rfc1323 looks completely normal, with the remote end
giving windows of 5840. But it ended the conversation in a very Windows
way by sending a couple of RSTs after the FIN exchange. Or has that
brokenness now extended itself to Linux as well?
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.5" / 37N 20' 15.3"
Internet: steve @ Watt.COM Whois: SW32-ARIN
Free time? There's no such thing. It just comes in varying prices...
Julian Elischer
2007-01-02 21:40:17 UTC
Post by Steve Watt
} Subject: Re: Interesting TCP issue
} } Subject: Re: Interesting TCP issue
} } > One of my users is having trouble receiving mail from Skype. So,
} } >
} } > # tcpdump -vv -s 1500 -i dc0 -X net 213.244.128.0/18
} } > tcpdump: listening on dc0, link-type EN10MB (Ethernet), capture size 1500 bytes
} } > 13:18:13.607493 IP (tos 0x20, ttl 58, id 12896, offset 0, flags [DF], proto: TCP (6), length: 74) share.skype.net.50406 > wattres.watt.com.smtp: P, cksum 0x9297 (correct), 4072464914:4072464936(22) ack 1248591103 win 46 <nop,nop,timestamp 2511885672 520058954>
} [ sneck ]
} } >
} } > And no responses from my system.
} } >
} } > Interesting. I presume it has something to do with the
} } > idiotically small window the remote server is advertising. So I
} } > set net.inet.tcp.minmss down to 46, and that resulted in a RST
} } > being spit back to skype's server when its retransmit happened.
} } > [...]
} }
} } turn off window scaling (I forget the sysctl) and see if that helps
} } It's broken in some versions of FreeBSD at least.
}
}
}
} I did the cvsup just before the build time above.
}
} I just turned off net.inet.tcp.rfc1323; we'll see if that helps on the
} next polling attempt by skype's server.
We have a winner -- setting net.inet.tcp.rfc1323=0 let the mail message
come in on the next try.
p0f's guess at the remote machine is that it's a newer Linux 2.6 box;
that doesn't seem like an interoperability problem that should've
slipped through -BETA, given how common those are...
The exchange with rfc1323 looks completely normal, with the remote end
giving windows of 5840. But it ended the conversation in a very Windows
way by sending a couple of RSTs after the FIN exchange. Or has that
brokenness now extended itself to Linux as well?
we have seen this since 4.x
I think a fix may be in 7.0 but I'm not sure..
I think there is a problem when the far end sets the window down to 1
but scales it by a factor of 2^{big number}.

the FreeBSD tcp code apparently scales the wrong variable
leading it to have a 1 byte window.


Andre, can you check out this problem and MFC the correct fix
if it is indeed the same problem in 6.2?

I enclose part of our trouble ticket info. on the topic.

---------------------- Problem Description -----------------------

The TCP window scaling (rfc 1323) implementation of
FreeBSD is broken such that we initially under-estimate
the TCP receive window size of the remote server.

Depending on the scaling options requested by the remote server,
this can cause us to send our SMTP banner spanning more than a
single TCP packet.

In at least one customer case, the remote MTA doesn't like what
appears to be a truncated SMTP banner and immediately RST's the
connection (instead of ACK'ing the first packet and receiving
the rest of the banner in the next packet).

The fact that the remote server won't accept our SMTP banner in
more than one packet is a problem on their end. But the fact
that we break up the banner in the first place is due to a bug
in FreeBSD related to TCP window scaling.

See AE ticket # 78590.

here's what happens:

- In the originating TCP SYN packet, the remote side says "we want
to use a window scaling factor of 9" (ie: multiply their TCP window
size values by 2**9=512 to calculate the real window size they
are able to accept.)
- We ACK this SYN, saying that we support their scaling
request, but we don't need any window scaling done. (We
send TCP option window scale = 0; nothing wrong here yet.)
- They send us an ACK with a window size of 12 (by which they
really mean 12*512=6144)
- We seem to forget about the scaling and think they can only handle
a 12 byte TCP window, so we send only the first 12 bytes of our SMTP
banner in the first packet.
- We intend to send the rest of the banner when they ACK the
previous packet, but they are too impatient and hang up the
phone without ACK'ing our partial banner so we can send the rest.
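
The arithmetic in the steps above can be sketched in C. This is only an illustration, not FreeBSD code: the helper names effective_window() and segments_needed() are made up for the example, and the 60-byte banner used in the usage note is an assumption. Only the advertised window of 12 and the shift of 9 come from the ticket.

```c
#include <assert.h>
#include <stdint.h>

/* Largest shift count RFC 1323 permits for the window scale option. */
#define MAX_WINDOW_SHIFT 14

/*
 * Window the peer can really accept: the 16-bit value from the TCP
 * header shifted left by the scale factor the peer announced in its
 * SYN. A shift of 0 means "no scaling".
 */
static uint32_t effective_window(uint16_t advertised, unsigned shift)
{
    assert(shift <= MAX_WINDOW_SHIFT);
    return (uint32_t)advertised << shift;
}

/*
 * Number of sends needed to push `len` bytes when each send is limited
 * to `window` bytes and the sender waits for an ACK in between.
 * Returns 0 when the window is closed and nothing can be sent.
 */
static uint32_t segments_needed(uint32_t len, uint32_t window)
{
    if (window == 0)
        return 0;
    return (len + window - 1) / window;
}
```

With the ticket's numbers, effective_window(12, 9) gives 6144 while effective_window(12, 0), the "scale forgotten" case, gives 12. For a hypothetical 60-byte banner that is the difference between segments_needed(60, 6144) == 1 and segments_needed(60, 12) == 5.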

This problem likely affects all cases in which window scaling is
requested in the TCP options, but is usually not noticed because it just
causes us to send smaller packets than we could. In most instances, this
only affects theoretical peak throughput (since we could conceivably be
sending more data before requiring an ACK)

a possible workaround is to use 'sysctl' to completely disable our support
for rfc1323. for example,

snooper:service 37] sysctl net.inet.tcp.rfc1323=0
net.inet.tcp.rfc1323: 1 -> 0

This SHOULD cause us to not respond to the TCP window scaling option
at all, thus telling the remote server that we are not supporting
it (which is better than saying we do support it, then getting
it wrong).
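
For anyone who wants to check the setting from a program rather than from the shell, here is a minimal sketch using sysctlbyname(3). The function name read_rfc1323_setting() is made up for the example, and the non-FreeBSD branch exists only so the snippet compiles elsewhere; it is not part of the ticket.

```c
#include <stddef.h>
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Returns the current value of net.inet.tcp.rfc1323, or -1 if it
 * cannot be read (which includes every non-FreeBSD build of this
 * sketch).
 */
static int read_rfc1323_setting(void)
{
#if defined(__FreeBSD__)
    int value = 0;
    size_t len = sizeof(value);

    /* Passing NULL/0 for the new value makes this a read-only query. */
    if (sysctlbyname("net.inet.tcp.rfc1323", &value, &len, NULL, 0) != 0)
        return -1;
    return value;
#else
    return -1;
#endif
}
```

A return value of 0 after applying the workaround would indicate that the rfc1323 extensions are switched off.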

Pending customer feedback to see if this workaround addresses the
issue, then we might make the change permanent in /etc/sysctl.conf.
Also pending information about what MTA software and OS is running
on the remote server.

Evan did a little digging and noticed that TCP window scaling
implementation is just being addressed in FreeBSD HEAD.
see:
http://cvs.ironport.com/cgi-bin/viewcvs.cgi/freebsd/src/sys/netinet/tcp_syncache.c.diff?r1=1.84&r2=1.85

It could well be that window scaling works fine AFTER the
initial connection is made, but in at least this one case,
it is affecting customer's ability to receive email from
certain domains...

We have info on reproducing the problem by crafting TCP SYN packets
with window scaling options.
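
The ticket does not include the reproduction recipe, so the following is only a sketch of the one piece that RFC 1323 pins down: the bytes of the window scale option itself (kind 3, length 3, one shift byte, with a NOP in front for alignment). The function and constant names are made up for the example, and building and sending the rest of the SYN is left out.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    OPT_KIND_NOP = 1,      /* one-byte padding option */
    OPT_KIND_WSCALE = 3,   /* window scale option kind (RFC 1323) */
    OPT_LEN_WSCALE = 3,    /* kind + length + shift count */
    WSCALE_MAX_SHIFT = 14  /* largest shift RFC 1323 allows */
};

/*
 * Writes a NOP followed by a window scale option into out[0..3].
 * Returns the number of bytes written, or 0 if the shift is out of
 * range and nothing was written.
 */
static size_t build_wscale_option(uint8_t shift, uint8_t out[4])
{
    if (shift > WSCALE_MAX_SHIFT)
        return 0;
    out[0] = OPT_KIND_NOP;
    out[1] = OPT_KIND_WSCALE;
    out[2] = OPT_LEN_WSCALE;
    out[3] = shift;
    return 4;
}
```

For the case described in the ticket the call would be build_wscale_option(9, buf), and the four bytes would be placed in the options area of a SYN segment.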
Steve Watt
2007-01-20 09:18:42 UTC
Permalink
In <***@elischer.org>, Julian Elischer wrote:

[ Snip discussion of symptoms of window scaling broken when
talking to at least the skype mail servers. ]
Post by Julian Elischer
we have seen this since 4.x
I think a fix may be in 7.0 but I'm not sure..
I think there is a problem when the far end sets the window down to 1
but scales it by a factor of 2^{big number}.
Andre, can you check out this problem and MFC the correct fix
if it is indeed the same problem in 6.2?
It is the same problem; I took the (one-line) fix as indicated by
Post by Julian Elischer
http://cvs.ironport.com/cgi-bin/viewcvs.cgi/freebsd/src/sys/netinet/tcp_syncache.c.diff?r1=1.84&r2=1.85
(well, not cvs.ironport.com, which doesn't seem to exist at the moment),
and applied the diff from 1.84 to
1.85 to a 6.2-PRERELEASE box updated around 25 Dec 06.
It works like a charm.

I would vote to MFC 1.85 now that 6.2 is out.

The diff, for those who are following along at home, is:

===================================================================
RCS file: /usr/local/www/cvsroot/FreeBSD/src/sys/netinet/tcp_syncache.c,v
retrieving revision 1.84
retrieving revision 1.85
diff -u -p -r1.84 -r1.85
--- src/sys/netinet/tcp_syncache.c 2006/02/09 21:29:02 1.84
+++ src/sys/netinet/tcp_syncache.c 2006/02/28 23:05:59 1.85
@@ -682,7 +682,7 @@ syncache_socket(sc, lso, m)
 		tp->t_flags |= TF_NOOPT;
 	if (sc->sc_flags & SCF_WINSCALE) {
 		tp->t_flags |= TF_REQ_SCALE|TF_RCVD_SCALE;
-		tp->requested_s_scale = sc->sc_requested_s_scale;
+		tp->snd_scale = sc->sc_requested_s_scale;
 		tp->request_r_scale = sc->sc_request_r_scale;
 	}
 	if (sc->sc_flags & SCF_TIMESTAMP) {
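
To see why a one-line change can matter this much, here is a toy model of the two assignments. It is not kernel code: struct sc_model, struct tcpcb_model and the three functions are simplified stand-ins invented for the example. It also assumes, as the trouble ticket describes, that the peer's first window is interpreted using snd_scale before any pending value has been applied.

```c
#include <stdint.h>

/* Stand-in for the syncache entry; only the fields in the diff. */
struct sc_model {
    uint8_t sc_requested_s_scale; /* shift the peer announced */
    uint8_t sc_request_r_scale;   /* shift we announced */
};

/* Stand-in for the TCP control block; only the fields we need. */
struct tcpcb_model {
    uint8_t  snd_scale;           /* shift applied to the peer's window */
    uint8_t  requested_s_scale;   /* pending value, not used for sending */
    uint8_t  request_r_scale;
    uint32_t snd_wnd;             /* usable send window in bytes */
};

/* Revision 1.84 behaviour: the shift lands in the pending field. */
static void accept_before_fix(struct tcpcb_model *tp, const struct sc_model *sc)
{
    tp->requested_s_scale = sc->sc_requested_s_scale;
    tp->request_r_scale = sc->sc_request_r_scale;
}

/* Revision 1.85 behaviour: the shift goes straight into snd_scale. */
static void accept_after_fix(struct tcpcb_model *tp, const struct sc_model *sc)
{
    tp->snd_scale = sc->sc_requested_s_scale;
    tp->request_r_scale = sc->sc_request_r_scale;
}

/* Interpret a window field from the peer using the active shift. */
static void update_send_window(struct tcpcb_model *tp, uint16_t th_win)
{
    tp->snd_wnd = (uint32_t)th_win << tp->snd_scale;
}
```

Starting from a zeroed control block and a peer shift of 9, a window field of 12 yields snd_wnd == 12 after accept_before_fix() and snd_wnd == 6144 after accept_after_fix(), which matches the symptom and the cure reported in this thread.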
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.5" / 37N 20' 15.3"
Internet: steve @ Watt.COM Whois: SW32-ARIN
Free time? There's no such thing. It just comes in varying prices...
Uwe Doering
2007-01-22 08:15:13 UTC
Permalink
Post by Steve Watt
[ Snip discussion of symptoms of window scaling broken when
talking to at least the skype mail servers. ]
Post by Julian Elischer
we have seen this since 4.x
I think a fix may be in 7.0 but I'm not sure..
I think there is a problem when the far end sets the window down to 1
but scales it by a factor of 2^{big number}.
Andre, can you check out this problem and MFC the correct fix
if it is indeed the same problem in 6.2?
It is the same problem; I took the (one-line) fix as indicated by
Post by Julian Elischer
http://cvs.ironport.com/cgi-bin/viewcvs.cgi/freebsd/src/sys/netinet/tcp_syncache.c.diff?r1=1.84&r2=1.85
(well, not cvs.ironport.com, which doesn't seem to exist at the moment),
and applied the diff from 1.84 to
1.85 and to a 6.2-PRERELEASE box updated around 25 Dec 06.
It works like a charm.
I would vote to MFC 1.85 now that 6.2 is out.
===================================================================
RCS file: /usr/local/www/cvsroot/FreeBSD/src/sys/netinet/tcp_syncache.c,v
retrieving revision 1.84
retrieving revision 1.85
diff -u -p -r1.84 -r1.85
--- src/sys/netinet/tcp_syncache.c 2006/02/09 21:29:02 1.84
+++ src/sys/netinet/tcp_syncache.c 2006/02/28 23:05:59 1.85
@@ -682,7 +682,7 @@ syncache_socket(sc, lso, m)
 		tp->t_flags |= TF_NOOPT;
 	if (sc->sc_flags & SCF_WINSCALE) {
 		tp->t_flags |= TF_REQ_SCALE|TF_RCVD_SCALE;
-		tp->requested_s_scale = sc->sc_requested_s_scale;
+		tp->snd_scale = sc->sc_requested_s_scale;
 		tp->request_r_scale = sc->sc_request_r_scale;
 	}
 	if (sc->sc_flags & SCF_TIMESTAMP) {
I wonder whether it is that easy. As far as I can tell the commit to
HEAD actually comprised changes to three files:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_input.c.diff?r1=1.290&r2=1.291
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_syncache.c.diff?r1=1.84&r2=1.85
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_var.h.diff?r1=1.127&r2=1.128

How about the modifications in 'tcp_input.c'? Are they relevant to the
problem this thread is about? If so, assessing the correctness of an
MFC might prove to be a little harder.

Uwe
--
Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers
***@geminix.org | http://www.escapebox.net
Steve Watt
2007-01-22 15:46:02 UTC
Permalink
On Jan 22, 9:15, Uwe Doering wrote:
} Subject: Re: Interesting TCP issue
} Steve Watt wrote:
} > In <***@elischer.org>, Julian Elischer wrote:
} >
} > [ Snip discussion of symptoms of window scaling broken when
} > talking to at least the skype mail servers. ]
} >
} >> we have seen this since 4.x
} >> I think a fix may be in 7.0 but I'm not sure..
} >> I think there is a problem when the far end sets the window down to 1
} >> but scales it by a factor of 2^{big number}.
} >>
} >> Andre, can you check out this problem and MFC the correct fix
} >> if it is indeed the same problem in 6.2?
} >
} > It is the same problem; I took the (one-line) fix as indicated by
} >
} >> http://cvs.ironport.com/cgi-bin/viewcvs.cgi/freebsd/src/sys/netinet/tcp_syncache.c.diff?r1=1.84&r2=1.85
} >
} > (well, not cvs.ironport.com, which doesn't seem to exist at the moment),
} > and applied the diff from 1.84 to
} > 1.85 and to a 6.2-PRERELEASE box updated around 25 Dec 06.
} > It works like a charm.
} >
} > I would vote to MFC 1.85 now that 6.2 is out.
}
} I wonder whether it is that easy. As far as I can tell the commit to
} HEAD actually comprised changes to three files:

I wonder as well, but that single diff fixes the problem I was running
into with the skype mail servers.

} http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_input.c.diff?r1=1.290&r2=1.291
} http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_syncache.c.diff?r1=1.84&r2=1.85
} http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_var.h.diff?r1=1.127&r2=1.128
}
} How about the modifications in 'tcp_input.c'? Are they relevant to the
} problem this thread is about? If so, assessing the correctness of an
} MFC might prove to be a little harder.

Looking at it, yeah, those probably need to be picked up in some form as
well. I didn't look closely at the tcpdump after, only observing that
it worked where it didn't before.

Hmm.
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.5" / 37N 20' 15.3"
Internet: steve @ Watt.COM Whois: SW32-ARIN
Free time? There's no such thing. It just comes in varying prices...