Turned out to be a driver issue
In a previous post I mentioned that I was having problems kickstarting one of the appliances. Nothing can be more perplexing than something that was working fine, and then it stops working for no apparent reason! This must have been a driver issue; I either migrated the old kickstart environment incorrectly, or something in the hardware specs had changed and caused the kickstart to fail.
I found an error on the console during the installation (press Alt+F2) that explained the problem: Kickstart could not find or load a network driver, so it dropped out to a regular installation and prompted me for an install source.
I tracked down another, older unit similar to this one and confirmed that the hardware had indeed changed: we had switched the motherboard, and the new one comes with a different network device, but we never got around to updating the drivers. So I had to build a new driver, in the same manner that my buddy Steve built a driver for this very same situation.
This applies to these Realtek Network cards: RTL8111B, RTL8168B, RTL8111, RTL8168, RTL8111C, RTL8111CP, RTL8111D(L), RTL8168C, RTL8111DP. I need a driver for this kernel: 2.6.9-42, running on CentOS 4.4
I got the drivers here: (http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=1&PNid=13&PFid=5&Level=5&Conn=4&DownTypeID=3&GetDown=false)
Then I got another appliance ready ((You really could use any Linux environment you have available, but I think it's easier to have the same kernel running on the system when you compile it.)) and installed the utilities needed to compile a new driver: gcc, make, and whatever dependencies Yum pulled in. Later on I found I needed the kernel-devel package as well.
I fetched the driver archive directly on the appliance with wget, using the URL that DownThemAll in Firefox reported ((Neat trick if you get those pesky javascript void redirects, and you can't get the URL.)), which saved me from downloading the driver and then uploading it to the appliance.
From here, I followed the readme that comes in the archive, which should leave me with the driver that I need.
I got this error on the first command that the readme told me to run:
/lib/modules/2.6.9-42.EL/build: No such file or directory. Stop.
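The message means the driver's Makefile could not find a kernel build tree for the running kernel. A small pre-flight check along these lines would catch that before running make. This is only a sketch: the function name `check_build_tree` and its optional second argument (an alternate modules root, handy for trying it out) are made up for illustration and are not part of the readme.

```shell
# Report whether a kernel build tree exists for the given kernel release.
# $1: kernel release, e.g. the output of `uname -r`
# $2: modules root (hypothetical parameter, defaults to /lib/modules)
check_build_tree() {
    release="$1"
    modules_root="${2:-/lib/modules}"
    # build is normally a symlink into /usr/src/kernels; -d follows it
    if [ -d "$modules_root/$release/build" ]; then
        echo "ok: $modules_root/$release/build"
    else
        echo "missing: $modules_root/$release/build" >&2
        return 1
    fi
}
```

On the appliance, `check_build_tree "$(uname -r)"` should fail whenever the installed kernel-devel package belongs to a different kernel than the one that is running.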
I had to install the kernel-devel package for that exact kernel version. Don't use Yum for this, because it will fetch the latest kernel-devel, which is not necessarily the one that matches the running kernel. Your best option is to install it from the original source ((We have an internal webserver with all the sources. The your-source-url host in the command below stands in for that internal server (10.10.18.10 in our case); you'll have to find your own if you're working through this. You can also use the RPMs from the disk or wherever else you have them.)) with:
rpm -ivh http://your-source-url/centos/x86/4.4/CentOS/RPMS/kernel-devel-2.6.9-42.EL.i686.rpm
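To avoid picking the wrong package by hand, the file name can be derived from the kernel release string. The sketch below assumes the naming pattern visible in this post (kernel-devel for the plain kernel, kernel-smp-devel for the ELsmp kernel) and an i686 default; the function name `kernel_devel_rpm` is hypothetical.

```shell
# Print the devel RPM file name that matches a kernel release string.
# $1: kernel release as printed by `uname -r`, e.g. 2.6.9-42.EL or 2.6.9-42.ELsmp
# $2: architecture (assumed default: i686)
kernel_devel_rpm() {
    release="$1"
    arch="${2:-i686}"
    case "$release" in
        # smp kernels: strip the "smp" suffix and use the kernel-smp-devel package
        *smp) echo "kernel-smp-devel-${release%smp}.${arch}.rpm" ;;
        *)    echo "kernel-devel-${release}.${arch}.rpm" ;;
    esac
}
```

With that in place, the install command could be written as `rpm -ivh "http://your-source-url/centos/x86/4.4/CentOS/RPMS/$(kernel_devel_rpm "$(uname -r)")"`, where your-source-url is still whatever mirror you have.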
Then I was able to build and continue with the instructions in the readme file.
Terminal log:
---------------------
[root@appliance r8168-8.012.00]# rpm -ql kernel-devel| grep modules
/lib/modules/2.6.9-78.0.22.EL/build
/lib/modules/2.6.9-78.0.22.EL/source
/usr/src/kernels/2.6.9-78.0.22.EL-i686/include/config/modules.h
[root@appliance r8168-8.012.00]# uname -a
Linux appliance 2.6.9-42.EL #1 Sat Aug 12 09:17:58 CDT 2006 i686 i686 i386 GNU/Linux
[root@appliance r8168-8.012.00]# yum -y remove kernel-devel
Setting up Remove Process
Resolving Dependencies
--> Populating transaction set with selected packages. Please wait.
---> Package kernel-devel.i686 0:2.6.9-78.0.22.EL set to be erased
--> Running transaction check
Dependencies Resolved
=============================================================================
Package Arch Version Repository Size
=============================================================================
Removing:
kernel-devel i686 2.6.9-78.0.22.EL installed 12 M
Transaction Summary
=============================================================================
Install 0 Package(s)
Update 0 Package(s)
Remove 1 Package(s)
Total download size: 0
Downloading Packages:
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Removing : kernel-devel ######################### [1/1]
Removed: kernel-devel.i686 0:2.6.9-78.0.22.EL
Complete!
[root@appliance r8168-8.012.00]# uname -a
Linux appliance 2.6.9-42.EL #1 Sat Aug 12 09:17:58 CDT 2006 i686 i686 i386 GNU/Linux
[root@appliance r8168-8.012.00]# rpm -Ivh
-Ivh: unknown option
[root@appliance r8168-8.012.00]# rpm -ivh
Retrieving
Preparing... ########################################### [100%]
1:kernel-devel ######################################### [100%]
[root@appliance r8168-8.012.00]# make clean modules
make -C src/ clean
make[1]: Entering directory `/root/tmpdriver/r8168-8.012.00/src'
rm -rf *.o *.ko *~ core* .dep* .*.d .*.cmd *.mod.c *.a *.s .*.flags .tmp_versions Module.symvers Modules.symvers Module.markers *.order
make[1]: Leaving directory `/root/tmpdriver/r8168-8.012.00/src'
make -C src/ modules
make[1]: Entering directory `/root/tmpdriver/r8168-8.012.00/src'
make -C /lib/modules/2.6.9-42.EL/build SUBDIRS=/root/tmpdriver/r8168-8.012.00/src modules
make[2]: Entering directory `/usr/src/kernels/2.6.9-42.EL-i686'
CC [M] /root/tmpdriver/r8168-8.012.00/src/r8168_n.o
/root/tmpdriver/r8168-8.012.00/src/r8168_n.c:174: warning: `MODULE_PARM_' is deprecated (declared at include/linux/module.h:552)
/root/tmpdriver/r8168-8.012.00/src/r8168_n.c:175: warning: `MODULE_PARM_' is deprecated (declared at include/linux/module.h:552)
/root/tmpdriver/r8168-8.012.00/src/r8168_n.c:176: warning: `MODULE_PARM_' is deprecated (declared at include/linux/module.h:552)
/root/tmpdriver/r8168-8.012.00/src/r8168_n.c: In function `rtl8168_tx_clear':
/root/tmpdriver/r8168-8.012.00/src/r8168_n.c:5005: warning: unused variable `dev'
CC [M] /root/tmpdriver/r8168-8.012.00/src/r8168_asf.o
/root/tmpdriver/r8168-8.012.00/src/r8168_asf.c: In function `rtl8168_asf_time_period':
/root/tmpdriver/r8168-8.012.00/src/r8168_asf.c:313: warning: 'pos' might be used uninitialized in this function
LD [M] /root/tmpdriver/r8168-8.012.00/src/r8168.o
Building modules, stage 2.
MODPOST
CC /root/tmpdriver/r8168-8.012.00/src/r8168.mod.o
LD [M] /root/tmpdriver/r8168-8.012.00/src/r8168.ko
make[2]: Leaving directory `/usr/src/kernels/2.6.9-42.EL-i686'
strip --strip-debug r8168.ko
make[1]: Leaving directory `/root/tmpdriver/r8168-8.012.00/src'
[root@appliance r8168-8.012.00]#
[root@appliance r8168-8.012.00]#
[root@appliance r8168-8.012.00]# make install
make -C src/ install
make[1]: Entering directory `/root/tmpdriver/r8168-8.012.00/src'
install -m 744 -c r8168.ko /lib/modules/2.6.9-42.EL/kernel/drivers/net/
make[1]: Leaving directory `/root/tmpdriver/r8168-8.012.00/src'
[root@appliance r8168-8.012.00]# ls src/
Makefile r8168_asf.c r8168_asf.o r8168.ko r8168.mod.o r8168_n.o
Makefile_linux24x r8168_asf.h r8168.h r8168.mod.c r8168_n.c r8168.o
[root@appliance r8168-8.012.00]#
[root@appliance r8168-8.012.00]#
[root@appliance r8168-8.012.00]# mv src/r8168.ko r8168.ko.2.6.9-42.EL
[root@appliance r8168-8.012.00]# make clean
make -C src/ clean
make[1]: Entering directory `/root/tmpdriver/r8168-8.012.00/src'
rm -rf *.o *.ko *~ core* .dep* .*.d .*.cmd *.mod.c *.a *.s .*.flags .tmp_versions Module.symvers Modules.symvers Module.markers *.order
make[1]: Leaving directory `/root/tmpdriver/r8168-8.012.00/src'
[root@appliance r8168-8.012.00]# cp src/Makefile src/Makefile.orig
[root@appliance r8168-8.012.00]# vi src/Makefile
[root@appliance r8168-8.012.00]# make
make -C src/ clean
make[1]: Entering directory `/root/tmpdriver/r8168-8.012.00/src'
rm -rf *.o *.ko *~ core* .dep* .*.d .*.cmd *.mod.c *.a *.s .*.flags .tmp_versions Module.symvers Modules.symvers Module.markers *.order
make[1]: Leaving directory `/root/tmpdriver/r8168-8.012.00/src'
make -C src/ modules
make[1]: Entering directory `/root/tmpdriver/r8168-8.012.00/src'
make -C /lib/modules/2.6.9-42.ELsmp/build SUBDIRS=/root/tmpdriver/r8168-8.012.00/src modules
make: *** /lib/modules/2.6.9-42.ELsmp/build: No such file or directory. Stop.
make: Entering an unknown directory
make: Leaving an unknown directory
make[1]: *** [modules] Error 2
make[1]: Leaving directory `/root/tmpdriver/r8168-8.012.00/src'
make: *** [modules] Error 2
[root@appliance r8168-8.012.00]# rpm -ivh
Retrieving
Preparing... ########################################### [100%]
1:kernel-smp-devel ###################################### [100%]
[root@appliance r8168-8.012.00]#
[root@appliance r8168-8.012.00]#
[root@appliance r8168-8.012.00]#
[root@appliance r8168-8.012.00]#
[root@appliance r8168-8.012.00]#
[root@appliance r8168-8.012.00]#
[root@appliance r8168-8.012.00]#
[root@appliance r8168-8.012.00]# make
make -C src/ clean
make[1]: Entering directory `/root/tmpdriver/r8168-8.012.00/src'
rm -rf *.o *.ko *~ core* .dep* .*.d .*.cmd *.mod.c *.a *.s .*.flags .tmp_versions Module.symvers Modules.symvers Module.markers *.order
make[1]: Leaving directory `/root/tmpdriver/r8168-8.012.00/src'
make -C src/ modules
make[1]: Entering directory `/root/tmpdriver/r8168-8.012.00/src'
make -C /lib/modules/2.6.9-42.ELsmp/build SUBDIRS=/root/tmpdriver/r8168-8.012.00/src modules
make[2]: Entering directory `/usr/src/kernels/2.6.9-42.EL-smp-i686'
CC [M] /root/tmpdriver/r8168-8.012.00/src/r8168_n.o
/root/tmpdriver/r8168-8.012.00/src/r8168_n.c:174: warning: `MODULE_PARM_' is deprecated (declared at include/linux/module.h:552)
/root/tmpdriver/r8168-8.012.00/src/r8168_n.c:175: warning: `MODULE_PARM_' is deprecated (declared at include/linux/module.h:552)
/root/tmpdriver/r8168-8.012.00/src/r8168_n.c:176: warning: `MODULE_PARM_' is deprecated (declared at include/linux/module.h:552)
/root/tmpdriver/r8168-8.012.00/src/r8168_n.c: In function `rtl8168_tx_clear':
/root/tmpdriver/r8168-8.012.00/src/r8168_n.c:5005: warning: unused variable `dev'
CC [M] /root/tmpdriver/r8168-8.012.00/src/r8168_asf.o
/root/tmpdriver/r8168-8.012.00/src/r8168_asf.c: In function `rtl8168_asf_time_period':
/root/tmpdriver/r8168-8.012.00/src/r8168_asf.c:313: warning: 'pos' might be used uninitialized in this function
LD [M] /root/tmpdriver/r8168-8.012.00/src/r8168.o
Building modules, stage 2.
MODPOST
CC /root/tmpdriver/r8168-8.012.00/src/r8168.mod.o
LD [M] /root/tmpdriver/r8168-8.012.00/src/r8168.ko
make[2]: Leaving directory `/usr/src/kernels/2.6.9-42.EL-smp-i686'
strip --strip-debug r8168.ko
make[1]: Leaving directory `/root/tmpdriver/r8168-8.012.00/src'
make -C src/ install
make[1]: Entering directory `/root/tmpdriver/r8168-8.012.00/src'
install -m 744 -c r8168.ko /lib/modules/2.6.9-42.ELsmp/kernel/drivers/net/
make[1]: Leaving directory `/root/tmpdriver/r8168-8.012.00/src'
[root@appliance r8168-8.012.00]# ls src/
Makefile Makefile.orig r8168_asf.h r8168.h r8168.mod.c r8168_n.c r8168.o
Makefile_linux24x r8168_asf.c r8168_asf.o r8168.ko r8168.mod.o r8168_n.o
[root@appliance r8168-8.012.00]# mv src/r8168.ko r8168.ko.2.6.9-42.ELsmp
[root@appliance r8168-8.012.00]# ls -l
total 172
-rw-r--r-- 1 root root 1789 Mar 25 08:07 Makefile
-rw-r--r-- 1 root root 65068 Jun 2 17:57 r8168.ko.2.6.9-42.EL
-rw-r--r-- 1 root root 64168 Jun 2 18:06 r8168.ko.2.6.9-42.ELsmp
-rw-r--r-- 1 root root 4425 Jan 6 10:33 readme
drwxr-xr-x 3 root root 4096 Jun 2 18:08 src
------------
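The log does not show what was changed in src/Makefile with vi, only that the build target moved from 2.6.9-42.EL to 2.6.9-42.ELsmp afterwards. If the Makefile sets the kernel version in a single variable, the edit could be scripted roughly like this. The variable name KVER is an assumption (check your copy of the Makefile), and `set_kver` is a made-up helper name.

```shell
# Point a driver Makefile at a specific kernel release, keeping a backup.
# $1: path to the Makefile
# $2: kernel release to build against, e.g. 2.6.9-42.ELsmp
# Assumes the Makefile has a line of the form "KVER := ..." or "KVER = ...".
set_kver() {
    makefile="$1"
    kver="$2"
    cp "$makefile" "$makefile.orig"
    # Rewrite only the KVER assignment; every other line passes through unchanged
    sed "s|^KVER[[:space:]]*:*=.*|KVER := $kver|" "$makefile.orig" > "$makefile"
}
```

If the version really does live in one make variable, passing it on the command line (for example `make KVER=2.6.9-42.ELsmp`) may work as well without touching the file, since command-line variables override assignments in the Makefile.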
Now on to getting this into our initrd image... or do we need a driver disk? Steve's article says to update the initrd image, but I wasn't sure which initrd file I would have to update. In my case, the Kickstart file calls for a driver disk, which contains the driver for another network card that we had to build, and at first I thought this is where the new driver would go. Scratch that: the only thing I need to remember is that I had to modify the initrd.img file that resides in the tftpboot directory.
As outlined in the article, the driver disk works for any driver that isn't a network driver. It doesn't work for network drivers because Anaconda needs the network driver loaded before it can bring up networking and fetch the kickstart file, so the driver has to be loaded before Anaconda kicks in. That is why it must be part of the initrd image. I found a slight discrepancy that made me nervous and caused me to retrace all my steps: the article tells you to copy both of the drivers you built, smp and non-smp, into their respective directories under the loop-mounted image, but this doesn't seem to work because the initrd.img file doesn't have an ELsmp directory. I skipped this part, hoping that the kickstart would just work and that I could simply copy in the right drivers in the post-install and be done with this. When I talked to Steve about it, it turned out that this was just an editorial mistake and you don't in fact need both drivers for the kickstart. In this case, which is technically the same project, we don't need the ELsmp driver working at kickstart time. YMMV on this one.
After I repackaged the initrd.img file and replaced it in the /tftpboot directory, I was able to continue with the kickstart installation. I then ran into the problem that Ruiz mentioned, despite the fact that I had the driver set to be added to the system during the kickstart.
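The unpack and repack halves of that initrd work can be sketched as two small helpers. This assumes, as the mention of a loop-mounted image suggests, that initrd.img is a gzip-compressed filesystem image; the function names and file names are placeholders, and where exactly the module goes inside the image is described in Steve's article, not here.

```shell
# Decompress a gzip-compressed initrd into a raw image that can be loop-mounted.
# $1: compressed initrd (e.g. initrd.img), $2: raw image to write
unpack_initrd() {
    gunzip -c "$1" > "$2"
}

# Recompress a modified raw image into a new initrd file.
# $1: raw image, $2: compressed initrd to write
repack_initrd() {
    gzip -9 -c "$1" > "$2"
}
```

Between the two calls you would, as root, mount the raw image with something like `mount -o loop initrd.raw /mnt/initrd`, add the driver, and `umount /mnt/initrd`. Writing the result to a new file first, and only then replacing the copy in the tftpboot directory, keeps the working initrd.img around in case the new one does not boot.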
So now you can replace your initrd.img with the one you just created. The kickstart should work fine now, but upon reboot, the system will not be able to find the right driver. After the kickstart, you need to copy the .ko files over to the appropriate directories. We added a line in our post-install script to do this for us; it simply copies the .ko file to the appropriate directory (/lib/modules/`uname -r`/kernel/drivers/net/).
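A post-install copy step of that kind might look like the sketch below. It assumes the built modules are named the way they were renamed in the terminal log (r8168.ko.2.6.9-42.EL and r8168.ko.2.6.9-42.ELsmp) and derives the target directory from that suffix; the function name and the optional root prefix are hypothetical.

```shell
# Copy each built module named r8168.ko.<kernel release> into the matching
# /lib/modules/<kernel release>/kernel/drivers/net/ directory.
# $1: directory holding the built modules
# $2: optional root prefix (empty when already running inside the new system)
install_built_drivers() {
    src_dir="$1"
    root="${2:-}"
    for built in "$src_dir"/r8168.ko.*; do
        [ -f "$built" ] || continue          # nothing matched the glob
        release="${built##*/r8168.ko.}"      # keep only the kernel release suffix
        dest="$root/lib/modules/$release/kernel/drivers/net"
        mkdir -p "$dest"
        cp "$built" "$dest/r8168.ko"
        echo "$dest/r8168.ko"                # report what was installed
    done
}
```

Because the destination comes from the file name rather than from `uname -r`, both the EL and ELsmp modules land in the right place no matter which kernel the installer itself is running.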
I think the failure was in the way I added the driver. We have a post-install script that basically "tars" a bunch of files into the newly built system. In this setup, for example, a file placed in the ../files/var/ directory will live in the new system under /var, and ../files/root becomes /root; do I make sense? Well, I put the drivers into ../files/lib/modules/...../net/ for both EL and ELsmp. After the system was kickstarted, I found that the r8168.ko files were in place, but the drivers didn't actually load. Why?
After I ran insmod r8168.ko and restarted network services, the driver worked great. So how do I get that to happen automatically? It seems that maybe I'll need to update the driverdisk.img as well as the initrd.img. Sheesh, a full day's worth of work and then some, just for a driver update. If it helps anybody, I'll have this available up here as soon as I am finished with it so you don't have to go through the trouble of making it yourself. What a pain! I'll have to continue this in the next post when I get back on this task tomorrow.
To finish:
- Automatically load the driver on the right appliance. How to do this?
- Does it matter if you run insmod /lib/modules/*/* or something like that? What happens if you just run insmod at first boot, or during our appliance configuration final script?
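One direction worth trying for the first item, untested on this appliance: insmod only loads the file it is given and leaves nothing behind for the next boot, whereas the network scripts load modules by name through modprobe. That usually takes two things: a module dependency map that knows about the new file (regenerated with `depmod -a`) and an alias line tying the interface to the module in /etc/modprobe.conf. The sketch below handles only the alias part; the helper name `ensure_alias` and the interface name eth0 are assumptions.

```shell
# Make sure a modprobe config file maps an interface to a module.
# $1: config file (on CentOS 4 this would be /etc/modprobe.conf)
# $2: interface name, e.g. eth0 (assumed)
# $3: module name, e.g. r8168
ensure_alias() {
    conf="$1"
    iface="$2"
    module="$3"
    if grep -q "^alias[[:space:]][[:space:]]*$iface[[:space:]]" "$conf" 2>/dev/null; then
        # An alias for this interface already exists: point it at the module
        sed "s|^alias[[:space:]][[:space:]]*$iface[[:space:]].*|alias $iface $module|" \
            "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
    else
        # No alias yet (or no file yet): append one
        echo "alias $iface $module" >> "$conf"
    fi
}
```

In the post-install script this would be something like `ensure_alias /etc/modprobe.conf eth0 r8168` followed by `depmod -a`, run after the .ko files are in place. Whether that is enough on its own, or whether the driver disk also needs updating, is still an open question here.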