Troubleshooting Sun Fire Boot Issues

  • Published on 27 Dec 2024

Comments • 61

  • @bradleystannard7875
    @bradleystannard7875 1 year ago +30

    Crazy this channel doesn't have 10x the amount of subscribers. Great content as always.

  • @JMassengill
    @JMassengill 1 year ago +22

    I'm a retired IT network/firewall tech. I was never in such a high-tech organization that ran Sun or anything close to it. (I mainly worked for government all my life; money is always tight in IT budgets.) This video series was an education. Explained well, no overbearing music, and you always pause the video for the viewer when you're just babysitting the install or config changes. Nicely done!

    • @clabretro
      @clabretro 1 year ago +4

      thank you!

    • @richardred15
      @richardred15 11 months ago

      Hi Level 👋

  • @theserialport
    @theserialport 1 year ago +11

    Wooah 1,024MB of RAM, what a powerhouse! Great analysis and deep dive on working with LOM. Perhaps we'll get one of these newer Sun machines with LOM running here

    • @clabretro
      @clabretro 1 year ago +3

      Ha, oh yeah she's a ripper. These Sun Fires are fun to work on, and the Netra variants were big in the telecommunications industry, which could be interesting.

  • @Codeaholic1
    @Codeaholic1 1 year ago +9

    You're conflating runlevels, which are specific to the OS and the services run by init and SMF (the Service Management Facility), with the rest of the system management machinery. The OpenBoot prompt is firmware for managing and configuring the system. Think of it like EFI or your PC BIOS. The ALOM is literally another computer in your computer for controlling your computer. Runlevels only apply to init.
    Also, c0t0d0s0 means controller 0, target 0, disk 0, slice 0. Slices are like partitions but written in the Sun disk label format.
    From an old Solaris admin, happy to see this stuff getting interest. Cheers
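
Editor's sketch of the naming scheme described above: a small Python parser that splits a name such as c0t0d0s0 into controller, target, disk and slice. The `SolarisDisk` type and `parse_device_name` function are hypothetical names made up for illustration, and the pattern only covers the plain cXtYdZsN form mentioned in the comment.

```python
import re
from typing import NamedTuple


class SolarisDisk(NamedTuple):
    """Components of a cXtYdZsN device name (hypothetical helper type)."""
    controller: int
    target: int
    disk: int
    slice_number: int


# Only the plain cXtYdZsN form from the comment is handled here.
_DEVICE_NAME = re.compile(r"^c(\d+)t(\d+)d(\d+)s(\d+)$")


def parse_device_name(name: str) -> SolarisDisk:
    """Split a name like 'c0t0d0s0' into its four numeric parts."""
    match = _DEVICE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a cXtYdZsN device name: {name!r}")
    return SolarisDisk(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    print(parse_device_name("c0t0d0s0"))
```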

    • @clabretro
      @clabretro 1 year ago +2

      yeah shouldn't have said run levels, you can see I was learning as I went 😂
      thanks for watching!

    • @Codeaholic1
      @Codeaholic1 1 year ago +1

      @clabretro no worries. Glad to see this stuff shown off. Keep it up!

  • @WooShell
    @WooShell 1 year ago +7

    Glad to see you finally got it booted. If that CPU has really been run without a heatsink, I'm not surprised it was fried.. the Jalapeno cores indeed were some hot running beasts.
    Sadly you're on the other side of the world.. I've got so many parts leftover from Sun's purple period that I would have gladly provided you with.
    Since the NVRAM is actually stored on the SCC card, getting rid of the ALOM password should have just been a matter of removing the card.. it would have complained about a lack of configuration and probably come up with an invalid (00:00...) MAC address, but it should have allowed you to log into the ALOM with default passwords. Then put the card back and use the user admin commands to reset the password. Officially, it's "not supported" to remove or insert the card with power turned on, but it usually works. (I heard that Sun patched that trick in a later version of ALOM, which simply reboots when the card is moved, but I haven't seen that live, none of my machines has a new enough ALOM version on it.)

  • @McCavity2
    @McCavity2 1 year ago +13

    9:56 one small addition: you actually *can* hop from Solaris directly to the ok prompt, but it requires a Sun keyboard (or some means to send the equivalent key code) *and* it will immediately halt Solaris until you resume. A little more background: Sun keyboards have a bunch of additional keys not present on your average PC-AT 106-key keyboard. One of them is a key labelled "Stop", and if you press it simultaneously with the "a" key (so "Stop+A"), the machine will immediately fall into the ok prompt and Solaris will be halted, literally in between instructions to the CPU. But *everything* is preserved just the way it is: each CPU register, the cache, memory contents. It does not harm a Solaris machine to press Stop+A, other than that it will immediately stop responding to anything: no network, no disk I/O, nothing. Solaris will literally be frozen on the spot, and if you resume it'll continue just like nothing happened. Don't ask me what the clock will do, I'd have to look that up in the old kernel manuals, but generally it is quite safe, at least for a machine under not too heavy load. You can even force the machine to core dump from the ok prompt so you get an exact image of the Solaris kernel as it was in the moment you pressed Stop+A. That is extremely useful e.g. for forensics: if you suspect an attacker is doing malicious stuff right now, you can stop him in his tracks and see what he was up to. Of course he'll notice the machine became unreachable, but that could also be the case if the machine suddenly rebooted for another reason... but I digress. I love your enthusiasm; it reminds me of the joy I had when I learned all the cool stuff you could do with these machines. I worked with Sun machines from 2000 right through 2017. Now I'm starting to look for my very own SPARC-based server all for myself (my wife's gonna *kill* me! :D)

    • @clabretro
      @clabretro 1 year ago +5

      I knew about Stop+A but haven't had a chance to truly try it out (no Sun keyboards, unfortunately). I should have made a note about that in the video. I didn't quite realize it really halted everything like that and preserved the CPU registers, really fascinating. Always something more to learn about these Sun machines.
      And yes my wife is getting quite impressed with the server collection as well :D

    • @Andrath
      @Andrath 1 year ago

      You can plug in an unmodified vt100 terminal, that used to trigger stop-a on Sun machines. They probably fixed that with newer hardware.

    • @maxinehayes90
      @maxinehayes90 11 months ago

      Essentially being able to pause machine state on live hardware is such an insanely cool thing that I wish Linux could do!

    • @TheStefanskoglund1
      @TheStefanskoglund1 2 months ago

      @Andrath Stop-A is simulated by simply sending a break from the terminal: if the machine is running, unplug the terminal, wait a little, and plug it back in.

  • @puffinrock2871
    @puffinrock2871 1 year ago +3

    This is timely for me, I'm struggling with booting and bypassing the password-protected LOMlite on a Sun Fire V100. Great stuff, learned a lot that might actually help me boot this thing.

    • @clabretro
      @clabretro 1 year ago +1

      Hopefully you can get in! I'm not actually sure if the LOMlite has the same timeout behavior as the ALOM (the version the V240 was running), but hopefully you can get to an ok prompt - that'd be goal number one. From the ok prompt you have a lot of ways to get to Solaris (and then alter the LOM password)

  • @AnonyDave
    @AnonyDave 1 year ago +4

    You got hit with the exact same issue I had with a V245 (basically the slightly newer version of the V240) I picked up cheap about a year ago. Someone had made a mistake when they repasted the CPU, and managed to bend one pin on one CPU. That totally caused it to not want to power on. Eventually I pulled the CPUs to check and found the bent pin. Straightened said pin enough to be able to get it into the socket, and it came up just fine.
    Thankfully the V245 is SAS rather than SCSI, so the disks are a lot less rare these days.

    • @clabretro
      @clabretro 1 year ago +2

      Nice! I didn't realize the V245s were SAS... now I'm tempted 😂

  • @markpriceful
    @markpriceful 1 year ago +2

    That is awesome, glad that you stuck with it and brought the v240 back to life! Tricky issue and a great example of where you just need to step back and talk it through with someone else.

    • @clabretro
      @clabretro 1 year ago

      Very true! Appreciated your comments and ideas as well!

  • @alexdhall
    @alexdhall 1 year ago +2

    3:24, Awwwwwwwwwwww kitteh helper! 😻

  • @McCavity2
    @McCavity2 1 year ago +9

    16:40 another small remark: what you refer to as "stripes" is actually a "slice" in Solaris lingo. Slices in turn are pretty much the same as partitions used on hard disks anywhere else between Windows and other Unices, with one notable exception: slice 2 (s2) will refer to the whole disk by convention. That's why you never change s2's parameters with fdisk or anything else. So in Linux, for example, you usually target the whole first disk in the system as /dev/sda, while in Solaris you'll have to use /dev/rdsk/c0t0d0s2. In Solaris, slices can be numbered 0-8 (0-9 for Intel-based Solaris), with the convention that s0 usually holds the OS (mounted at /) and s1 is normally used as swap space (IIRC the convention was to use half or the same amount of space as the RAM built into the machine). s2 for "all the disk" we already mentioned, and the rest of the numbers can be used arbitrarily. By convention you'll often find s3 holding /export (for NFS servers), s5 = /opt, s6 = /usr and s7 = /home. By the way, /dev/rdsk is another quirk of Solaris, because there is also a /dev/dsk for the very same device(s). The difference is how you access the device: /dev/rdsk is character based (or "raw" access, hence the "r"), so it is ideal for operations like dd. /dev/dsk OTOH is block based, so it's accessed through a file system. You'll notice that in /etc/vfstab the mount devices will be /dev/dsk, while for low-level operations you'll use /dev/rdsk.
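
Editor's sketch of the conventions in the comment above, in Python: a lookup table of the slice assignments it lists, plus a helper that builds the block (/dev/dsk) or raw (/dev/rdsk) path for a slice. `CONVENTIONAL_SLICES` and `device_path` are hypothetical names for illustration only; the table simply restates the commenter's conventions and is not an authoritative layout.

```python
# Slice conventions as listed in the comment (conventions, not rules).
CONVENTIONAL_SLICES = {
    0: "/",           # the OS
    1: "swap",
    2: "whole disk",  # by convention s2 covers the entire disk
    3: "/export",
    5: "/opt",
    6: "/usr",
    7: "/home",
}


def device_path(disk: str, slice_number: int, raw: bool = False) -> str:
    """Build the device path for one slice of a disk such as 'c0t0d0'.

    raw=True selects the character ("raw") device under /dev/rdsk,
    raw=False selects the block device under /dev/dsk.
    """
    directory = "/dev/rdsk" if raw else "/dev/dsk"
    return f"{directory}/{disk}s{slice_number}"


if __name__ == "__main__":
    # Whole-disk raw device, the kind of path you would hand to dd.
    print(device_path("c0t0d0", 2, raw=True))
    # Block device for the root slice, the kind of path seen in /etc/vfstab.
    print(device_path("c0t0d0", 0))
```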

    • @clabretro
      @clabretro 1 year ago +2

      Whoops, I misspoke. Thanks for the note - this is probably the clearest explanation I've seen of it all!

  • @matthewsmetalworkshop
    @matthewsmetalworkshop 1 year ago +3

    Wow, dunno why YouTube suddenly recommended your videos. I worked on the V240 (enchilada server or enxs) dev team back in the early noughties. A few things: it's not really runlevels... You can drop to the OpenBoot prompt without shutting down Solaris by sending a 'console break' or pressing Stop-A if you have a monitor and a proper Sun keyboard. At the OK prompt it's power-off (an extra dash). A single CPU config works fine, but it has to be in the right slot due to JBus termination requirements. Running without a heatsink will kill a Jalapeno CPU in seconds, just like any modern big CPU.

    • @clabretro
      @clabretro 1 year ago

      thanks for watching! yeah I shouldn't have said run level. unfortunately I don't have a Sun keyboard either, but if I remember correctly during some troubleshooting a couple months ago I was able to send a break with minicom.
      I didn't know that about the CPU slot requirement, guess I just got lucky. What kind of work did you do on the enchilada team? amazing to hear that!

    • @matthewsmetalworkshop
      @matthewsmetalworkshop 1 year ago +1

      @clabretro I worked on firmware, so OpenBoot and POST. There is a way to generate a break through ALOM, but I don't remember it (maybe ~.). It's been over 20 years, I've forgotten more than I remember....

    • @clabretro
      @clabretro 1 year ago

      Very cool. honored to have you view this one!

    • @eehawkee
      @eehawkee 5 months ago

      That's fascinating! Do you have any low level information left over from when you worked on Enchilada?

  • @AureliusR
    @AureliusR 1 year ago +2

    Tip: If you send SIGUSR1 to dd, it will output its current status. You can do this by getting the PID of dd, then doing "kill -10 pid" to send SIGUSR1. On a SPARC machine, apparently, the SIGUSR1 is actually 30, not 10, but that's according to the man page on my Linux machine, so who knows. You may be able to check on your system with "man 7 signal".
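
Editor's sketch of the point about signal numbers, in Python: since the number behind SIGUSR1 differs between platforms, sending the signal by name avoids guessing 10 versus 30. The example only signals its own process so that it is self-contained; to poke a running dd you would pass dd's PID to `os.kill` instead. Whether dd reports progress on SIGUSR1 depends on the dd implementation (GNU dd does), so treat that part as an assumption on other systems. Requires a Unix-like OS.

```python
import os
import signal

received = []


def on_usr1(signum, frame):
    # Record the symbolic name of whatever signal number arrived.
    received.append(signal.Signals(signum).name)


# Register the handler using the symbolic name, not a hard-coded number.
signal.signal(signal.SIGUSR1, on_usr1)

# The numeric value is platform specific; print it rather than assume it.
print("SIGUSR1 is signal number", int(signal.SIGUSR1), "on this platform")

# Send the signal by name to this process (stand-in for dd's PID).
os.kill(os.getpid(), signal.SIGUSR1)
print("received:", received)
```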

  • @MrCodyswanson
    @MrCodyswanson 6 months ago

    I administered hundreds of V and E series Sun machines back in the day. I've been lucky (or unlucky) enough to also have significant seat time administering IRIX, AIX, HP-UX and DG/UX over the years. Sun's hardware and Solaris were by far the best in class when it came to Unix. It went downhill in the later years when they were forced to compete with x86, and it ultimately died at the hands of Oracle, but Sun's peak was really a golden era of computing. I certainly miss it a little.

  • @cowsgomee
    @cowsgomee 1 year ago +2

    Great video again! Love that you pick the retro hardware, especially Sun. Got some retro gear waiting here as well, talking about SGI Indigo 2 and Indy machines. But they also need some TLC :)

    • @clabretro
      @clabretro 1 year ago

      Verrrry nice! Those SGI machines look awesome, and super historically significant. Will hopefully get my hands on some eventually, hope yours get up and running!

    • @cowsgomee
      @cowsgomee 1 year ago +1

      @clabretro I have some spares, we can talk shop if you like :)

    • @clabretro
      @clabretro 1 year ago

      Oh definitely. You can hit me up at the email in the about section of the channel.

  • @SergioEduP
    @SergioEduP 1 year ago +2

    Man, I am beginning to think that my V250 CPUs are both dead. I have tried all different combinations and no ok prompt.... When I originally got the machine it booted into the ok prompt through the GPU on the screen, I just didn't have the time to play around with it, but now not even through the management tty, even though I can see ALOM.....

    • @clabretro
      @clabretro 1 year ago +1

      Exactly the failure mode I had. If the CPUs aren't too much on eBay I think it's worth a shot grabbing one and seeing if it makes a difference (or trying each of your existing CPUs independently if you haven't already).

    • @SergioEduP
      @SergioEduP 1 year ago +1

      @clabretro I was looking online for some replacements and found a couple of CPUs with the same markings as the ones I have installed, but those are lower clocked than what the sticker on the bottom of my machine (and the jumper inside) states. Could the ones that I have have been replacements, with the jumper not changed, thus frying them? I have not been able to find much information on what the markings mean.

    • @SergioEduP
      @SergioEduP 1 year ago +2

      Also, a bit more testing reveals that this is the same failure mode as if there were no CPUs installed at all. Seems weird that they have all these hardware checks and tests just to boot up the ALOM but don't check if the CPUs are there at all.

    • @clabretro
      @clabretro 1 year ago +1

      Hmm seems odd that a jumper config could fry the CPU, but I'm not familiar with the v250s so I could be wrong. Same behavior without the CPUs in basically tells us what we need to know, CPUs are fried :(

  • @TheStefanskoglund1
    @TheStefanskoglund1 2 months ago

    6:47 '#' prompt is a special case of the '%' shell prompt, so stop-A is also ok, BUT you can't go from '% ' via stop-a to ' # '....
    wonder why ....

  • @kungfujesus06
    @kungfujesus06 1 year ago +2

    Oh you can get back to the OK prompt, heh. You just need to send the equivalent of Stop-A on a Sun keyboard (assuming you haven't explicitly disabled it, anyway). That allows you to do all kinds of stuff (including poke at memory).

  • @austinramsay
    @austinramsay 1 month ago

    Have you done the video with installing the SunPCI card? Interested in checking that out!

  • @justine1816
    @justine1816 1 year ago +2

    Oooh! Instant thumbs-up. Keep ‘em coming.

    • @clabretro
      @clabretro 1 year ago

      Thanks! Will do!

  • @GenoppteFliese
    @GenoppteFliese 11 months ago +1

    I do not know the Sun stuff, but HP has its own "ALOM" called "iLO".
    I think in 1999/2000 new servers arrived with a dedicated black box (similar to "JetDirect" boxes) that had an Ethernet port and a web server running, so you could connect to your iLO and serial console with any browser. A few years later that device was integrated on the mainboard, so there was a dedicated Ethernet port allowing you to connect to iLO all the time to power cycle the server or get to the serial console. Not sure how well that stuff aged, e.g. if you need a Java plugin to make old iLO versions work...

    • @clabretro
      @clabretro 10 months ago

      yeah I've messed around with the compaq pci-x-based card and the ILO on a gen 3!

  • @extrameatsammich
    @extrameatsammich 1 year ago +1

    What state are you in? I have a pair of V440s that could use a new home. They were both fully loaded, 4x cpu card with 4gb of ram on each card and all drive bays populated. Someone pilfered the CPUs from one of them so it should be considered a spares machine.

    • @clabretro
      @clabretro 1 year ago

      I'm in Colorado. If you're close maybe we can work something out, those V440s are not practical to ship haha.

    • @extrameatsammich
      @extrameatsammich 1 year ago +1

      @clabretro I think that they are 80 or 100 pounds, so not really a joy to move around. I sent you an e-mail.

  • @20EsOfficial
    @20EsOfficial 1 year ago +1

    I wanna see an SSD in this thing. Great content!

  • @basic-bear
    @basic-bear 10 months ago +1

    new favorite channel 😃

    • @clabretro
      @clabretro 10 months ago

      thanks!

  • @ronalerquinigoagurto555
    @ronalerquinigoagurto555 7 months ago

    You don't have that interchange prompt with a BIOS.

  • @javajav3004
    @javajav3004 8 months ago

    I love this youtube channel

  • @nathan9510
    @nathan9510 1 year ago +1

    That poor v120 is struggling with Solaris 10! It might be happier with Solaris 9
    Also if you don't know, you can do "setenv diag-switch? false" to speed up the boot. The full POST test is really long on these.

    • @clabretro
      @clabretro 1 year ago

      Ahh I've been meaning to set diag-switch, thanks for the reminder. And yeah... it's time to get a more appropriate Solaris version on this old thing!

  • @cyberjack
    @cyberjack 2 months ago

    Nearly all servers have something similar: HP iLO, Dell iDRAC, Lenovo TSM, etc. Thankfully it got easier to use over the years lol

  • @brianjay692
    @brianjay692 6 months ago

    Sunfight at the OK Prompt Corral…